Refactor rules creation and magic alerting  #135

@fourstepper

We've done a ton of work on magic alerting, and on alerting in general, in #127.

The issue is that we aren't really getting what we want.

To understand the architecture issue a bit, check out https://github.com/oskoperator/osko/blob/main/internal/helpers/prometheus_helper.go#L421-L461

Such indexing is highly problematic and leads to unexpected behavior.

Ideally, we would like to combine:

Rule types:

"targetRule":        {},
"totalRule":         {},
"goodRule":          {},
"badRule":           {},
"sliMeasurement":    {},
"errorBudgetValue":  {},
"errorBudgetTarget": {},
"burnRate":          {},

Windows:

windows := []string{baseWindow, extendedWindow, "5m", "30m", "1h", "2h", "6h", "24h", "3d"}

and Alerting Burn Rates:

AlertingBurnRates: AlertingBurnRates{
	PageShortWindow:   GetEnvAsFloat64("ABR_PAGE_SHORT_WINDOW", 14.4),
	PageLongWindow:    GetEnvAsFloat64("ABR_PAGE_LONG_WINDOW", 6),
	TicketShortWindow: GetEnvAsFloat64("ABR_TICKET_SHORT_WINDOW", 3),
	TicketLongWindow:  GetEnvAsFloat64("ABR_TICKET_LONG_WINDOW", 1),
},

into a single data structure that we simply fill out and pass around as needed.

All of these come from the SRE book.
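
To make the proposal a bit more concrete, here is a minimal sketch of what such a combined structure could look like. It is not the existing osko code: `RuleType`, `RuleKey`, `RuleSet`, `NewRuleSet`, and `Lookup` are hypothetical names, the `"type:window"` rule name is only a placeholder value, and the `28d`/`7d` windows in `main` are made-up inputs. The rule type names, the window list, and the burn rate fields and defaults are taken from the snippets above. The idea is that rules are addressed by (type, window) rather than by position.

```go
package main

import "fmt"

// RuleType names one kind of generated rule. The values mirror the keys
// listed in the issue; the type itself is hypothetical.
type RuleType string

const (
	TargetRule        RuleType = "targetRule"
	TotalRule         RuleType = "totalRule"
	GoodRule          RuleType = "goodRule"
	BadRule           RuleType = "badRule"
	SLIMeasurement    RuleType = "sliMeasurement"
	ErrorBudgetValue  RuleType = "errorBudgetValue"
	ErrorBudgetTarget RuleType = "errorBudgetTarget"
	BurnRate          RuleType = "burnRate"
)

// AllRuleTypes lists every rule type in a fixed order.
var AllRuleTypes = []RuleType{
	TargetRule, TotalRule, GoodRule, BadRule,
	SLIMeasurement, ErrorBudgetValue, ErrorBudgetTarget, BurnRate,
}

// AlertingBurnRates holds the burn rate thresholds; field names follow the issue.
type AlertingBurnRates struct {
	PageShortWindow   float64
	PageLongWindow    float64
	TicketShortWindow float64
	TicketLongWindow  float64
}

// RuleKey addresses a rule by what it is, not by its position in a slice.
type RuleKey struct {
	Type   RuleType
	Window string
}

// RuleSet is the hypothetical single structure that is filled out once
// and then passed around.
type RuleSet struct {
	Windows   []string
	BurnRates AlertingBurnRates
	Rules     map[RuleKey]string // placeholder value: a rule name
}

// NewRuleSet creates one entry per (rule type, window) combination.
func NewRuleSet(baseWindow, extendedWindow string, abr AlertingBurnRates) *RuleSet {
	windows := []string{baseWindow, extendedWindow, "5m", "30m", "1h", "2h", "6h", "24h", "3d"}
	rs := &RuleSet{Windows: windows, BurnRates: abr, Rules: make(map[RuleKey]string)}
	for _, t := range AllRuleTypes {
		for _, w := range windows {
			// Placeholder name; real code would store a rule definition here.
			rs.Rules[RuleKey{Type: t, Window: w}] = fmt.Sprintf("%s:%s", t, w)
		}
	}
	return rs
}

// Lookup returns the rule for a type and window, and whether it exists.
func (rs *RuleSet) Lookup(t RuleType, window string) (string, bool) {
	name, ok := rs.Rules[RuleKey{Type: t, Window: window}]
	return name, ok
}

func main() {
	abr := AlertingBurnRates{
		PageShortWindow:   14.4,
		PageLongWindow:    6,
		TicketShortWindow: 3,
		TicketLongWindow:  1,
	}
	rs := NewRuleSet("28d", "7d", abr) // example windows, not from the issue
	name, ok := rs.Lookup(BurnRate, "1h")
	fmt.Println(name, ok) // prints "burnRate:1h true"
}
```

With something like this, a caller asks for `rs.Lookup(BurnRate, "1h")` and gets an explicit "not found" for a missing combination instead of silently reading the wrong element. Whether the map value should be a rule name, a full rule definition, or something else is left open here.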

Labels: P1, bug