Domain 3: Alerting Strategy

Actionable Alerts and Runbooks

Observability Bot | Observability | Max 30 Points

Score | Maturity Level
0-6 | Ad-hoc
7-12 | Foundational
13-18 | Standardized
19-24 | Advanced
25-30 | Optimized
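The score bands are contiguous whole-number ranges, so turning a domain total into a maturity label is a simple lookup. Below is a minimal sketch that assumes scores are whole numbers. The rubric does not say whether fractional points are allowed. `maturity_band` is a hypothetical helper name.

```python
# Score bands from the table above, as inclusive (low, high, label) ranges.
SCORE_BANDS = (
    (0, 6, "Ad-hoc"),
    (7, 12, "Foundational"),
    (13, 18, "Standardized"),
    (19, 24, "Advanced"),
    (25, 30, "Optimized"),
)


def maturity_band(score: int) -> str:
    """Map a whole-number domain score (0-30) to its maturity label."""
    if not isinstance(score, int) or not 0 <= score <= 30:
        raise ValueError(f"score must be a whole number from 0 to 30, got {score!r}")
    for low, high, label in SCORE_BANDS:
        if low <= score <= high:
            return label
    raise AssertionError("unreachable: the bands cover every score from 0 to 30")


print(maturity_band(14))  # Standardized
```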

Scoring Criteria by Level

Level | Criteria
1 | Few alerts; mostly noisy; no runbooks; alert fatigue common
2 | Basic alerts exist; high noise ratio; some documentation
3 | SLO-based alerts; runbooks linked; regular tuning
4 | Multi-window burn-rate alerts; <5% noise ratio; automated tuning
5 | Self-healing alerts; ML-based anomaly detection; proactive alerting

Assessment Questions

# | Question | Max Points
1 | What % of alerts are actionable? | 6
2 | How are alerts linked to runbooks? | 6
3 | How do you tune alert thresholds? | 6
4 | Do alerts correlate with SLO burn rates? | 6
5 | How do you manage alert escalation? | 6
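Each question is worth up to 6 points, and the five questions together make up the domain's 30-point maximum. The sketch below totals the answers and rejects anything out of range. It assumes each answer has already been scored as a whole number from 0 to 6. `domain_score` and the sample answers are hypothetical.

```python
from typing import Sequence

MAX_PER_QUESTION = 6  # per-question maximum from the table above
NUM_QUESTIONS = 5     # questions 1-5


def domain_score(answers: Sequence[int]) -> int:
    """Sum per-question scores (each 0-6) into a domain total out of 30."""
    if len(answers) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(answers)}")
    for number, points in enumerate(answers, start=1):
        if not 0 <= points <= MAX_PER_QUESTION:
            raise ValueError(f"question {number}: {points} is outside 0-{MAX_PER_QUESTION}")
    return sum(answers)


# Hypothetical answers to questions 1-5.
print(domain_score([4, 3, 5, 2, 4]))  # 18
```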

Focus Areas

  • Actionability: Every alert should have a clear action
  • SLO-Based: Alert on error-budget burn rate, not static metric thresholds
  • Runbooks: Documented response procedures linked from each alert
  • Tuning: Regular noise-reduction reviews
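Alerting on error-budget burn starts from one ratio: how fast the service is failing compared with the failure rate its SLO allows. The sketch below uses the common definition of burn rate, which is the observed error rate divided by (1 - SLO target). It assumes good and bad events are counted over the same window, and `burn_rate` is a hypothetical helper.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    1.0 spends the error budget exactly over the SLO period; higher values
    exhaust it proportionally faster.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        return 0.0  # no traffic in the window, so no budget is being spent
    error_rate = bad_events / total_events
    allowed_error_rate = 1.0 - slo_target
    return error_rate / allowed_error_rate


# 144 failures out of 10,000 requests against a 99.9% SLO.
print(round(burn_rate(144, 10_000, 0.999), 2))  # 14.4
```

At a burn rate of 14.4, a 30-day (720-hour) error budget would be gone in 720 / 14.4 = 50 hours. Because of this, a burn-rate alert describes urgency directly, whereas a raw threshold such as "error rate above 1%" does not.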

Anti-Patterns (Red Flags)

  • Alerting on causes (such as high CPU) rather than user-facing symptoms
  • More than 20% of alerts are non-actionable
  • No runbooks or outdated runbooks
  • Alert storms during incidents
  • Alerts ignored due to fatigue
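The 20% red flag only helps if someone measures it. The sketch below assumes that every fired alert is labeled actionable or non-actionable during alert review. The `AlertOutcome` record and its field names are hypothetical and not taken from any specific tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AlertOutcome:
    """Hypothetical record of one fired alert, labeled during alert review."""
    name: str
    actionable: bool  # did a responder have to do something?


def noise_ratio(outcomes: Iterable[AlertOutcome]) -> float:
    """Fraction of fired alerts that required no action (0.0 if nothing fired)."""
    items = list(outcomes)
    if not items:
        return 0.0
    noisy = sum(1 for item in items if not item.actionable)
    return noisy / len(items)


def exceeds_noise_budget(outcomes: Iterable[AlertOutcome], limit: float = 0.20) -> bool:
    """True when the non-actionable share is above the 20% red-flag line."""
    return noise_ratio(outcomes) > limit


# Ten reviewed alerts, the first three of which needed no action.
reviewed = [AlertOutcome(f"alert-{i}", actionable=i >= 3) for i in range(10)]
print(noise_ratio(reviewed))  # 0.3
print(exceeds_noise_budget(reviewed))  # True
```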

Evidence Checklist

  • Alert actionability metrics tracked
  • Runbooks exist for all critical alerts
  • Alert noise ratio (share of non-actionable alerts) below 20%
  • Multi-window burn rate alerts configured
  • Regular alert review cadence
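A multi-window burn-rate alert fires only when the burn rate is high over a long window and a short window at the same time. The long window filters out brief spikes. The short window lets the alert clear soon after the burn stops. The thresholds below (14.4 over 1h/5m, 6 over 6h/30m, 1 over 3d/6h) are the starting values that Google's SRE Workbook suggests for a 30-day SLO period. They are an assumption here, since this rubric does not prescribe them. `BurnRateRule` and `evaluate` are hypothetical names, and the per-window burn rates are assumed to come from your metrics backend.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BurnRateRule:
    long_window: str   # e.g. "1h"
    short_window: str  # e.g. "5m"
    threshold: float   # burn-rate multiple that BOTH windows must exceed
    severity: str      # "page" or "ticket"


# Assumed starting values (SRE Workbook suggestions), fastest burn first.
RULES = (
    BurnRateRule("1h", "5m", 14.4, "page"),
    BurnRateRule("6h", "30m", 6.0, "page"),
    BurnRateRule("3d", "6h", 1.0, "ticket"),
)


def evaluate(burn_rates: Mapping[str, float]) -> Optional[str]:
    """Return the severity of the first rule whose long and short windows
    both exceed its threshold, or None if no rule fires.

    burn_rates maps a window label (e.g. "1h") to the burn rate over it;
    a missing window is treated as 0.0.
    """
    for rule in RULES:
        long_rate = burn_rates.get(rule.long_window, 0.0)
        short_rate = burn_rates.get(rule.short_window, 0.0)
        if long_rate > rule.threshold and short_rate > rule.threshold:
            return rule.severity
    return None


print(evaluate({"1h": 16.0, "5m": 20.0}))  # page
```

When the 1h burn rate is still 16 but the 5m rate has fallen to 2, the first rule no longer fires. The short window therefore keeps a recovered incident from continuing to page.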

Related Domains

Domain | Relationship
SLOs | Burn rate alerts derive from SLOs
Observability | Alerts query observability data
On-Call | Alert quality affects on-call health

Alert on Symptoms

Every page should require human action.
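One way to enforce this rule is to lint alert definitions before they ship. A paging rule should fire on a user-facing symptom and should link a runbook. The `AlertRule` fields below (`user_facing_symptom`, `runbook_url`) are hypothetical and do not reflect any particular alerting tool's schema, and the runbook URL is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertRule:
    """Hypothetical alert definition; field names are illustrative only."""
    name: str
    severity: str              # "page" or "ticket"
    user_facing_symptom: bool  # fires on what users experience (errors, latency)
    runbook_url: Optional[str] = None


def lint_page(rule: AlertRule) -> List[str]:
    """List reasons a paging rule fails the 'every page needs human action' bar."""
    if rule.severity != "page":
        return []  # only pages are held to this standard here
    problems: List[str] = []
    if not rule.user_facing_symptom:
        problems.append("pages on a cause, not a user-facing symptom")
    if not rule.runbook_url:
        problems.append("has no linked runbook")
    return problems


rules = [
    AlertRule("CheckoutErrorBudgetBurn", "page", True, "https://runbooks.example.com/checkout-slo"),
    AlertRule("NodeHighCPU", "page", False),
]
for rule in rules:
    print(rule.name, lint_page(rule))
```

Cause-level signals such as CPU usage can still be useful as tickets or dashboards. The check only flags them when they wake someone up.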