Incident Excellence

ITIL Lifecycle, Blameless Postmortems & On-Call Sustainability

Incident Management | Technical Operations Excellence

ITIL Phases: 5
Pages/Shift: ≤2
IMAG Framework: 3Cs
Postmortem SLA: 48h

ITIL Incident Lifecycle

1. Identify

Detection via monitoring, alerts, or reports

2. Categorize

Classify by type, service, impact area

3. Prioritize

Assign SEV level based on impact + urgency

4. Respond

Diagnose, mitigate, resolve, communicate

5. Close

Verify, document, postmortem, action items
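
As a rough illustration, the five phases above can be modeled as an ordered set of states. This is a minimal sketch that assumes strictly linear progression; the names Phase and advance are hypothetical, and a real process may allow stepping back (for example, re-categorizing during response).

```python
from enum import IntEnum


class Phase(IntEnum):
    """The five lifecycle phases, in the order listed above."""
    IDENTIFY = 1
    CATEGORIZE = 2
    PRIORITIZE = 3
    RESPOND = 4
    CLOSE = 5


def advance(current: Phase) -> Phase:
    """Return the next phase; refuse to move past CLOSE.

    Assumption: phases are visited strictly in order.
    """
    if current is Phase.CLOSE:
        raise ValueError("incident is already closed")
    return Phase(current + 1)


# Walk one incident through the whole lifecycle.
phase = Phase.IDENTIFY
while phase is not Phase.CLOSE:
    phase = advance(phase)
```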

Severity Levels

Level   Impact               Response Time
SEV1    Critical outage      <15 min
SEV2    Major degradation    <30 min
SEV3    Minor impact         <4 hours
SEV4    Low/cosmetic         Next business day
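
One way to make "impact + urgency" concrete is a small scoring function. The sketch below is only illustrative: the 1-to-3 scoring scale, the thresholds, and the names assign_sev and RESPONSE_TARGET are assumptions, not part of ITIL or of this document. Only the response targets are taken from the table above.

```python
from datetime import timedelta

# Response targets copied from the table above. SEV4 ("next business day")
# has no fixed duration, so it is represented as None.
RESPONSE_TARGET = {
    "SEV1": timedelta(minutes=15),
    "SEV2": timedelta(minutes=30),
    "SEV3": timedelta(hours=4),
    "SEV4": None,
}


def assign_sev(impact: int, urgency: int) -> str:
    """Map impact and urgency scores (1 = highest, 3 = lowest) to a SEV level.

    The scale and the thresholds below are illustrative assumptions.
    """
    if impact not in (1, 2, 3) or urgency not in (1, 2, 3):
        raise ValueError("impact and urgency must be 1, 2, or 3")
    score = impact + urgency  # ranges from 2 (worst) to 6 (mildest)
    if score == 2:
        return "SEV1"
    if score == 3:
        return "SEV2"
    if score <= 5:
        return "SEV3"
    return "SEV4"
```

For example, assign_sev(1, 2) returns "SEV2", and RESPONSE_TARGET["SEV2"] gives the 30-minute target.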

IMAG Framework (3Cs)

Principle      Actions
Coordinate     IC assigns roles, manages workstreams
Communicate    Status updates, stakeholder briefs
Control        Authorize changes, manage scope

Crisis triage: data criticality, trust relationships, compensating controls

Crisis Triage Questions

  • Data criticality: What can be accessed from compromised systems?
  • Trust relationships: What other systems trust the affected one?
  • Compensating controls: Are there mitigations in place?
  • Blast radius: How many users/services affected?
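
The trust-relationship and blast-radius questions can be approached as a graph traversal. The following is a sketch under stated assumptions: it presumes a map of which systems trust which is already available, and the function name blast_radius and the example topology are hypothetical.

```python
from collections import deque


def blast_radius(trusted_by: dict[str, list[str]], compromised: str) -> set[str]:
    """Return every system that directly or transitively trusts `compromised`.

    trusted_by[x] lists the systems that trust x.
    """
    seen = {compromised}
    queue = deque([compromised])
    while queue:
        system = queue.popleft()
        for dependent in trusted_by.get(system, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(compromised)  # report only the downstream systems
    return seen


# Hypothetical topology for illustration only.
trusted_by = {
    "auth": ["api", "admin-console"],
    "api": ["web"],
    "billing": ["api"],
}
affected = blast_radius(trusted_by, "auth")
```

A breadth-first search is used so that cycles in the trust graph cannot cause an infinite loop.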

Incident Roles

Role                  Responsibility
Incident Commander    Owns resolution, delegates
Ops Lead              Technical investigation
Comms Lead            Stakeholder updates
Scribe                Documents timeline

SEV1/2: Add Remediation Lead, Legal (if needed)
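
The staffing rule above could be expressed as a small helper. This is a sketch only; the function name required_roles and the legal_needed flag are hypothetical, and who decides whether Legal is needed is left to the organization.

```python
BASE_ROLES = ["Incident Commander", "Ops Lead", "Comms Lead", "Scribe"]


def required_roles(severity: str, legal_needed: bool = False) -> list[str]:
    """List the roles to staff for an incident of the given severity."""
    roles = list(BASE_ROLES)  # copy so the base list is never mutated
    if severity in ("SEV1", "SEV2"):
        roles.append("Remediation Lead")
        if legal_needed:
            roles.append("Legal")
    return roles
```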

Blameless Postmortems

Ask "what" and "how" questions, never "why" - it forces justification and blame.

- John Allspaw, Etsy

  • Timeline: What happened, when?
  • Contributing factors: What conditions existed?
  • Action items: Preventative, detective, mitigating
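
The three postmortem elements above map naturally onto a small record type. The sketch below is illustrative: the class and field names are hypothetical, and it assumes the 48h postmortem SLA clock starts when the incident is resolved, which this document does not specify.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

POSTMORTEM_SLA = timedelta(hours=48)  # the "48h" postmortem SLA figure
ACTION_TYPES = {"preventative", "detective", "mitigating"}


@dataclass
class ActionItem:
    description: str
    kind: str  # must be one of ACTION_TYPES

    def __post_init__(self) -> None:
        if self.kind not in ACTION_TYPES:
            raise ValueError(f"unknown action type: {self.kind}")


@dataclass
class Postmortem:
    resolved_at: datetime
    timeline: list[tuple[datetime, str]] = field(default_factory=list)
    contributing_factors: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)

    def due_at(self) -> datetime:
        # Assumption: the SLA is measured from the resolution time.
        return self.resolved_at + POSTMORTEM_SLA

    def is_overdue(self, now: datetime) -> bool:
        return now > self.due_at()
```

Note that the record has no "owner of the mistake" field; it captures events and conditions rather than people, in keeping with the blameless approach.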

Communication Cadence

Severity    Update Frequency
SEV1        Every 15 minutes
SEV2        Every 30 minutes
SEV3/4      Hourly or as needed
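
A Comms Lead might track when the next status update is due with something like the sketch below. The names UPDATE_INTERVAL and next_update_due are hypothetical, and treating SEV3/4 as strictly hourly is a simplifying assumption, since the table allows "as needed".

```python
from datetime import datetime, timedelta

# Intervals taken from the cadence table above.
# Assumption: SEV3 and SEV4 default to hourly updates.
UPDATE_INTERVAL = {
    "SEV1": timedelta(minutes=15),
    "SEV2": timedelta(minutes=30),
    "SEV3": timedelta(hours=1),
    "SEV4": timedelta(hours=1),
}


def next_update_due(severity: str, last_update: datetime) -> datetime:
    """Return when the next stakeholder update should go out."""
    return last_update + UPDATE_INTERVAL[severity]
```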

Playbooks improve MTTR roughly 3x on average compared with improvising a response.

Training: Wheel of Misfortune

A role-play exercise for IC practice. Spin the wheel to select a historical incident; responders then handle it as a real-time simulation.

  • Do: Practice handoffs, escalation
  • Don't: Use for evaluation/blame

Learn from Every Incident

Blameless culture enables honest retrospectives.