Recovery Principles, Breakglass & Emergency Access
Infrastructure Reliability | Technical Operations Excellence
| Principle | Application |
|---|---|
| Go fast, guarded | Speed with policy guardrails |
| Minimize time dependencies | Don't make recovery depend on wall-clock time |
| Know intended state | Encode complete configuration |
| Test restores | Untested backups = no backups |
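
The "know intended state" principle can be sketched as a comparison between a recorded configuration and what is actually running. This is a minimal illustration, not a prescribed tool: the function names `fingerprint` and `drift` and the flat dictionary shape of the configuration are assumptions made for the example.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Return a stable SHA-256 digest of a JSON-serializable config.

    Keys are sorted so that two configs with the same content always
    produce the same digest, regardless of insertion order.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drift(intended: dict, actual: dict) -> dict:
    """Map each differing key to an (intended, actual) pair.

    A key missing on one side is reported with None for that side.
    """
    keys = intended.keys() | actual.keys()
    return {
        key: (intended.get(key), actual.get(key))
        for key in sorted(keys)
        if intended.get(key) != actual.get(key)
    }


# Hypothetical example values.
intended = {"replicas": 3, "tls": True}
actual = {"replicas": 2, "tls": True}
if fingerprint(intended) != fingerprint(actual):
    print(drift(intended, actual))
```

Comparing digests first keeps the common "nothing changed" check cheap; the per-key comparison only matters once the digests differ.
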
At least three copies: the original plus two backups
Copies on different storage technologies
At least one backup geographically separate from the original
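
The three backup rules above can be checked mechanically against an inventory of copies. The sketch below is one possible way to do that; the `Copy` record, its field names, and the example technology and region labels are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (all fields are illustrative)."""
    name: str
    technology: str  # e.g. "block-disk", "object-store", "tape"
    region: str      # physical location of the copy
    is_original: bool = False


def check_copies(copies: list[Copy]) -> list[str]:
    """Return rule violations; an empty list means the set passes."""
    problems = []
    originals = [c for c in copies if c.is_original]
    backups = [c for c in copies if not c.is_original]
    if len(originals) != 1:
        problems.append("expected exactly one original")
    if len(backups) < 2:
        problems.append("need at least 2 backups besides the original")
    if len({c.technology for c in copies}) < 2:
        problems.append("all copies use the same storage technology")
    if originals and not any(b.region != originals[0].region for b in backups):
        problems.append("no backup is geographically separate from the original")
    return problems
```

Returning a list of violations instead of a single boolean makes the result usable directly in a report or alert.
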
| Tier | Systems | RTO (max downtime) | RPO (max data loss) |
|---|---|---|---|
| 0 | Critical APIs | <15m | 0 |
| 1 | Core services | <4h | <1h |
| 2 | Internal tools | <24h | <4h |
| 3 | Dev/test | <72h | <24h |
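
The tier targets in the table can be encoded as data and compared against results measured in a drill. This is a sketch under assumptions: the name `meets_targets` and the idea of feeding it a measured recovery time and a worst-case data-loss window are illustrative, and the table's `<` bounds are read as strict.

```python
from datetime import timedelta

# Targets copied from the tier table; "<15m" is treated as a strict bound.
TIER_TARGETS = {
    0: {"rto": timedelta(minutes=15), "rpo": timedelta(0)},
    1: {"rto": timedelta(hours=4), "rpo": timedelta(hours=1)},
    2: {"rto": timedelta(hours=24), "rpo": timedelta(hours=4)},
    3: {"rto": timedelta(hours=72), "rpo": timedelta(hours=24)},
}


def meets_targets(tier: int, measured_recovery: timedelta,
                  worst_case_data_loss: timedelta) -> bool:
    """Check measured drill results against the tier's RTO and RPO."""
    target = TIER_TARGETS[tier]
    rto_ok = measured_recovery < target["rto"]
    # An RPO of 0 cannot be met with "<"; it requires exactly zero loss.
    if target["rpo"] == timedelta(0):
        rpo_ok = worst_case_data_loss == timedelta(0)
    else:
        rpo_ok = worst_case_data_loss < target["rpo"]
    return rto_ok and rpo_ok
```

For example, a Tier 1 service restored in 3 hours with at most 30 minutes of lost writes passes, while a Tier 0 service that loses any data does not, however quickly it recovers.
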
| Mechanism | Purpose |
|---|---|
| Breakglass | Override normal access controls |
| MPA (multi-party authorization) | Require approval from someone other than the requester |
| Offline creds | Independent of primary systems |
| Temp access | Time-bounded elevation |
Document business justification for all elevated access
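
Three of the ideas above (multi-party authorization, time-bounded elevation, and a documented justification) can be combined in one small model. The sketch below is illustrative only: `AccessRequest`, `Grant`, `issue_grant`, the default of one independent approver, and the 4-hour cap are assumptions, and breakglass overrides and offline credentials are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class AccessRequest:
    requester: str
    justification: str
    duration: timedelta
    approvers: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # Multi-party: the requester cannot count as their own approver.
        if approver == self.requester:
            raise ValueError("requester cannot approve their own request")
        self.approvers.add(approver)


@dataclass(frozen=True)
class Grant:
    requester: str
    justification: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        """A grant is valid only strictly before its expiry time."""
        return now < self.expires_at


def issue_grant(request: AccessRequest, now: datetime,
                required_approvals: int = 1,
                max_duration: timedelta = timedelta(hours=4)) -> Grant:
    """Issue a time-bounded grant, or raise if a guardrail is not met."""
    if not request.justification.strip():
        raise ValueError("business justification is required")
    if request.duration > max_duration:
        raise ValueError("requested duration exceeds the allowed maximum")
    if len(request.approvers) < required_approvals:
        raise PermissionError("not enough independent approvals")
    return Grant(request.requester, request.justification,
                 now + request.duration)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the expiry logic easy to test and avoids a hidden dependency on wall-clock time.
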
| Exercise | Frequency |
|---|---|
| Tabletop | Monthly |
| Failover drill | Quarterly |
| Full DR test | Annually |
| Chaos experiments | Continuous |
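
The exercise cadence can be tracked with a simple overdue check. This is a rough sketch: the day counts below only approximate "monthly", "quarterly", and "annually", the exercise keys and the `overdue` function are hypothetical, and chaos experiments are left out because a continuous activity has no interval to miss.

```python
from __future__ import annotations

from datetime import date, timedelta

# Approximate intervals for the table's cadences (assumed day counts).
MAX_INTERVAL = {
    "tabletop": timedelta(days=31),
    "failover_drill": timedelta(days=92),
    "full_dr_test": timedelta(days=366),
}


def overdue(last_run: dict[str, date], today: date) -> list[str]:
    """Return exercises never run or last run longer ago than allowed."""
    late = []
    for exercise, interval in MAX_INTERVAL.items():
        last = last_run.get(exercise)
        if last is None or today - last > interval:
            late.append(exercise)
    return late
```

An exercise with no recorded run counts as overdue, so a newly added exercise shows up immediately instead of being silently skipped.
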
Plan to Fail
The best recovery is the one you've practiced.