Non-Abstract Large System Design
Capacity & Release | Technical Operations Excellence
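Is it possible? Can we build it at all?
Can we do better? Optimize design choices
Is it feasible? Fit within cost, time, and resource limits
Is it resilient? Degrade gracefully when components fail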
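
The "can we build it at all?" and cost/resource questions usually start as back-of-the-envelope arithmetic. The sketch below sizes a hypothetical storage tier. The names `Estimate` and `storage_estimate`, every input number, and the flat per-machine cost model are illustrative assumptions, not figures from this deck.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    machines: int
    monthly_cost: float


def storage_estimate(users: int, bytes_per_user: int, replicas: int,
                     disk_bytes_per_machine: int, fill_target: float,
                     cost_per_machine_month: float) -> Estimate:
    """Rough sizing for a hypothetical storage tier.

    Assumes capacity is bounded only by disk, and that each machine is filled
    to at most `fill_target` of its raw disk to leave room for growth and repair.
    """
    total_bytes = users * bytes_per_user * replicas
    usable_per_machine = disk_bytes_per_machine * fill_target
    machines = math.ceil(total_bytes / usable_per_machine)
    return Estimate(machines, machines * cost_per_machine_month)


# Hypothetical inputs: 100M users x 1 GB each, 3 replicas,
# 100 TB of disk per machine filled to 80%, $1,000 per machine-month.
est = storage_estimate(100_000_000, 10**9, 3, 100 * 10**12, 0.8, 1_000.0)
print(est.machines, est.monthly_cost)
```

With these inputs, 300 PB of replicated data over 80 TB of usable disk per machine comes to 3,750 machines. If that exceeds the budget, or the hardware that can be obtained in time, the design goes back for another iteration.
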

Capacity planning process

| Step | Activity |
|---|---|
| 1. Demand forecast | Project load from historical trends and growth models |
| 2. Supply analysis | Measure current capacity and identify bottlenecks |
| 3. Gap assessment | Determine where and when capacity runs out |
| 4. Headroom planning | Provision N+1 at minimum; N+2 for critical services |
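
The four planning steps can be strung together as a small model. This sketch assumes compound monthly growth as the demand model and a fixed per-server throughput as the supply model; both are simplifications, and the function names and numbers are hypothetical. Spare capacity is counted in whole servers here, whereas N+1/N+2 planning in practice usually counts failure domains (racks, zones, clusters).

```python
import math
from typing import Optional


def forecast_peak(current_peak_qps: float, monthly_growth: float, months: int) -> float:
    """Step 1, demand forecast: assumed compound-growth model."""
    return current_peak_qps * (1 + monthly_growth) ** months


def servers_needed(peak_qps: float, qps_per_server: float, spare: int) -> int:
    """Servers to carry peak load plus `spare` redundant ones (1 for N+1, 2 for N+2)."""
    return math.ceil(peak_qps / qps_per_server) + spare


def month_capacity_runs_out(current_peak_qps: float, monthly_growth: float,
                            qps_per_server: float, provisioned: int, spare: int,
                            horizon_months: int = 36) -> Optional[int]:
    """Steps 2-4: first month where demand plus headroom exceeds provisioned supply."""
    for month in range(horizon_months + 1):
        peak = forecast_peak(current_peak_qps, monthly_growth, month)
        if servers_needed(peak, qps_per_server, spare) > provisioned:
            return month
    return None  # no gap within the planning horizon


# Hypothetical: 10k QPS today, 5%/month growth, 1k QPS per server, 14 servers, N+1.
print(month_capacity_runs_out(10_000, 0.05, 1_000, provisioned=14, spare=1))
```

With these inputs the gap appears in month 6, which bounds the time available to order, install, and qualify more capacity.
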
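
Load testing

| Test type | Purpose | Target |
|---|---|---|
| Baseline | Measure behavior under normal load | Current production traffic |
| Stress | Find breaking points | 2x expected peak |
| Spike | Check response to a sudden surge | 10x for 30s |
| Soak | Detect memory leaks and resource drift | 24-48 hours sustained |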
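
One way to keep load-test targets consistent across runs is to derive them from measured traffic rather than hard-coding them. In this sketch, `LoadTest`, `load_test_plan`, and `spike_qps_at` are hypothetical names. It assumes the spike multiplier applies to current traffic and that the soak runs at current traffic, and the 30-minute baseline and stress durations are placeholder assumptions the table does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LoadTest:
    name: str
    target_qps: float
    duration_s: int


def load_test_plan(current_qps: float, expected_peak_qps: float) -> List[LoadTest]:
    """Build load-test targets from measured and forecast traffic (hypothetical helper)."""
    return [
        LoadTest("baseline", current_qps, 30 * 60),          # duration is an assumption
        LoadTest("stress", 2 * expected_peak_qps, 30 * 60),  # 2x expected; duration assumed
        LoadTest("spike", 10 * current_qps, 30),             # 10x for 30 s
        LoadTest("soak", current_qps, 24 * 3600),            # lower bound of 24-48 h
    ]


def spike_qps_at(t_s: float, base_qps: float, start_s: float,
                 multiplier: float = 10.0, length_s: float = 30.0) -> float:
    """Request rate at time t_s for a step-shaped spike starting at start_s."""
    if start_s <= t_s < start_s + length_s:
        return base_qps * multiplier
    return base_qps


for test in load_test_plan(current_qps=1_000, expected_peak_qps=1_500):
    print(f"{test.name:8} {test.target_qps:>8.0f} qps for {test.duration_s} s")
```

A load generator could sample `spike_qps_at` once per second to drive a step-shaped spike; real tools usually add ramp-up so the measurement isolates steady-state behavior from connection setup.
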

Design trade-offs

| Dimension | Trades off against |
|---|---|
| Consistency | Availability during network partitions (CAP) |
| Latency | Throughput |
| Cost | Resilience |
| Complexity | Maintainability |
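
The cost vs. resilience trade-off can be put in numbers. If each replica is independently available with probability p, all n replicas are down with probability (1 - p)^n, so availability rises sharply while cost grows roughly linearly with n. Independence is the weak assumption here: correlated failures (shared power, a bad config push) make real gains much smaller. The function name and the 99% figure are illustrative.

```python
def availability(replica_availability: float, replicas: int) -> float:
    """P(at least one replica is up), assuming replicas fail independently."""
    return 1 - (1 - replica_availability) ** replicas


# Hypothetical replica availability of 99%; cost assumed proportional to replica count.
for n in range(1, 4):
    print(f"{n} replica(s): availability {availability(0.99, n):.6f}, relative cost {n}x")
```

At 99% per replica, each additional replica adds about two nines under the independence assumption, which is why extra redundancy is usually reserved for services where that gain justifies the cost.
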

Utilization targets

| Resource | Target |
|---|---|
| CPU | <70% average, <90% peak |
| Memory | <80% average, <95% peak |
| Disk I/O | <70% of maximum queue depth |
| Network | <60% of link bandwidth |

These targets leave headroom for traffic spikes and incidents.
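
A monitoring check against utilization targets can be a simple table lookup. This sketch encodes the limits as fractions of capacity; `THRESHOLDS` and `headroom_violations` are hypothetical names, and where only a single limit is given (disk I/O, network) it is applied to the average, which is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# (average limit, peak limit) as fractions of capacity; None = no separate peak limit.
THRESHOLDS: Dict[str, Tuple[float, Optional[float]]] = {
    "cpu": (0.70, 0.90),
    "memory": (0.80, 0.95),
    "disk_io": (0.70, None),   # single limit, applied to the average (assumption)
    "network": (0.60, None),   # fraction of link bandwidth
}


def headroom_violations(samples: Dict[str, List[float]]) -> List[str]:
    """Return a message for every resource whose average or peak breaches its limit."""
    violations = []
    for resource, values in samples.items():
        avg_limit, peak_limit = THRESHOLDS[resource]
        avg, peak = sum(values) / len(values), max(values)
        if avg >= avg_limit:
            violations.append(f"{resource}: average {avg:.0%} >= {avg_limit:.0%}")
        if peak_limit is not None and peak >= peak_limit:
            violations.append(f"{resource}: peak {peak:.0%} >= {peak_limit:.0%}")
    return violations


# Hypothetical utilization samples (fractions of capacity) over one window.
print(headroom_violations({"cpu": [0.50, 0.60, 0.95], "memory": [0.55, 0.60]}))
```

Here the CPU average stays under 70%, but the 95% sample breaches the 90% peak limit, so only the CPU peak is flagged.
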

Scaling strategies

| Strategy | When to use |
|---|---|
| Vertical | Simple or single-instance workloads |
| Horizontal | Stateless or distributed services |
| Auto-scaling | Variable traffic patterns |
| Predictive | Known events, such as launches |
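
For auto-scaling, a common reactive rule is proportional target tracking: scale the replica count by the ratio of observed to target utilization. The Kubernetes Horizontal Pod Autoscaler documents this same formula, desired = ceil(current × currentMetric / targetMetric), with a tolerance band to avoid flapping. The sketch below reimplements that rule in plain Python rather than calling any Kubernetes API, and adds a hypothetical predictive floor for known events. Function names, the default bounds, and the 10% tolerance are assumptions.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int = 2, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Reactive target tracking: scale replicas by observed/target utilization."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target; skip scaling to avoid flapping
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


def predictive_floor(forecast_qps: float, qps_per_replica: float, target_util: float) -> int:
    """Replicas needed ahead of a known event so utilization stays at target."""
    return math.ceil(forecast_qps / (qps_per_replica * target_util))


# Hypothetical: 10 replicas at 75% CPU with a 50% target, launch expected at 50k QPS.
reactive = desired_replicas(10, 0.75, 0.5)
planned = max(reactive, predictive_floor(50_000, 1_000, 0.5))
print(reactive, planned)
```

Taking the maximum of the two lets the predictive plan pre-provision before a launch while the reactive rule still handles unplanned surges.
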

Plan for 2x expected peak demand.
Capacity planning is cheaper than outages.