# First-Principles Capacity Analysis
**Parent Topic**: [[Software/README]]
A generic scaffold for *any* right-sizing / capacity question (EC2, RDS, ASG, queues, thread pools). Derives the analysis up from a handful of irreducibles so the usual rules of thumb ("target 70%", "use p95", "prefer scale-out") become *consequences* you can re-derive, not a checklist to memorize. The burstable-fleet playbook is one instantiation — see [[EC2 Capacity Planning for Burstable Fleets]].
## The irreducibles (the physics under all capacity work)
- **Work has a rate.** Demand arrives at rate **λ**; the system serves at rate **μ**. Both are per-resource, per-unit-time.
- **The master variable is the ratio, not either number:** **utilization ρ = λ/μ**. Dimensionless. Everything reduces to controlling ρ.
- **Stability is a hard wall: ρ < 1.** If λ ≥ μ the queue grows without bound → saturation. Not gradual — a cliff.
- **The cliff is non-linear.** Waiting time ∝ 1/(1−ρ). Tolerable to ~0.7 (factor ≈ 3.3), doubled by ~0.85 (≈ 6.7), and explosive past 0.9 (10 at 0.9, 20 at 0.95). This is why you never target 100%.
- **Demand is a distribution, not a mean.** You fail in the tail, so provision against a percentile (p95/p99) and the burst *shape*, not the average.
Two correctives that ride on top:
- **Capacity = the first resource to saturate** (CPU, memory, network, disk IO, connections). Analyzing the wrong one yields a confident wrong answer.
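- **A metric is a proxy.** Valid only if it actually tracks μ-saturation. (This is exactly where burstable `CPUUtilization` lies: once credits run out it caps at the credit-depleted baseline, not at the full-burst μ, so a saturated instance can report a modest-looking number.)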
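The non-linear cliff in the irreducibles above can be tabulated directly. This is a minimal sketch, assuming the M/M/1 model the note cites, where mean time in system is the service time 1/μ multiplied by 1/(1−ρ). The function name `wait_multiplier` is illustrative, not from any library.

```python
def wait_multiplier(rho: float) -> float:
    """Return 1/(1 - rho): mean M/M/1 time in system, in units of the service time 1/mu."""
    if not 0.0 <= rho < 1.0:
        # At rho >= 1 the queue grows without bound, so no finite multiplier exists.
        raise ValueError("need 0 <= rho < 1; at rho >= 1 the queue is unstable")
    return 1.0 / (1.0 - rho)


# Tabulate the multiplier at a few operating points.
for rho in (0.50, 0.70, 0.85, 0.90, 0.95):
    print(f"rho={rho:.2f}  multiplier={wait_multiplier(rho):.1f}x")
```

The multiplier is 2 at ρ = 0.5, about 3.3 at 0.7, about 6.7 at 0.85, 10 at 0.9 and 20 at 0.95. Real systems are rarely exactly M/M/1, so treat the shape as the lesson, not the specific values.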
## The scaffold — six questions, in order
```
0. Unit of work           What is one "job," and what resource does it burn?
                          (Until this is fixed, λ and μ are undefined.)
1. Binding constraint     Which resource hits ρ→1 first? Capacity = min over resources.
2. Demand as a dist.      Mean, p95, p99, burst duration/frequency, trend slope.
3. Safe operating point   Where is the knee? Target ρ* < 1, lowered by variance + reaction lag.
4. Project the forecast   Forecast changes λ (and its shape), not μ. Does ρ' stay < ρ*?
5. Choose the lever       Each lever changes μ, μ/$, or commitment. Pick the most reversible that closes the gap.
```
Then a **validation pass**: is the proxy honest, and what's unmeasured?
## Worked derivation
**System:** stateless HTTP API, 6 fixed instances behind a load balancer. **Forecast:** a marketing event in 3 weeks, expected **2× mean traffic**. (All numbers illustrative.)
**Q0 — unit of work:** one HTTP request. Candidate binding resources: CPU, memory, connection-pool slots.
**Q1 — binding constraint** (measure each at p95 over a ≥2-week window):
| Resource | p95 now | Reading |
|----------|---------|---------|
| CPU | 55% | binding (closest to the knee) |
| memory | 40% | comfortable |
| conn-pool | 30% | comfortable |
→ **CPU is binding.** Analyze it; *watch* the other two under the forecast.
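The Q1 step amounts to a max over per-resource utilization. A minimal sketch with the illustrative numbers from the table; it assumes every resource's knee sits at a comparable utilization, which may not hold (a connection pool can fail hard at 100% while CPU degrades earlier). The name `binding_constraint` is hypothetical.

```python
def binding_constraint(p95_utilization: dict[str, float]) -> tuple[str, float]:
    """Return (resource, utilization) for the resource with the least headroom."""
    if not p95_utilization:
        raise ValueError("need at least one resource")
    name = max(p95_utilization, key=p95_utilization.get)
    return name, p95_utilization[name]


# Illustrative p95 readings from the worked example (fractions, not percent).
measured = {"cpu": 0.55, "memory": 0.40, "conn_pool": 0.30}
resource, rho = binding_constraint(measured)
print(f"binding: {resource} at rho={rho:.2f}")
```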
**Q2 — demand distribution:** mean 1000 rps · p95 1800 · p99 2200 · strong diurnal peak · event = ×2 on the mean. Provision against p95–p99, not the 1000 mean.
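To make "a distribution, not a mean" concrete, here is a sketch that summarizes request-rate samples with a nearest-rank percentile. The sample series is synthetic, built only so its summary resembles the illustrative figures above; real input would be per-interval request counts over the measurement window. `percentile` and `summarize` are hypothetical helper names.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Mean plus the two tail percentiles used for provisioning."""
    return {
        "mean": sum(samples) / len(samples),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Synthetic rps samples (an assumption, not measured data).
rps = [800] * 60 + [1200] * 30 + [1800] * 6 + [2200] * 4
print(summarize(rps))
```

For this synthetic series the mean is 1036 while p95 is 1800 and p99 is 2200: sizing to the mean would under-provision the tail by roughly a factor of two.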
**Q3 — safe operating point:** fixed-performance, moderate variance → target **ρ\* ≤ 0.70** (the 30% buys the non-linear tail *plus* boot-time reaction lag). Now 0.55 < 0.70 → healthy today.
**Q4 — project the forecast onto ρ:**
```
naive: ρ' = 0.55 × 2 = 1.10 → over 1.0 → saturation
caveat: CPU use often grows superlinearly with load near the knee → treat 1.10 as a floor, not a point estimate
```
Verdict: **must act** — pull ρ' back to ≤ 0.70.
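The naive projection can be written as a small function. This sketch assumes utilization scales linearly with demand and inversely with capacity, which per the caveat above is optimistic near the knee. The names `project_utilization` and `verdict` are illustrative.

```python
def project_utilization(rho_now: float, demand_multiplier: float,
                        capacity_multiplier: float = 1.0) -> float:
    """Linear projection: rho' = rho * (lambda'/lambda) / (mu'/mu)."""
    if demand_multiplier <= 0 or capacity_multiplier <= 0:
        raise ValueError("multipliers must be positive")
    return rho_now * demand_multiplier / capacity_multiplier


def verdict(rho_projected: float, rho_target: float) -> str:
    """Classify a projected utilization against the stability wall and the target."""
    if rho_projected >= 1.0:
        return "saturation"
    if rho_projected > rho_target:
        return "over target"
    return "ok"


rho_event = project_utilization(0.55, 2.0)  # 2x demand, capacity unchanged
print(f"rho'={rho_event:.2f} -> {verdict(rho_event, 0.70)}")
```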
**Q5 — choose the lever** (need μ up by factor `0.55×2 / 0.70 = 1.57`):
| Lever | Effect | Commitment / fit |
|-------|--------|------------------|
| **scale-out** | 6 → 10 instances | reversible; stateless + ASG ✓ ← **choose** |
| scale-up | 2× size, 6 nodes | committed; fewer fault domains |
| reserve | none (cost only) | WRONG for a temporary spike |
| change-family | n/a | CPU is honest here |
**Decision:** scale out to ~10 behind the ASG, autoscale on CPU at 60% (headroom for boot lag), scale back after the event. The event is temporary and its size uncertain → the *reversible* lever dominates (a real-options argument, not a preference).
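The instance count behind the decision follows from the required μ factor. A minimal sketch, assuming identical instances and capacity that scales linearly with instance count; `instances_needed` is a hypothetical helper.

```python
import math


def instances_needed(current_instances: int, rho_now: float,
                     demand_multiplier: float, rho_target: float) -> int:
    """Smallest fleet size that keeps the linearly projected utilization at or below target."""
    if not 0 < rho_target < 1:
        raise ValueError("rho_target must be in (0, 1)")
    required_factor = rho_now * demand_multiplier / rho_target
    # The small epsilon guards against float noise pushing an exact integer up by one.
    return math.ceil(current_instances * required_factor - 1e-9)


n = instances_needed(6, 0.55, 2.0, 0.70)
rho_after = 0.55 * 2.0 * 6 / n
print(f"instances={n}, projected rho={rho_after:.2f}")
```

With the illustrative inputs this gives 10 instances and a projected ρ of 0.66, which sits under the 0.70 target but above the 60% autoscaling threshold, so the autoscaler would likely still add capacity during the event.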
**Validation pass:**
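- Re-check non-binding resources at 2×: on the current 6 instances, memory `40%×2 = 80%` → **crosses 70%** and could become co-binding. If memory scales with per-instance load, the move to 10 instances brings the same linear estimate down to about `80%×6/10 = 48%`, but that is an assumption. Load-test before trusting the CPU-only conclusion.
- Proxy honesty: these are fixed-performance instances → `CPUUtilization` tracks μ truthfully. **If these were T3**, this step fails: μ drops to baseline once credits deplete and CPU caps there, so switch the binding metric to `CPUCreditBalance`.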
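The re-check of non-binding resources can be sketched as the same linear projection applied to every resource, before and after the scale-out. It assumes each resource scales with per-instance request rate, which is often untrue for memory (fixed per-process overhead), hence the load test. `recheck` is a hypothetical name.

```python
def recheck(p95_now: dict[str, float], demand_multiplier: float, rho_target: float,
            instances_now: int, instances_after: int) -> dict[str, tuple[float, bool]]:
    """Map each resource to (projected utilization, exceeds_target) under a linear model."""
    scale = demand_multiplier * instances_now / instances_after
    report = {}
    for name, rho in p95_now.items():
        projected = rho * scale
        report[name] = (projected, projected > rho_target)
    return report


measured = {"cpu": 0.55, "memory": 0.40, "conn_pool": 0.30}
for fleet in (6, 10):
    print(fleet, recheck(measured, 2.0, 0.70, 6, fleet))
```

On the unchanged fleet of 6 both CPU and memory are flagged; at 10 instances the linear model clears all three, which is exactly the claim a load test should confirm or refute.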
## What falls out (the payoff)
Every rule of thumb is now a consequence, not a memorized step:
- **"Target 70%"** = the knee of 1/(1−ρ) plus reaction lag.
- **"Use p95, not average"** = you fail in the tail of the demand distribution.
- **"Prefer scale-out under uncertainty"** = reversibility dominates when the forecast is uncertain.
- **"Find the binding constraint first"** = capacity is the min over resources.
- **"Burstable CPUUtilization masks saturation"** = the proxy stops tracking μ once μ becomes time-varying.
- **"Don't reserve for a spike"** = Reserved Instances (RIs) change μ/$, not μ; commitment is wrong for temporary demand.
## The burstable special case
[[EC2 Capacity Planning for Burstable Fleets]] is this scaffold where **μ is not constant**: it is high while credits last, then collapses to the baseline earn-rate. The same ρ < ρ\* logic holds — you just measure μ's *floor* (credit-depleted baseline), not its burst ceiling, and the honest saturation proxy becomes `CPUCreditBalance`, not `CPUUtilization`. Forecasting then asks whether the projected *average* stays below baseline AND whether any sustained spike outlasts the finite credit balance.
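The two burstable checks can be sketched as a credit-budget calculation. It relies on the EC2 definition that one CPU credit equals one vCPU at 100% for one minute; the balance, vCPU count, baseline and spike figures below are hypothetical, and the model ignores unlimited mode, launch credits and the maximum-balance cap.

```python
def hours_until_depleted(credit_balance: float, vcpus: int,
                         baseline_util: float, sustained_util: float) -> float:
    """Hours a sustained utilization can run before the credit balance reaches zero."""
    if sustained_util <= baseline_util:
        return float("inf")  # at or below baseline, credits are not drawn down
    earn_per_hour = vcpus * baseline_util * 60    # credits earned per hour
    spend_per_hour = vcpus * sustained_util * 60  # credits spent per hour
    return credit_balance / (spend_per_hour - earn_per_hour)


def spike_is_survivable(credit_balance: float, vcpus: int, baseline_util: float,
                        spike_util: float, spike_hours: float) -> bool:
    """True if the balance outlasts the spike under this simple model."""
    return hours_until_depleted(credit_balance, vcpus, baseline_util, spike_util) >= spike_hours


# Hypothetical instance: 2 vCPUs, 10% baseline, 288 credits banked, 50% sustained spike.
print(hours_until_depleted(288, 2, 0.10, 0.50))
print(spike_is_survivable(288, 2, 0.10, 0.50, spike_hours=8))
```

Under these assumed figures the net burn is 48 credits per hour, so the balance lasts about 6 hours and an 8-hour spike would outlast it.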
---
*Source: derived from queueing first principles (Little's Law `L = λW`; M/M/1 wait ∝ 1/(1−ρ)) and synthesized against [[Throughput Latency Tension]], [[Maintain Capacity Headroom]], [[SQL Scaling Has Four Levers]], and [[EC2 Capacity Planning for Burstable Fleets]].*