First-Principles Capacity Analysis - Nestor G Pestelos Jr (ngpestelos)

# First-Principles Capacity Analysis

**Parent Topic**: [[Software/README]]

A generic scaffold for *any* right-sizing or capacity question (EC2, RDS, ASG, queues, thread pools). It derives the analysis from a handful of irreducibles so the usual rules of thumb ("target 70%", "use p95", "prefer scale-out") become *consequences* you can re-derive, not a checklist to memorize. The burstable-fleet playbook is one instantiation; see [[EC2 Capacity Planning for Burstable Fleets]].

## The irreducibles (the physics under all capacity work)

- **Work has a rate.** Demand arrives at rate **λ**; the system serves at rate **μ**. Both are per-resource, per-unit-time.
- **The master variable is the ratio, not either number:** **utilization ρ = λ/μ**. Dimensionless. Everything reduces to controlling ρ.
- **Stability is a hard wall: ρ < 1.** If λ ≥ μ the queue grows without bound → saturation. Not gradual: a cliff.
- **The cliff is non-linear.** Waiting time ∝ 1/(1−ρ). Fine up to ~0.7, double the 0.7 value by ~0.85 (1/0.15 vs 1/0.30), and exploding past 0.9. This is why you never target 100%.
- **Demand is a distribution, not a mean.** You fail in the tail, so provision against a percentile (p95/p99) and the burst *shape*, not the average.

Two correctives that ride on top:

- **Capacity = the first resource to saturate** (CPU, memory, network, disk IO, connections). Analyzing the wrong one yields a confident wrong answer.
- **A metric is a proxy.** Valid only if it actually tracks μ-saturation. (This is exactly where burstable `CPUUtilization` lies: once credits are depleted it caps at baseline, not at μ.)

## The scaffold: six questions, in order

```
0. Unit of work          What is one "job," and what resource does it burn?
                         (Until this is fixed, λ and μ are undefined.)
1. Binding constraint    Which resource hits ρ→1 first? Capacity = min over resources.
2. Demand as a dist.     Mean, p95, p99, burst duration/frequency, trend slope.
3. Safe operating point  Where is the knee? Target ρ* < 1, lowered by variance + reaction lag.
4. Project the forecast  Forecast changes λ (and its shape), not μ. Does ρ' stay < ρ*?
5. Choose the lever      Each lever changes μ, μ/$, or commitment.
                         Pick the most reversible that closes the gap.
```

Then a **validation pass**: is the proxy honest, and what's unmeasured?

## Worked derivation

**System:** stateless HTTP API, 6 fixed instances behind a load balancer.
**Forecast:** a marketing event in 3 weeks, expected **2× mean traffic**. (All numbers illustrative.)

**Q0 (unit of work):** one HTTP request. Candidate binding resources: CPU, memory, connection-pool slots.

**Q1 (binding constraint):** measure each at p95 over a ≥2-week window.

| Resource | p95 now | Reading |
|----------|---------|---------|
| CPU | 55% | binding (closest to the knee) |
| memory | 40% | comfortable |
| conn-pool | 30% | comfortable |

→ **CPU is binding.** Analyze it; *watch* the other two under the forecast.

**Q2 (demand distribution):** mean 1000 rps · p95 1800 · p99 2200 · strong diurnal peak · event = ×2 on the mean. Provision against p95 to p99, not the 1000 mean.

**Q3 (safe operating point):** fixed-performance, moderate variance → target **ρ\* ≤ 0.70** (the 30% headroom covers the non-linear tail *plus* boot-time reaction lag). Now 0.55 < 0.70 → healthy today.

**Q4 (project the forecast onto ρ):** assume the event scales the whole demand distribution by 2, so the p95 CPU figure scales with it.

```
naive:   ρ' = 0.55 × 2 = 1.10  → over 1.0 → saturation
caveat:  CPU is often superlinear near the knee
         → treat 1.10 as a floor, not a point estimate
```

Verdict: **must act**. Pull ρ' back to ≤ 0.70.
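
The Q3/Q4 arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function names (`wait_multiplier`, `project`) are hypothetical, the 1/(1−ρ) curve is the M/M/1 approximation the note already leans on, and the projection assumes utilization scales linearly with demand (the same naive assumption as above, so the result is a floor).

```python
import math


def wait_multiplier(rho: float) -> float:
    """Relative queueing delay under the M/M/1 approximation, i.e. 1/(1 - rho)."""
    if not 0 <= rho < 1:
        # At or past the stability wall the queue grows without bound.
        return math.inf
    return 1.0 / (1.0 - rho)


def project(rho_now: float, demand_factor: float,
            rho_target: float, instances: int) -> dict:
    """Naively project utilization under a demand multiplier (assumes linear scaling)."""
    rho_projected = rho_now * demand_factor
    # Factor by which service capacity (mu) must grow to land on the target.
    capacity_factor = rho_projected / rho_target
    return {
        "rho_projected": rho_projected,
        "must_act": rho_projected > rho_target,
        "capacity_factor": capacity_factor,
        # Small epsilon guards against float fuzz pushing an exact integer up by one.
        "instances_needed": math.ceil(instances * capacity_factor - 1e-9),
    }


if __name__ == "__main__":
    for rho in (0.55, 0.70, 0.85, 0.90):
        print(f"rho={rho:.2f}  wait multiplier={wait_multiplier(rho):.2f}")
    print(project(rho_now=0.55, demand_factor=2.0, rho_target=0.70, instances=6))
```

With the illustrative numbers, the projection gives ρ' = 1.10, a required capacity factor of about 1.57, and 10 instances after rounding up from 6 × 1.57.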

**Q5 (choose the lever):** need μ up by a factor of `0.55×2 / 0.70 = 1.57`, i.e. `6 × 1.57 ≈ 9.4`, rounded up to 10 instances.

| Lever | Effect | Commitment / fit |
|-------|--------|------------------|
| **scale-out** | 6 → 10 instances | reversible; stateless + ASG ✓ ← **choose** |
| scale-up | 2× size, 6 nodes | committed; fewer fault domains |
| reserve | none (cost only) | WRONG for a temporary spike |
| change-family | n/a | CPU is an honest proxy here |

**Decision:** scale out to ~10 behind the ASG, autoscale on CPU at 60% (headroom for boot lag), scale back after the event. The event is temporary and its size uncertain → the *reversible* lever dominates (a real-options argument, not a preference).

**Validation pass:**

- Re-check non-binding resources at 2× on the current fleet: memory `40%×2 = 80%` → **crosses 70%**, could become co-binding; conn-pool `30%×2 = 60%` stays under. Scaling out to 10 lowers per-instance load to about 1.2×, but memory rarely scales as cleanly as CPU, so load-test before trusting the CPU-only conclusion.
- Proxy honesty: fixed instances → `CPUUtilization` tracks μ truthfully. **If these were T3**, this step fails: μ decays as credits deplete and CPU caps at baseline, so switch the binding metric to credit balance.

## What falls out (the payoff)

Every rule of thumb is now a consequence, not a memorized step:

- **"Target 70%"** = the knee of 1/(1−ρ) plus reaction lag.
- **"Use p95, not average"** = you fail in the tail of the demand distribution.
- **"Prefer scale-out under uncertainty"** = reversibility dominates when the forecast is uncertain.
- **"Find the binding constraint first"** = capacity is the min over resources.
- **"Burstable CPUUtilization masks saturation"** = the proxy stops tracking μ once μ becomes time-varying.
- **"Don't reserve for a spike"** = Reserved Instances (RIs) change $/μ, not μ; commitment is wrong for temporary demand.

## The burstable special case

[[EC2 Capacity Planning for Burstable Fleets]] is this scaffold where **μ is not constant**: it is high while credits last, then collapses to the baseline earn-rate. The same ρ < ρ\* logic holds. You just measure μ's *floor* (credit-depleted baseline), not its burst ceiling, and the honest saturation proxy becomes `CPUCreditBalance`, not `CPUUtilization`. Forecasting then asks two things: whether the projected *average* stays below baseline, AND whether any sustained spike outlasts the finite credit balance.

---

*Source: derived from queueing first principles (Little's Law `L = λW`; M/M/1 wait ∝ 1/(1−ρ)) and synthesized against [[Throughput Latency Tension]], [[Maintain Capacity Headroom]], [[SQL Scaling Has Four Levers]], and [[EC2 Capacity Planning for Burstable Fleets]].*
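
The two-part burstable forecast check described in the burstable special case can be sketched as follows. This is a rough illustration under stated assumptions: the function name and parameters are hypothetical, the instance runs in standard (not unlimited) credit mode, one CPU credit equals one vCPU at 100% for one minute, and credits are earned at exactly the baseline rate. Utilizations are fractions per vCPU, and all numbers in the usage example are illustrative.

```python
import math


def burstable_forecast_ok(avg_util: float, baseline_util: float,
                          spike_util: float, spike_minutes: float,
                          credit_balance: float, vcpus: int):
    """Return (average_ok, spike_ok, minutes_until_depleted).

    Assumes standard credit mode and that 1 credit = 1 vCPU-minute at 100%.
    """
    # Condition 1: the projected average must sit below the baseline,
    # otherwise credits drain on net and never recover.
    average_ok = avg_util < baseline_util

    # Condition 2: a sustained spike drains credits at the excess over baseline.
    drain_per_minute = vcpus * max(spike_util - baseline_util, 0.0)
    if drain_per_minute == 0:
        minutes_until_depleted = math.inf  # spike never exceeds baseline
    else:
        minutes_until_depleted = credit_balance / drain_per_minute
    spike_ok = spike_minutes <= minutes_until_depleted

    return average_ok, spike_ok, minutes_until_depleted


if __name__ == "__main__":
    # Illustrative: 2 vCPUs, 20% baseline, 288 credits banked, 2-hour spike at 50%.
    print(burstable_forecast_ok(avg_util=0.15, baseline_util=0.20,
                                spike_util=0.50, spike_minutes=120,
                                credit_balance=288, vcpus=2))
```

In this example the net drain is 0.6 credits per minute, so the balance lasts about 480 minutes and a 2-hour spike fits; a 10-hour spike at the same level would not.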