Capacity Planning from First Principles - Nestor G Pestelos Jr (ngpestelos)

# Capacity Planning from First Principles

**Parent Topic**: [[Software/README]]

A narrative explainer that builds capacity measurement up from irreducible truths. The goal: by the time you reach metrics, picking *which metric, which statistic, which threshold* feels inevitable rather than arbitrary. This is the **why**; the analytical scaffold for applying it is [[First-Principles Capacity Analysis]], and the operational **what-to-measure** is the [[Metric Evaluation Runbook]].

## The arc

```
1. First principles        the irreducible truths (no platform, no metrics yet)
2. What they force         why measurement is unavoidable, and why the right
                           metric/statistic isn't optional          ← the bridge
3. The evaluation process  operationalized in the Metric Evaluation Runbook
```

## 1. First principles

**1. Every resource is finite.** A CPU, a disk, a network link, a cache, a credit balance: each absorbs only so much work per unit time. That bound is the *ceiling*. Capacity planning is, at root, knowing where each ceiling sits and staying under it.

**2. Work is a rate, not a count.** What matters is not how many requests exist but how fast they *arrive* (λ) versus how fast a resource can *serve* them (μ). Both are rates. A resource that serves 1,000 req/s is fine at 900 req/s arriving and on fire at 1,100 req/s: same capacity, opposite outcomes.

**3. Utilization is the ratio ρ = λ/μ, and stability requires ρ < 1.** When arrivals outrun service, work queues. If ρ ≥ 1 for any sustained period, the queue grows without bound and the resource never catches up. This is arithmetic, not a tuning problem: nothing serves more than μ.

**4. The cost of utilization is non-linear.** The trap is reading 80% utilization as "80% of the way to trouble." Queueing delay scales roughly with 1/(1−ρ): going from 70% to 90% does not add 20% of pain, it roughly triples the waiting (1/0.3 ≈ 3.3 versus 1/0.1 = 10). There is a *knee* past which latency explodes for tiny load increases. You fail well before 100%.

**5. Therefore the usable ceiling sits below the theoretical max.** Because of the knee, the number you plan against is not "where it breaks" but "where it stops being acceptable": a *red line* set a deliberate margin below the failure point (the structural-engineering safety factor). The headroom is not waste; it absorbs the non-linearity and the reaction lag.

**6. One resource binds first.** A system's real ceiling is set by whichever component hits its red line first (the *binding constraint*). Everything else has slack, and fixing a non-binding resource buys nothing. The binding resource is usually *not* the obvious one: disk I/O wait, not CPU; credit balance, not CPU; the photos-to-user ratio, not raw user count.

**7. The ceiling is empirical, not theoretical.** You cannot read the red line off a spec sheet. Real workloads mix read/write, hot/cold, and burst/steady traffic in ways no datasheet predicts. The only honest ceiling comes from measuring the component under *real production load* and watching where the user-facing symptom degrades.

**8. Saturation is felt in a user-facing unit.** Users do not experience "90% CPU." They experience a slow page, a timeout, a failed upload. The true ceiling is defined in the unit they feel (latency, time-to-serve, error rate), and a system metric can read healthy while that unit is already breached.

**9. Capacity is a moving target.** Load grows, audiences globalize, and the binding constraint itself migrates (speed up the disk and the network becomes the limit). Capacity work is a *loop* (measure, find the ceiling, forecast, act, re-measure), not a one-time calculation.

## 2. What the principles force (the bridge to metrics)

Each measurement rule is a *consequence* of a principle above, not a convention to memorize:

| Because (principle) | You must (metric consequence) |
|---|---|
| One resource binds first (6) | **Find the binding metric**: measure the resource that saturates first, not CPU by default |
| The ceiling is empirical (7) | **Measure under real load**: derive the red line from production, not the datasheet |
| Saturation is user-facing (8) | **Define the ceiling in latency/time-to-serve**, and distrust a healthy-looking system counter |
| The cost is non-linear, demand is a distribution (4) | **Pick the statistic by binding type**: a percentile for peak-driven bindings (you fail in the tail), an average for accrual-driven ones |
| Capacity is a moving target (9) | **Re-confirm the binding metric after any change**, and evaluate on a rolling window |

That bridge is the whole argument: the metric choices are forced by the physics. Two people who accept §1 and apply the table above reach the same verdict.

## 3. The evaluation process

The operational layer lives in the [[Metric Evaluation Runbook]]. It covers which metric to use per resource class, which statistic, coarse→fine sampling, the data-retention constraint, and instance-churn analysis.

For the platform-specific instantiation on AWS burstables, see [[EC2 Burstable Capacity from First Principles]] and the playbook [[EC2 Capacity Planning for Burstable Fleets]].

---

*Source: derived from queueing first principles (Little's Law `L = λW`; M/M/1 wait ∝ 1/(1−ρ)) and synthesized against [[First-Principles Capacity Analysis]] and the [[Metric Evaluation Runbook]].*
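Principles 2 to 4 reduce to a few lines of arithmetic, and so do the Little's Law and M/M/1 results cited in the source line. The Python sketch below assumes an M/M/1 queue: Poisson arrivals, exponential service times and a single server. The function names are illustrative and not taken from any library. Real workloads deviate from the model, so read the output as the shape of the curve rather than a prediction.

```python
def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu: the fraction of service capacity in use."""
    return arrival_rate / service_rate


def relative_wait(rho: float) -> float:
    """M/M/1 mean time in system, in multiples of one service time: 1 / (1 - rho).

    For rho >= 1 there is no steady state: the queue grows without bound.
    """
    if rho >= 1:
        return float("inf")
    return 1.0 / (1.0 - rho)


def littles_law_in_system(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in the system."""
    return arrival_rate * time_in_system


if __name__ == "__main__":
    mu = 1_000.0  # service rate, req/s
    for lam in (900.0, 1_100.0):
        rho = utilization(lam, mu)
        print(f"lambda={lam:.0f} req/s  rho={rho:.2f}  stable={rho < 1}")

    # The knee: moving from 70% to 90% utilization roughly triples time in system.
    print(f"wait(90%) / wait(70%) = {relative_wait(0.9) / relative_wait(0.7):.1f}")

    # At 900 req/s: W = (1 / mu) / (1 - rho) seconds, then L = lambda * W.
    w = relative_wait(utilization(900.0, mu)) / mu
    print(f"W = {w * 1000:.1f} ms, L = {littles_law_in_system(900.0, w):.1f} requests")
```

At 900 req/s the sketch gives W = 10 ms and about 9 requests in flight. That agrees with the M/M/1 closed form L = ρ/(1−ρ) = 0.9/0.1.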
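Principles 5 and 6 combine into one rule. Compare each resource against its *own* red line, not against 100%, and the one with the least headroom binds first. The sketch below is minimal. `Resource`, `headroom` and `binding_constraint` are hypothetical names, and the readings and red lines are made-up placeholders. Under principle 7, real red lines have to come from production measurement. The numbers are chosen so that CPU has the highest raw utilization while disk I/O still binds first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    name: str
    utilization: float  # measured fraction of capacity in use (0..1)
    red_line: float     # acceptable ceiling, deliberately set below 1.0


def headroom(r: Resource) -> float:
    """Share of the red line still unused; negative once the line is crossed."""
    return 1.0 - r.utilization / r.red_line


def binding_constraint(resources: List[Resource]) -> Resource:
    """The resource with the least headroom against its own red line binds first."""
    return min(resources, key=headroom)


if __name__ == "__main__":
    # Hypothetical readings: CPU is busiest in raw terms, but disk I/O is
    # closest to its (lower) red line.
    fleet = [
        Resource("cpu", utilization=0.72, red_line=0.85),
        Resource("disk_io", utilization=0.62, red_line=0.65),
        Resource("network", utilization=0.40, red_line=0.80),
    ]
    b = binding_constraint(fleet)
    print(f"binding: {b.name}, headroom {headroom(b):.0%}")
```

Here disk I/O binds with about 5% headroom left. Adding CPU would buy nothing.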
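Principles 7 and 8, together with the table's "pick the statistic" row, suggest a procedure. Take latency samples observed at several load levels and call the ceiling the highest load whose user-facing statistic stays within the objective. The sketch below assumes this procedure and uses invented samples and a hypothetical 200 ms objective. `empirical_ceiling` is an illustrative name. `statistics.quantiles` and `statistics.mean` are standard-library functions (Python 3.8+). The sketch shows the peak-driven branch only. The table's other branch, an average for accrual-driven bindings such as a credit balance, would apply to a consumption series instead.

```python
import statistics
from typing import Callable, Dict, List, Optional

# A statistic maps a window of latency samples (ms) to one number.
Statistic = Callable[[List[float]], float]


def p99(samples: List[float]) -> float:
    """99th percentile via the standard library (needs at least 2 samples)."""
    return statistics.quantiles(samples, n=100)[98]


def empirical_ceiling(
    observed: Dict[float, List[float]], slo_ms: float, stat: Statistic
) -> Optional[float]:
    """Highest observed load whose latency statistic stays within the SLO.

    Walks load levels in ascending order and stops at the first breach, since
    past the knee every higher load is expected to breach too.
    Returns None if even the lowest observed load breaches.
    """
    ceiling = None
    for load in sorted(observed):
        if stat(observed[load]) > slo_ms:
            break
        ceiling = load
    return ceiling


# Hypothetical production samples: load (req/s) -> latency samples (ms).
OBSERVED = {
    400.0: [18.0] * 99 + [30.0],
    600.0: [22.0] * 99 + [45.0],
    800.0: [30.0] * 95 + [180.0] * 5,
    900.0: [60.0] * 90 + [900.0] * 10,
}

if __name__ == "__main__":
    slo = 200.0  # hypothetical latency objective, ms
    print("p99 ceiling: ", empirical_ceiling(OBSERVED, slo, p99))
    print("mean ceiling:", empirical_ceiling(OBSERVED, slo, statistics.mean))
```

With these made-up samples the mean at 900 req/s is 144 ms, comfortably "healthy", while p99 is already 900 ms. The p99 ceiling lands at 800 req/s and the mean-based one at 900 req/s. This is the "you fail in the tail" row of the table in miniature.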
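Principle 9's "forecast" step is, in the simplest case, a compound-growth question: how long until today's peak reaches the measured red line? The sketch below assumes constant month-over-month growth, which is a strong simplification. `months_until_red_line` is a hypothetical name, and the inputs are placeholders.

```python
import math


def months_until_red_line(
    current_peak: float, red_line_load: float, monthly_growth: float
) -> float:
    """Months until peak load reaches the red line under constant compound growth.

    Solves current_peak * (1 + g) ** t = red_line_load for t.
    Assumes growth stays constant, which real traffic rarely does.
    """
    if current_peak >= red_line_load:
        return 0.0  # already at or past the red line: act now
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking load never reaches it
    return math.log(red_line_load / current_peak) / math.log1p(monthly_growth)


if __name__ == "__main__":
    # Hypothetical inputs: 500 req/s peak today, an empirically measured red line
    # of 800 req/s, and 5% month-over-month growth.
    t = months_until_red_line(500.0, 800.0, 0.05)
    print(f"about {t:.1f} months of headroom")
```

With these inputs the answer is roughly 9.6 months. The binding constraint migrates, so the red line fed in here should be re-measured on each pass through the loop rather than reused.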