# Capacity Planning from First Principles
**Parent Topic**: [[Software/README]]
A narrative explainer that builds capacity measurement up from irreducible truths. The goal: by the time you reach metrics, picking *which metric, which statistic, which threshold* feels inevitable rather than arbitrary. This is the **why**; the analytical scaffold for applying it is [[First-Principles Capacity Analysis]], and the operational **what-to-measure** is the [[Metric Evaluation Runbook]].
## The arc
```
1. First principles         the irreducible truths (no platform, no metrics yet)
2. What they force          why measurement is unavoidable, why the right
                            metric/statistic isn't optional   ← the bridge
3. The evaluation process   operationalized in the Metric Evaluation Runbook
```
## 1. First principles
**1. Every resource is finite.** A CPU, a disk, a network link, a cache, a credit balance — each absorbs only so much work per unit time. That bound is the *ceiling*. Capacity planning is, at root, knowing where each ceiling sits and staying under it.
**2. Work is a rate, not a count.** What matters is not how many requests exist but how fast they *arrive* (λ) versus how fast a resource can *serve* them (μ). Both are rates. A resource that serves 1,000 req/s is fine at 900 arriving and on fire at 1,100 — same size, opposite outcomes.
**3. Utilization is the ratio ρ = λ/μ, and stability requires ρ < 1.** When arrivals outrun service, work queues. If ρ ≥ 1 for any sustained period the queue grows without bound — the resource never catches up. This is arithmetic, not a tuning problem. Nothing serves more than μ.
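
A minimal sketch of principles 2 and 3, assuming a deterministic "fluid" model in which exactly λ requests arrive and at most μ are served each second. The function names and the 60-second window are illustrative, not taken from any tool. Real arrivals are random, so some queueing appears even below ρ = 1; the sketch only shows the unbounded growth once ρ ≥ 1.

```python
def utilization(arrival_rate, service_rate):
    """rho = lambda / mu, both rates in requests per second."""
    return arrival_rate / service_rate


def backlog_over_time(arrival_rate, service_rate, seconds):
    """Backlog at the end of each second under a deterministic fluid model."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        # Work arrives at lambda; the resource drains at most mu per second.
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history


mu = 1000.0  # service rate from the example in principle 2
for lam in (900.0, 1100.0):
    final = backlog_over_time(lam, mu, seconds=60)[-1]
    print(f"lambda={lam:.0f}  rho={utilization(lam, mu):.2f}  backlog after 60s={final:.0f}")
```

At 900 req/s the backlog stays at zero; at 1,100 req/s it grows by 100 requests every second and never drains while the load persists.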
**4. The cost of utilization is non-linear.** The trap is reading 80% utilization as "80% of the way to trouble." Queueing delay scales roughly with 1/(1−ρ): going from 70% to 90% does not add 20% of pain; it roughly triples the waiting (1/0.3 ≈ 3.3 versus 1/0.1 = 10). There is a *knee* past which latency explodes for tiny load increases. You fail well before 100%.
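
The shape of the knee can be tabulated directly. This sketch assumes the textbook M/M/1 result that mean time in system is the service time divided by (1 − ρ); real systems deviate from M/M/1, so treat the numbers as the shape of the curve rather than a prediction. The function name is illustrative.

```python
def mm1_time_in_system(utilization, service_time=1.0):
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("M/M/1 is only stable for 0 <= rho < 1")
    return service_time / (1.0 - utilization)


# Express delay as a multiple of the bare service time (service_time=1.0).
for rho in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    factor = mm1_time_in_system(rho)
    print(f"rho={rho:.2f}  time in system = {factor:6.1f}x service time")
```

Each step toward 100% costs more than the last: the multiplier roughly doubles between 90% and 95%, then rises fivefold between 95% and 99%.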
**5. Therefore the usable ceiling sits below the theoretical max.** Because of the knee, the number you plan against is not "where it breaks" but "where it stops being acceptable" — a *red line* set a deliberate margin below the failure point (the structural-engineering safety factor). The headroom is not waste; it absorbs the non-linearity and reaction lag.
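
As a hedged illustration of where a red line might sit, the sketch below inverts the M/M/1 relation W = S/(1 − ρ) to find the highest utilization that still meets a latency target, then pulls that limit down by a margin. The 15% margin, the function names, and the example timings are all assumptions for illustration. In practice the limit comes from measurement (principle 7), and the formula here only stands in for that measurement.

```python
def utilization_limit(service_time, latency_target):
    """Highest rho at which mean M/M/1 time in system still meets the target.

    Derived from W = S / (1 - rho)  =>  rho = 1 - S / W.
    """
    if latency_target <= service_time:
        raise ValueError("latency target must exceed the bare service time")
    return 1.0 - service_time / latency_target


def red_line(service_time, latency_target, safety_margin=0.15):
    """Planning threshold: the limit reduced by a fractional safety margin."""
    return utilization_limit(service_time, latency_target) * (1.0 - safety_margin)


# Hypothetical resource: 10 ms service time, 50 ms acceptable mean latency.
limit = utilization_limit(0.010, 0.050)
plan = red_line(0.010, 0.050)
print(f"limit={limit:.2f}  red line={plan:.2f}")
```

The gap between the two numbers is the headroom that absorbs the non-linearity and the reaction lag.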
**6. One resource binds first.** A system's real ceiling is set by whichever component hits its red line first — the *binding constraint*. Everything else has slack, and fixing a non-binding resource buys nothing. The binding resource is usually *not* the obvious one: disk I/O wait, not CPU; credit balance, not CPU; the photos-to-user ratio, not raw user count.
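
One way to make "binds first" concrete is to compare how far each resource can grow before it reaches its own red line. The sketch assumes each reading scales linearly with load, which is a simplification, and every resource name and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    current: float   # current reading, in the resource's own unit (must be > 0)
    red_line: float  # planning threshold, same unit

    @property
    def headroom_factor(self):
        """How many times load can grow before this resource hits its red line,
        assuming the reading scales linearly with load."""
        return self.red_line / self.current


def binding_constraint(resources):
    """Return the resource with the least room to grow."""
    return min(resources, key=lambda r: r.headroom_factor)


# Hypothetical readings for one host.
fleet = [
    Resource("cpu_percent", current=35.0, red_line=70.0),
    Resource("disk_iowait_percent", current=12.0, red_line=15.0),
    Resource("network_mbps", current=300.0, red_line=800.0),
]
binding = binding_constraint(fleet)
print(f"{binding.name} binds first at {binding.headroom_factor:.2f}x current load")
```

CPU could absorb a doubling of load here, but disk I/O wait crosses its red line at 1.25x, so adding CPU would buy nothing.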
**7. The ceiling is empirical, not theoretical.** You cannot read the red line off a spec sheet. Real workloads mix read/write, hot/cold, burst/steady in ways no datasheet predicts. The only honest ceiling comes from measuring the component under *real production load* and watching where the user-facing symptom degrades.
**8. Saturation is felt in a user-facing unit.** Users do not experience "90% CPU." They experience a slow page, a timeout, a failed upload. The true ceiling is defined in the unit they feel — latency, time-to-serve, error rate — and a system metric can read healthy while that unit is already breached.
**9. Capacity is a moving target.** Load grows, audiences globalize, and the binding constraint itself migrates (speed up the disk and the network becomes the limit). Capacity work is a *loop* — measure, find the ceiling, forecast, act, re-measure — not a one-time calculation.
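
The "forecast" step of the loop can be sketched as a straight-line fit over a recent window of daily peaks, extrapolated to the red line. Linear growth is an assumption, the function names are illustrative, and the sample data is made up. Re-running the fit on each new window is what makes it a loop.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_red_line(daily_peaks, red_line):
    """Days from the last sample until the fitted trend crosses the red line.

    Returns None when the trend is flat or falling, and 0.0 when the trend
    has already crossed.
    """
    days = list(range(len(daily_peaks)))
    slope, intercept = linear_fit(days, daily_peaks)
    if slope <= 0:
        return None
    crossing_day = (red_line - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily peak utilization (%), planned against a 70% red line.
peaks = [50.0, 52.0, 54.0, 56.0, 58.0]
print(days_until_red_line(peaks, red_line=70.0))
```

With these samples the trend gains two points per day, which leaves about six days of runway after the last measurement.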
## 2. What the principles force (the bridge to metrics)
Each measurement rule is a *consequence* of a principle above, not a convention to memorize:
| Because (principle) | You must (metric consequence) |
|---|---|
| One resource binds first (6) | **Find the binding metric** — measure the resource that saturates first, not CPU by default |
| The ceiling is empirical (7) | **Measure under real load** — derive the red line from production, not the datasheet |
| Saturation is user-facing (8) | **Define the ceiling in latency/time-to-serve**, and distrust a healthy-looking system counter |
| The cost is non-linear (4), and demand is a distribution rather than a single number | **Pick the statistic by binding type** (percentile for peak-driven, average for accrual-driven); you fail in the tail |
| Capacity is a moving target (9) | **Re-confirm the binding metric after any change**, and evaluate on a rolling window |
That bridge is the whole argument: the metric choices are forced by the physics. Two people who accept §1 and apply the table above reach the same verdict.
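
The "pick the statistic by binding type" row can be sketched as below. The nearest-rank percentile, the choice of p99, and the `"peak"`/`"accrual"` labels are assumptions for illustration, and the samples are synthetic: mostly idle with a short burst.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples, binding_type):
    """Pick the summary statistic that matches how the resource binds."""
    if binding_type == "peak":
        # Peak-driven resources fail in the tail, so judge them by a high percentile.
        return percentile(samples, 99)
    if binding_type == "accrual":
        # Accrual-driven resources (e.g. a credit balance) respond to the average.
        return statistics.fmean(samples)
    raise ValueError(f"unknown binding type: {binding_type!r}")


# Synthetic CPU samples (%): 95 quiet readings and 5 burst readings.
cpu = [20.0] * 95 + [95.0] * 5
print(f"average={summarize(cpu, 'accrual'):.2f}  p99={summarize(cpu, 'peak'):.2f}")
```

The same data reads as about 24% on average and 95% at p99, so the verdict depends entirely on which statistic matches the way the resource binds.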
## 3. The evaluation process
The operational layer — which metric per resource class, which statistic, coarse→fine sampling, the data-retention constraint, and instance-churn analysis — lives in the [[Metric Evaluation Runbook]]. For the platform-specific instantiation on AWS burstables, see [[EC2 Burstable Capacity from First Principles]] and the playbook [[EC2 Capacity Planning for Burstable Fleets]].
---
*Source: derived from queueing first principles (Little's Law `L = λW`; M/M/1 wait ∝ 1/(1−ρ)) and synthesized against [[First-Principles Capacity Analysis]] and the [[Metric Evaluation Runbook]].*