# Capacity Planning Glossary
**Parent Topic**: [[Software/README]]
Reference definitions for the terms used across the Capacity Planning notes. Each links to the canonical note where one exists.
## Core model
- **Utilization (ρ)** — offered load ÷ capacity, `ρ = λ/μ`. The master variable; keep it below 1 for stability and below the *knee* (~0.7) for bounded latency. ([[First-Principles Capacity Analysis]])
- **Arrival rate (λ)** — the rate at which work arrives (requests/sec, queries/sec).
- **Service rate (μ)** — the rate at which the system can process work. For burstable instances μ is *time-varying* (high while credits last, then the baseline).
- **Binding constraint** — the resource that saturates first; total capacity equals the minimum across resources. Analyze it, not whichever metric is habitual. ([[Eliminate Healthy Resources to Find the Binding One]])
- **Ceiling / red line** — the critical level of a resource that cannot be crossed without failure; set below the true failure point with a safety margin. ([[Find Each Component's Red-Line Number]])
- **Headroom / safety factor** — the margin held back below the failure point to absorb spikes and reaction lag (e.g. an 85% CPU ceiling = 15% margin). ([[Apply a Safety Factor Above the Ceiling]], [[Maintain Capacity Headroom]])
- **Peak-driven resource** — an elastic resource (compute, cache, DB) sized by its *peak*, because excess load can be shed. ([[Peak-Driven Capacity Differs From Consumption-Driven]])
- **Consumption-driven resource** — a monotonic resource (storage) sized by its *run-out date*, because stored data cannot be shed.
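
As a rough illustration of the definitions above, the sketch below computes `ρ = λ/μ`, checks it against the stability limit and the knee, and picks the binding constraint as the minimum capacity across resources. The resource names and numbers are hypothetical, and the latency multiplier assumes an M/M/1 queue, which is an assumption of this sketch and not a claim from the notes.

```python
def utilization(arrival_rate, service_rate):
    """rho = lambda / mu, with both rates in the same unit (e.g. requests/sec)."""
    if service_rate <= 0:
        raise ValueError("service_rate must be positive")
    return arrival_rate / service_rate


def classify(rho, knee=0.7):
    """Label a utilization value against the stability limit and the knee."""
    if rho >= 1.0:
        return "unstable"      # work arrives faster than it can be served
    if rho >= knee:
        return "past knee"     # stable, but latency grows quickly
    return "ok"


def mm1_latency_multiplier(rho):
    """Mean response time as a multiple of service time (M/M/1 assumption only)."""
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1) for a stable queue")
    return 1.0 / (1.0 - rho)


def binding_constraint(capacity_by_resource):
    """Return (resource, capacity) for the resource that saturates first."""
    name = min(capacity_by_resource, key=capacity_by_resource.get)
    return name, capacity_by_resource[name]


# Hypothetical per-resource capacities, each expressed in requests/sec.
capacities = {"cpu": 900.0, "disk": 400.0, "network": 1200.0}
resource, mu = binding_constraint(capacities)
rho = utilization(300.0, mu)
print(resource, round(rho, 2), classify(rho))
```

Under the M/M/1 assumption the multiplier is 2 at `ρ = 0.5` and grows without bound as `ρ` approaches 1, which is one way to see why the knee matters more than the stability limit.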
## Statistics
- **Average** — correct for *accrual* resources (credit economy over a window); hides peaks, so wrong for peak-driven sizing.
- **Percentile (p95 / p99)** — the peak without the single-outlier noise; the right statistic for peak-driven resources.
- **Max** — a tripwire only ("did it ever touch the ceiling"), never the headline number — one outlier distorts it.
- **Sum** — used for credit-style metrics when the period exceeds their native publish interval.
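
To make the difference between these statistics concrete, here is a small sketch over a made-up series of 100 CPU samples containing one outlier. The percentile uses the nearest-rank method, which is one of several common definitions and is chosen here only for simplicity.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical utilization samples (%): mostly idle, a busy period, one spike.
samples = [30] * 90 + [60] * 9 + [100]

summary = {
    "average": statistics.fmean(samples),  # hides the busy period
    "p95": percentile(samples, 95),        # the peak without the outlier
    "p99": percentile(samples, 99),
    "max": max(samples),                   # tripwire only: a single sample sets it
}
print(summary)
```

In this series the average sits near the idle level, the percentiles report the busy period, and the max reflects only the single spike.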
## Burstable (T-family)
- **CPU credit** — currency for bursting above baseline; 1 credit ≈ one vCPU at 100% for one minute.
- **Baseline** — the sustainable utilization where credits neither accrue nor deplete; `= (credits-earned-per-hour ÷ vCPUs) ÷ 60`. ([[EC2 Burstable Baseline Utilization]])
- **Credit balance** — accrued unspent credits (capped at 24h of accrual); the honest saturation signal: a balance pinned near zero means the instance is chronically over baseline. ([[Burstable CPU Utilization Masks Saturation]])
- **Burn ratio** — credit usage ÷ per-period accrual; `>1` = spending faster than earning. ([[Credit Burn Ratio for Burstable Fleets]])
- **Standard vs unlimited mode** — *standard*: throttles to baseline when credits run out (no extra charge, surplus stays 0). *unlimited*: keeps bursting on borrowed surplus credits; surplus beyond what the instance can earn in 24h is billed (`CPUSurplusCreditsCharged`). The mode decides which signal is valid. ([[EC2 Burstable Instance Credit Model]])
- **Diagonal scaling** — vertically scaling your horizontally scaled nodes: replace many old boxes with fewer, denser ones. (Term coined by Allspaw; [[Diagonal Scaling Upgrades Horizontal Nodes]])
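
The credit arithmetic above can be sketched as follows. The example figures (12 credits earned per hour on 2 vCPUs) are what I understand a `t3.micro` to earn, but treat them as illustrative and check the current AWS table. The function names are hypothetical helpers, not an AWS API.

```python
def baseline_utilization(credits_per_hour, vcpus):
    """Per-vCPU baseline as a fraction: (credits-earned-per-hour / vCPUs) / 60."""
    return credits_per_hour / vcpus / 60


def max_credit_balance(credits_per_hour):
    """The balance is capped at 24 hours of accrual."""
    return credits_per_hour * 24


def burn_ratio(credits_used, credits_earned):
    """Credit usage divided by accrual over the same period; > 1 means net spending."""
    if credits_earned <= 0:
        raise ValueError("credits_earned must be positive")
    return credits_used / credits_earned


def hours_until_depleted(balance, used_per_hour, earned_per_hour):
    """Hours until the balance hits zero, or None if the balance is not shrinking."""
    net_burn = used_per_hour - earned_per_hour
    if net_burn <= 0:
        return None
    return balance / net_burn


# Illustrative inputs: 12 credits/hour on 2 vCPUs, spending 18 credits/hour.
print(baseline_utilization(12, 2))        # baseline as a fraction of one vCPU
print(max_credit_balance(12))             # cap on the accrued balance
print(burn_ratio(18, 12))                 # above 1: draining the balance
print(hours_until_depleted(144, 18, 12))  # time left at this burn rate
```

In standard mode the depletion time is when throttling to baseline begins. In unlimited mode the same burn shows up as surplus credits instead.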
## Scaling levers
- **Vertical scaling** — a bigger single box; simple, but cost rises steeply and the box remains a single point of failure.
- **Horizontal scaling** — more similar nodes; adds failure points and synchronization overhead.
- **Federation / sharding** — partitioning data across nodes so growth is bounded by how much hardware you can add, not by the size of one machine; lets you control a binding ratio. ([[Find the Application Metric That Predicts the Ceiling]])
- **Reserve** — committing to capacity (Reserved Instance / Savings Plan) for the steady baseline; changes $/μ, not μ, so it is the wrong lever for temporary spikes.
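
A minimal sketch of how node counts change under horizontal and diagonal scaling, assuming each node is run only up to a utilization ceiling. The peak load and per-node capacities are hypothetical numbers chosen for illustration.

```python
import math


def nodes_needed(peak_load, per_node_capacity, ceiling=0.7):
    """Nodes required to serve peak_load while keeping each node under the ceiling."""
    if per_node_capacity <= 0 or not 0 < ceiling <= 1:
        raise ValueError("capacity must be positive and ceiling in (0, 1]")
    usable = per_node_capacity * ceiling  # capacity left after holding back headroom
    return math.ceil(peak_load / usable)


peak = 12_000  # hypothetical peak, requests/sec

old_fleet = nodes_needed(peak, per_node_capacity=500)    # many small boxes
new_fleet = nodes_needed(peak, per_node_capacity=2_000)  # fewer, denser boxes
print(old_fleet, new_fleet)
```

The denser fleet serves the same peak with fewer machines to power, rack, and patch. Whether that is cheaper depends on per-node price, which this sketch does not model.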
## Measurement & forecasting
- **Observer effect** — measurement itself consumes resources and slightly distorts what it records. ([[Monitoring Itself Creates Load]])
- **Metric collection vs alerting** — collection records without acting (court reporter); alerting pages on urgent problems (smoke detector). Capacity work needs collection. ([[Metrics Collection Is Not Alerting]])
- **Sampling resolution / interval** — the granularity of stored data; choose it to illuminate the trend you forecast. ([[Match Metric Resolution to the Trend]])
- **Retention / down-aggregation** — old data is progressively rolled up to coarser resolution (CloudWatch: 1-min ~15d, 5-min ~63d, 1-hour ~455d); constrains how finely you can drill into old events. ([[RRD Trades Old Detail for Bounded Storage]])
- **Curve fitting / extrapolation** — fitting an equation to history to project forward; avoid polynomials above 2nd order; knowledge of context beats a high R². ([[Don't Over-Fit Your Capacity Forecast]])
- **Moving window** — re-fitting the forecast on a rolling window sized to procurement lead time. ([[Recalibrate Forecasts on a Moving Window]])
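
The curve-fitting and moving-window entries can be combined into a short sketch: fit a straight line to only the most recent points, then extrapolate to the ceiling. A linear fit is used here for simplicity; the data, the window length, and the function names are assumptions of this example.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def run_out_day(days, usage, ceiling, window):
    """Day on which the trend fitted to the last `window` points reaches the ceiling."""
    slope, intercept = linear_fit(days[-window:], usage[-window:])
    if slope <= 0:
        return None  # flat or shrinking usage never reaches the ceiling
    return (ceiling - intercept) / slope


# Hypothetical storage usage in GB, one sample per day, growing 5 GB/day.
days = list(range(10))
used_gb = [100 + 5 * d for d in days]
print(run_out_day(days, used_gb, ceiling=500, window=5))
```

Re-running the fit as new samples arrive, with the window sized to the procurement lead time, keeps the projection tied to recent behavior and not to stale history.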
## Per-resource signals
- **Disk I/O wait** — time the CPU waits on disk; predicts saturation better than disk *utilization* for I/O-bound workloads. ([[Disk IO Wait Predicts DB Lag Not Utilization]])
- **Working set** — the set of unique objects requested in a window (its size is the count of those objects); a cache pays off when the working set fits in it. ([[Cache Only What Changes Slowly]])
- **Hit ratio** — fraction of cache requests served from cache; the ceiling signal when the working set overflows.
- **LRU reference age** — age of the oldest object on the least-recently-used list; a cache-efficiency indicator. ([[Cache Ceilings Use Hit Ratio Not Just Request Rate]])
- **Time-to-serve / latency** — the user-facing ceiling; often breaches before any system metric flags. ([[Define Ceilings by User-Facing Time Not System Metrics]])
- **Goodput** — useful throughput (vs raw throughput) under overload. ([[Goodput as Capacity Truth-Teller]])
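
To illustrate how hit ratio reacts when the working set overflows the cache, the sketch below replays a request stream through a small LRU cache. The request pattern is a deliberately simple, hypothetical worst case (a cyclic scan), not a realistic trace.

```python
from collections import OrderedDict


def working_set_size(requests):
    """Count of unique objects requested in the window."""
    return len(set(requests))


def lru_hit_ratio(requests, capacity):
    """Replay requests through an LRU cache of `capacity` objects; return hits / requests."""
    if capacity < 1 or not requests:
        raise ValueError("need capacity >= 1 and at least one request")
    cache = OrderedDict()  # least recently used entry sits at the front
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests)


# Hypothetical window: three objects requested in a repeating cycle.
requests = ["a", "b", "c"] * 4

print(working_set_size(requests))
print(lru_hit_ratio(requests, capacity=3))  # working set fits
print(lru_hit_ratio(requests, capacity=2))  # working set overflows
```

With room for the whole working set, only the first request for each object misses. One slot short, this cyclic pattern evicts every object just before it is needed again, so the hit ratio collapses even though the request rate is unchanged.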
## Operations & economics
- **SLA / nines** — an availability commitment with credits/penalties; "five-nines" = 99.999% (~5 min/year downtime). ([[SLA Nines Translate to Downtime Budgets]])
- **Procurement pipeline / lead time** — the time to justify → order → install → test → deploy capacity; work backward from run-out. ([[Work Backward From Run-Out Using Procurement Time]])
- **Just-in-time (JIT)** — buy capacity only as needed; idle servers waste money and depreciate (Moore's Law). ([[Don't Buy Capacity Before You Need It]])
- **Synthetic monitoring** — external scripted requests measuring availability/latency; trustworthy only if they request pages the way real users do. ([[Interpret Synthetic Monitoring Before Trusting It]])
- **Second-order effect** — relieving one bottleneck relocates the traffic jam elsewhere (e.g. caching → faster clicks → web load). ([[Adding Capacity Moves the Bottleneck]])
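
Two of the entries above reduce to simple arithmetic: converting an availability target into a yearly downtime budget, and working backward from a run-out date by the procurement lead time. The dates and lead time below are hypothetical, and a 365-day year is assumed.

```python
from datetime import date, timedelta

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_minutes_per_year(availability_pct):
    """Allowed downtime per year for a given availability percentage."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def latest_order_date(run_out, lead_time_days):
    """Last day to start procurement so capacity is deployed before run-out."""
    return run_out - timedelta(days=lead_time_days)


for nines in (99.9, 99.99, 99.999):
    print(nines, round(downtime_minutes_per_year(nines), 2))

# Hypothetical forecast: storage runs out on 2025-06-30, procurement takes 45 days.
print(latest_order_date(date(2025, 6, 30), 45))
```

The lead time should cover the whole pipeline (justify, order, install, test, deploy), not only delivery.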
---
*Source: synthesized from [[The Art of Capacity Planning]] (John Allspaw, O'Reilly 2008) and the Capacity Planning notes; burstable + CloudWatch specifics per AWS documentation.*