Capacity Planning Glossary - Nestor G Pestelos Jr (ngpestelos)

# Capacity Planning Glossary

**Parent Topic**: [[Software/README]]

Reference definitions for the terms used across the Capacity Planning notes. Each links to the canonical note where one exists.

## Core model

- **Utilization (ρ)** — offered load ÷ capacity, `ρ = λ/μ`. The master variable; keep it below 1 for stability and below the *knee* (~0.7) for bounded latency. ([[First-Principles Capacity Analysis]])
- **Arrival rate (λ)** — the rate at which work arrives (requests/sec, queries/sec).
- **Service rate (μ)** — the rate at which the system can process work. For burstable instances μ is *time-varying* (high while credits last, then the baseline).
- **Binding constraint** — the resource that saturates first; total capacity equals the minimum across resources. Analyze it, not whichever metric is habitual. ([[Eliminate Healthy Resources to Find the Binding One]])
- **Ceiling / red line** — the critical level of a resource that cannot be crossed without failure; set below the true failure point with a safety margin. ([[Find Each Component's Red-Line Number]])
- **Headroom / safety factor** — the margin held back below the failure point to absorb spikes and reaction lag (e.g. an 85% CPU ceiling = 15% margin). ([[Apply a Safety Factor Above the Ceiling]], [[Maintain Capacity Headroom]])
- **Peak-driven resource** — elastic resources (compute, cache, DB) sized by their *peak*; load can be shed. ([[Peak-Driven Capacity Differs From Consumption-Driven]])
- **Consumption-driven resource** — monotonic resources (storage) sized by *run-out date*; can't shed what's stored.

## Statistics

- **Average** — correct for *accrual* resources (credit economy over a window); hides peaks, so wrong for peak-driven sizing.
- **Percentile (p95 / p99)** — the peak without the single-outlier noise; the right statistic for peak-driven resources.
- **Max** — a tripwire only ("did it ever touch the ceiling"), never the headline number — one outlier distorts it.
- **Sum** — used for credit-style metrics when the period exceeds their native publish interval.

## Burstable (T-family)

- **CPU credit** — currency for bursting above baseline; 1 credit ≈ one vCPU at 100% for one minute.
- **Baseline** — the sustainable utilization where credits neither accrue nor deplete; `= (credits-earned-per-hour ÷ vCPUs) ÷ 60`. ([[EC2 Burstable Baseline Utilization]])
- **Credit balance** — accrued unspent credits (capped at 24h of accrual); the honest saturation signal — pinned-low = chronically over baseline. ([[Burstable CPU Utilization Masks Saturation]])
- **Burn ratio** — credit usage ÷ per-period accrual; `>1` = spending faster than earning. ([[Credit Burn Ratio for Burstable Fleets]])
- **Standard vs unlimited mode** — *standard*: throttles to baseline when credits run out (no extra charge, surplus stays 0). *unlimited*: keeps bursting on borrowed surplus, billed beyond the 24h max (`CPUSurplusCreditsCharged`). Mode decides which signal is valid. ([[EC2 Burstable Instance Credit Model]])
- **Diagonal scaling** — vertically scaling your horizontally-scaled nodes: replace many old boxes with fewer denser ones. (Allspaw's coined term — [[Diagonal Scaling Upgrades Horizontal Nodes]])

## Scaling levers

- **Vertical scaling** — bigger single box; simple but cost rises steeply, single point of failure.
- **Horizontal scaling** — more similar nodes; more failure points + sync overhead.
- **Federation / sharding** — partitioning data across nodes so growth is bounded only by hardware, not one machine; lets you control a binding ratio. ([[Find the Application Metric That Predicts the Ceiling]])
- **Reserve** — committing to capacity (RI / Savings Plan) for steady baseline; changes $/μ, not μ — wrong for temporary spikes.

## Measurement & forecasting

- **Observer effect** — measurement itself consumes resources and slightly distorts what it records. ([[Monitoring Itself Creates Load]])
- **Metric collection vs alerting** — collection records without acting (court reporter); alerting pages on urgent problems (smoke detector). Capacity work needs collection. ([[Metrics Collection Is Not Alerting]])
- **Sampling resolution / interval** — the granularity of stored data; choose it to illuminate the trend you forecast. ([[Match Metric Resolution to the Trend]])
- **Retention / down-aggregation** — old data is progressively rolled up to coarser resolution (CloudWatch: 1-min ~15d, 5-min ~63d, 1-hour ~455d); constrains how finely you can drill into old events. ([[RRD Trades Old Detail for Bounded Storage]])
- **Curve fitting / extrapolation** — fitting an equation to history to project forward; avoid >2nd-order polynomials; context beats R². ([[Don't Over-Fit Your Capacity Forecast]])
- **Moving window** — re-fitting the forecast on a rolling window sized to procurement lead time. ([[Recalibrate Forecasts on a Moving Window]])

## Per-resource signals

- **Disk I/O wait** — time the CPU waits on disk; predicts saturation better than disk *utilization* for I/O-bound workloads. ([[Disk IO Wait Predicts DB Lag Not Utilization]])
- **Working set** — the count of unique objects requested in a window; cache pays off when it fits. ([[Cache Only What Changes Slowly]])
- **Hit ratio** — fraction of cache requests served from cache; the ceiling signal when the working set overflows.
- **LRU reference age** — age of the oldest object on the least-recently-used list; a cache-efficiency indicator. ([[Cache Ceilings Use Hit Ratio Not Just Request Rate]])
- **Time-to-serve / latency** — the user-facing ceiling; often breaches before any system metric flags. ([[Define Ceilings by User-Facing Time Not System Metrics]])
- **Goodput** — useful throughput (vs raw throughput) under overload. ([[Goodput as Capacity Truth-Teller]])

## Operations & economics

- **SLA / nines** — an availability commitment with credits/penalties; "five-nines" = 99.999% (~5 min/year downtime). ([[SLA Nines Translate to Downtime Budgets]])
- **Procurement pipeline / lead time** — the time to justify → order → install → test → deploy capacity; work backward from run-out. ([[Work Backward From Run-Out Using Procurement Time]])
- **Just-in-time (JIT)** — buy capacity only as needed; idle servers waste money and depreciate (Moore's Law). ([[Don't Buy Capacity Before You Need It]])
- **Synthetic monitoring** — external scripted requests measuring availability/latency; trustworthy only if they request pages the way real users do. ([[Interpret Synthetic Monitoring Before Trusting It]])
- **Second-order effect** — relieving one bottleneck relocates the traffic jam elsewhere (e.g. caching → faster clicks → web load). ([[Adding Capacity Moves the Bottleneck]])

---

*Source: synthesized from [[The Art of Capacity Planning]] (John Allspaw, O'Reilly 2008) and the Capacity Planning notes; burstable + CloudWatch specifics per AWS documentation.*
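The Utilization (ρ) entry can be made concrete with a small check. This is a minimal sketch, not a definitive method: the function names, the three status labels, and the sample rates are illustrative assumptions, and the 0.7 knee is the approximate value quoted in the entry rather than a measured one.

```python
KNEE = 0.7  # approximate knee from the Utilization entry; tune per system


def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu, with both rates in the same unit (e.g. requests/sec)."""
    return arrival_rate / service_rate


def classify(rho: float, knee: float = KNEE) -> str:
    """Map utilization onto the two thresholds named in the glossary."""
    if rho >= 1:
        return "unstable"        # arrivals meet or exceed capacity
    if rho >= knee:
        return "past the knee"   # stable, but latency is no longer bounded
    return "ok"


# Hypothetical numbers: 450 requests/sec offered to a system that serves 600/sec.
rho = utilization(arrival_rate=450, service_rate=600)
status = classify(rho)
print(f"rho = {rho:.2f} ({status})")
```

With these assumed inputs ρ is 0.75, which is stable but already above the knee.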
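The Baseline and Burn ratio formulas from the Burstable section can be sketched as follows. The function names and the sample figures (2 vCPUs, 24 credits earned per hour, 36 credits spent in that hour) are hypothetical inputs chosen for illustration, not values taken from the notes.

```python
def baseline_utilization(credits_per_hour: float, vcpus: int) -> float:
    """Sustainable per-vCPU utilization as a fraction between 0 and 1."""
    # One credit is one vCPU at 100% for one minute, so credits earned per
    # vCPU per hour, divided by 60 minutes, is the break-even fraction.
    return (credits_per_hour / vcpus) / 60


def burn_ratio(credits_used: float, credits_earned: float) -> float:
    """Credit usage divided by accrual over the same period."""
    return credits_used / credits_earned


# Hypothetical instance and one-hour window.
base = baseline_utilization(credits_per_hour=24, vcpus=2)
ratio = burn_ratio(credits_used=36, credits_earned=24)
print(f"baseline = {base:.0%}, burn ratio = {ratio:.2f}")
```

A ratio above 1, as in this made-up window, means credits are being spent faster than they accrue, so the balance is draining.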
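The arithmetic behind the SLA / nines entry is short enough to show directly. This sketch assumes a 365-day year and treats the availability target as a simple fraction of total minutes; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes per year a service may be down while still meeting the target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_budget_minutes(pct):.1f} min/year")
```

For five nines this works out to roughly 5.3 minutes per year, in line with the "~5 min/year" figure in the entry.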
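For a consumption-driven resource, "work backward from run-out" can be sketched as a date calculation. The sketch assumes strictly linear growth, which is a simplification (the Curve fitting entry covers fitting real history), and every name and number below is hypothetical.

```python
from datetime import date, timedelta


def run_out_date(today: date, capacity_gb: float, used_gb: float,
                 growth_gb_per_day: float) -> date:
    """Date the resource fills up, assuming constant daily growth."""
    days_left = (capacity_gb - used_gb) / growth_gb_per_day
    return today + timedelta(days=int(days_left))  # round down to whole days


def order_by_date(run_out: date, lead_time_days: int) -> date:
    """Latest date to start procurement so capacity lands before run-out."""
    return run_out - timedelta(days=lead_time_days)


# Hypothetical storage pool: 10 TB total, 7 TB used, growing 25 GB/day,
# with a 45-day procurement pipeline.
today = date(2024, 1, 1)
run_out = run_out_date(today, capacity_gb=10_000, used_gb=7_000,
                       growth_gb_per_day=25)
deadline = order_by_date(run_out, lead_time_days=45)
print(f"run-out: {run_out}, order by: {deadline}")
```

With these inputs the pool has 120 days left, so the order has to start 45 days before that point.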