**Author**: John Allspaw
**Publisher**: O'Reilly Media, First Edition — September 2008 (© Yahoo! Inc.)
**Pages**: 154 (PDF); ~131 content pages, including 3 appendices
**ISBN-13**: 978-0-596-51857-8
%%
**PDF**: `~/Library/CloudStorage/[email protected]/My Drive/Books/The Art of Capacity Planning.pdf`
**Page offset**: book page 1 = PDF page 17 (offset **+16**)
**Topic folder**: `2 Resources/Topics/Software/Engineering/Capacity Planning/`
**Status**: ✅ COMPLETE — 5 chapters + 3 appendices / 68 notes
**Added**: 2026-06-18
**Completed**: 2026-06-18
**Last activity**: 2026-06-18
%%
## Progressive Summary
*The Art of Capacity Planning* (John Allspaw, O'Reilly 2008) argues capacity planning is an **empirical, iterative discipline, not a theoretical one**: measure your own system's real usage, find each component's ceiling, forecast from history, deploy ahead of growth — then repeat. Written at Flickr/Yahoo! (it opens with the July 2005 London-bombing traffic spike that nearly took Flickr down), it predates and seeds the DevOps/SRE literature, and is the direct predecessor to Allspaw's *Web Operations* (2010).
Five chapters trace one loop. **Set goals** (requirements, SLAs, the performance-vs-capacity distinction) → **measure** (the ceiling-finding method: tie a primary-function metric to a hardware resource, find the red line via real production load) → **predict** (curve-fit history to the ceiling, separating peak-driven from consumption-driven resources, with a safety margin) → **deploy** (automation to shrink the one provisioning phase you control). Two of the three appendices extend the loop: App A to virtualization/cloud (same process at finer granularity, with cost as a new variable) and App B to instantaneous-growth firefighting (pre-built load-shedding switches, baking pages, serving stale). App C is a 2008 tool catalog.
The single most consequential move is the **ceilings + history → forecast** loop grounded in measuring the *right* metric — and the repeated discovery that the right metric is non-obvious: disk **I/O wait** not utilization; the **photos-to-user ratio** not raw counts; **credit balance** not CPU for burstables. Read in 90 seconds six months from now: measure empirically, find each component's red line in the unit users actually feel, forecast peaks and consumption to that line with a safety margin, automate deployment, and re-run the loop forever — capacity is a process, not an event.
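The ceilings + history → forecast step can be made concrete with a few lines of arithmetic. The sketch below is an illustration, not the book's code: it assumes a plain linear trend, one sample per day, and a safety margin applied by derating the ceiling. The function names, the 10% margin, and the sample data are all hypothetical.

```python
def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept). Assumes evenly spaced samples.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_ceiling(history, ceiling, safety_margin=0.15):
    """Days after the last sample until the trend reaches the derated ceiling.

    Returns None when usage is flat or shrinking (no run-out on this trend).
    """
    slope, intercept = linear_fit(history)
    if slope <= 0:
        return None
    # Assumption: the safety margin lowers the planning limit below the
    # measured ceiling, so the forecast runs out early rather than late.
    planning_limit = ceiling * (1 - safety_margin)
    crossing_x = (planning_limit - intercept) / slope
    return max(0.0, crossing_x - (len(history) - 1))


# Hypothetical data: disk usage in GB, one sample per day.
usage_gb = [500, 510, 520, 530, 540]
remaining = days_until_ceiling(usage_gb, ceiling=1000, safety_margin=0.10)
print(remaining)  # 36.0
```

With these numbers the trend grows 10 GB/day and the derated limit is 900 GB, so the run-out lands 36 days after the last sample. A straight line is only the simplest choice of curve; the point is that both inputs, a measured ceiling and real history, are required before any fit means anything.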
## Top Takeaways for Rereading
**Process & model**
- [[The Four-Step Capacity Planning Cycle]] — measure → predict → deploy → iterate
- [[Forecasting Requires Ceilings Plus History]] — the two essential inputs
- [[The Ceiling-Finding Method]] — the repeatable per-component procedure
**Measurement**
- [[Eliminate Healthy Resources to Find the Binding One]] — rule out the comfortable; find the constraint
- [[Disk IO Wait Predicts DB Lag Not Utilization]] — the obvious gauge can lie
- [[Find the Application Metric That Predicts the Ceiling]] — the predictor is often a derived ratio
**Forecasting**
- [[Forecast Peak-Driven Resources by Trending Peaks]] — trend the peaks, not the average
- [[Apply a Safety Factor Above the Ceiling]] — borrow the structural-engineering margin
- [[Don't Over-Fit Your Capacity Forecast]] — context beats R²
**Economics & operations**
- [[Don't Buy Capacity Before You Need It]] — JIT + Moore's Law
- [[Work Backward From Run-Out Using Procurement Time]] — schedule from the run-out date
- [[Pre-Build Load-Shedding Switches]] — the cheapest emergency capacity is spend you can stop
**Cross-cutting**
- [[Peak-Driven Capacity Differs From Consumption-Driven]] — peak envelope vs depletion timeline
- [[The Defining Metric Itself Can Change]] — re-confirm the binding metric after any change
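Two of the takeaways above, trending peaks rather than averages and scheduling backward from the run-out date, reduce to simple calculations. The following is a minimal sketch under assumed inputs: the helper names, the sample request rates, and the lead times are hypothetical, chosen only to show the shape of the calculation.

```python
from datetime import date, timedelta


def daily_peaks(samples_by_day):
    """Reduce each day's samples to its maximum: the peak, not the average."""
    return [max(day) for day in samples_by_day]


def order_by_date(run_out, procurement_days, deploy_days=0):
    """Latest date to place an order so new capacity is live by run_out."""
    return run_out - timedelta(days=procurement_days + deploy_days)


# Hypothetical requests-per-second samples for three days.
samples = [[120, 310, 180], [130, 340, 200], [125, 365, 190]]
peaks = daily_peaks(samples)
averages = [sum(day) / len(day) for day in samples]
print(peaks)     # [310, 340, 365]
print(averages)  # the averages sit far below the peaks users actually hit

# Hypothetical forecast run-out date and lead times (in days).
deadline = order_by_date(date(2026, 9, 1), procurement_days=42, deploy_days=7)
print(deadline)  # 2026-07-14
```

The peak series is what gets trended toward the ceiling for a peak-driven resource; the order-by date is the forecast run-out minus every delay between purchase and production.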
## About
Allspaw's foundational text on **web capacity planning** — written at Flickr/Yahoo! and opening with the July 2005 London-bombing traffic spike that nearly took Flickr down. Its core argument: capacity planning is **empirical, not theoretical** — measure your own system's real usage, find each resource's ceiling, predict trends from history, and buy/deploy ahead of the curve. "Back-of-the-envelope" math and honest measurement beat elaborate simulation.
This is the **direct predecessor** to Allspaw's later [[Book Inventory/Web Operations|Web Operations]] (2010) and a primary source the SRE-era literature draws on. Five chapters + three appendices:
- **Ch 1** — Goals, Issues, and Processes in Capacity Planning
- **Ch 2** — Setting Goals for Capacity
- **Ch 3** — Measurement: Units of Capacity (the longest, most technical chapter)
- **Ch 4** — Predicting Trends
- **Ch 5** — Deployment
- **App A** — Virtualization and Cloud Computing
- **App B** — Dealing with Instantaneous Growth
- **App C** — Capacity Tools
## Cross-Corpus Note (record)
This book originated much of a corpus the vault already held in derived form ([[Book Inventory/Web Operations|Web Operations]], [[Book Inventory/Site Reliability Engineering|Site Reliability Engineering]], Release It). Synthesis applied the **primary-beats-derivative** rule: it captured Allspaw's canonical version wherever this 2008 book is the origin, and cross-linked rather than duplicated wherever Web Ops/SRE/Release It already covered the material (notably Ch 5 Deployment, ~70% pre-covered, and App A/B). App C (2008 tool catalog) yielded no notes by the dated-survey rule.
## Synthesis Progress
| Chapter | Title | Pages (book) | Status | Atomic Notes |
| ------- | ----- | ------------ | ------ | ------------ |
| Ch 1 | Goals, Issues, and Processes in Capacity Planning | 1–10 | Complete (9 notes) | [[The Four-Step Capacity Planning Cycle]], [[Find Each Component's Red-Line Number]], [[Tie System Stats to Business Metrics]], [[Procurement Is a Capacity-Planning Step]], [[Accept Current Performance as the Planning Baseline]], [[Capacity Planning Is Empirical Not Theoretical]], [[Architecture Affects Capacity More Than Tuning]], [[User-Generated Content Makes Growth Unpredictable]], [[Use Quick-and-Dirty Capacity Math]] |
| Ch 2 | Setting Goals for Capacity | 11–22 | Complete (11 notes) | [[Define Requirements Before Planning Capacity]], [[Interpret Synthetic Monitoring Before Trusting It]], [[SLA Nines Translate to Downtime Budgets]], [[Downtime Does Not Equal Linearly Lost Revenue]], [[API Capacity Is a Business Contract]], [[Slow Pages Are Not Always a Capacity Problem]], [[Define Ceilings by User-Facing Time Not System Metrics]], [[Split Architecture into Measurable Components]], [[Match Hardware Profile to Each Role's Bound Resource]], [[Diagonal Scaling Upgrades Horizontal Nodes]], [[Disaster Recovery Multiplies Capacity Cost]] |
| Ch 3 | Measurement: Units of Capacity | 23–62 | Complete (19 notes) | [[Choose Measurement Tools by Capability Not Brand]], [[RRD Trades Old Detail for Bounded Storage]], [[Treat Logs as Past Metrics]], [[Networks Are a Finite Capacity Too]], [[Least-Connections Balancing Breaks for Databases]], [[Load Balancers Are Capacity Instruments]], [[The Ceiling-Finding Method]], [[Eliminate Healthy Resources to Find the Binding One]], [[Test Ceilings with Real Production Load]], [[Single-Machine Load Testing Has Limits]], [[Storage Capacity Has Two Dimensions]], [[Peak-Driven Capacity Differs From Consumption-Driven]], [[Match Metric Resolution to the Trend]], [[Disk IO Wait Predicts DB Lag Not Utilization]], [[Database Ceilings Have Hidden Cliffs]], [[Cache Only What Changes Slowly]], [[Cache Ceilings Use Hit Ratio Not Just Request Rate]], [[Isolate Resource Use on Multi-Use Servers]], [[Measure API Usage Per Key]] |
| Ch 4 | Predicting Trends | 63–92 | Complete (16 notes) | [[Forecasting Requires Ceilings Plus History]], [[Don't Buy Capacity Before You Need It]], [[Don't Over-Fit Your Capacity Forecast]], [[Forecast Run-Out by Trending to the Ceiling]], [[Find the Application Metric That Predicts the Ceiling]], [[Forecast Peak-Driven Resources by Trending Peaks]], [[Small Data Sets Make Forecasts Fragile]], [[Automate Curve-Fitting into a Recurring Job]], [[Apply a Safety Factor Above the Ceiling]], [[Work Backward From Run-Out Using Procurement Time]], [[Adding Capacity Moves the Bottleneck]], [[Traffic Patterns Widen as Audience Globalizes]], [[Size Each Data Center for Its Partner's Full Load]], [[Capacity Planning Must Talk to Product]], [[Recalibrate Forecasts on a Moving Window]], [[The Defining Metric Itself Can Change]] |
| Ch 5 | Deployment | 93–104 | Complete (5 notes; ~70% dedup vs Web Ops/Release It deployment corpus) | [[Automation Shrinks the Provisioning Time You Control]], [[Homogenize Hardware Types]], [[Compose Roles from Reusable Services]], [[Drive Deployment from Inventory as Source of Truth]], [[Bring Up a Data Center from Bare Metal via Automation]] |
| App A | Virtualization and Cloud Computing | 105–120 | Complete (4 notes) | [[Treat Cloud Capacity as the Same Process]], [[Forecast Even When Deployment Is Instant]], [[Cloud Cost Is a Capacity Variable]], [[Drive Cloud Autoscaling with a Capacity Feedback Loop]] |
| App B | Dealing with Instantaneous Growth | 121–126 | Complete (4 notes) | [[Adding Servers Can't Fix Architectural Limits]], [[Pre-Build Load-Shedding Switches]], [[Bake Pages or Serve Stale Under Load]], [[Host Your Status Page Outside Your Data Center]] |
| App C | Capacity Tools | 127–130 | Complete (0 notes — 2008 tool catalog; durable tool concepts already captured in Ch 3/Ch 5 per the dated-survey dedup rule) | — |
**Final tally:** 68 synthesis notes (Ch 1–5: 60, App A–B: 8; App C none by design). Lint: 9/9 passing.