Knowing when each piece of infrastructure will fail is mandatory, not optional. For every component, find its capacity ceiling — e.g. how many queries per second a specific database configuration sustains before performance degradation reaches the end user. Adjust for periodic spikes, subtract a comfortable percentage of headroom (the safety factor), and you get a single **"red line" number** that characterizes that component in its role.
That number tells you three things:
- where to set alert thresholds,
- what adding or removing a similar node actually buys you,
- when to start sizing the next order of capacity.
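
As a rough illustration of the arithmetic, here is a minimal sketch in Python. The formula is one reading of the note, not Allspaw's own: it assumes the spike adjustment is a peak-to-average ratio that divides the measured ceiling, that the safety factor is a fraction held back as headroom, and that similar nodes add capacity linearly. The function names and all numbers are hypothetical.

```python
import math


def red_line(ceiling: float, spike_ratio: float, safety_factor: float) -> float:
    """Sustained load a component should not exceed.

    ceiling       -- measured load at which users start to see degradation
    spike_ratio   -- assumed peak-to-average ratio of periodic spikes (>= 1)
    safety_factor -- fraction of capacity held back as headroom, in [0, 1)
    """
    if spike_ratio < 1 or not 0 <= safety_factor < 1:
        raise ValueError("spike_ratio must be >= 1 and safety_factor in [0, 1)")
    return ceiling / spike_ratio * (1 - safety_factor)


def nodes_needed(expected_load: float, per_node_red_line: float) -> int:
    """Similar nodes required to keep each one under its red line.

    Assumes load spreads evenly and capacity scales linearly with node count.
    """
    return math.ceil(expected_load / per_node_red_line)


# Hypothetical database: degrades at 4000 QPS, spikes reach 1.25x the average,
# and 25% of what remains is reserved as headroom.
db_red_line = red_line(ceiling=4000, spike_ratio=1.25, safety_factor=0.25)

# The same number serves as the alert threshold and as the unit of capacity
# that one more (or one fewer) node adds or removes.
alert_threshold_qps = db_red_line
fleet_size = nodes_needed(expected_load=9000, per_node_red_line=db_red_line)
```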
Finding it requires an **easily segmented architecture** — you can only measure a limit you can isolate. This is the operational, per-component form of the binding-constraint idea: capacity is set by the first resource to hit its red line.
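
The binding-constraint reading can be sketched the same way. This example assumes each resource's usage grows linearly with traffic, which real systems only approximate; the resource names, usage figures, and red lines are hypothetical. The resource whose red line is reached at the lowest traffic level sets the capacity of the whole component.

```python
def binding_constraint(
    current_traffic: float,
    usage: dict[str, float],
    red_lines: dict[str, float],
) -> tuple[str, float]:
    """Return the resource that hits its red line first, and at what traffic.

    Assumes usage of every resource scales linearly with traffic.
    """
    # Traffic level at which each resource would reach its own red line.
    supportable = {
        name: current_traffic * red_lines[name] / usage[name] for name in usage
    }
    limiting = min(supportable, key=supportable.get)
    return limiting, supportable[limiting]


# Hypothetical measurements taken at 1000 requests per second.
usage = {"db_qps": 1200, "cpu_pct": 40, "disk_iops": 3000}
red_lines = {"db_qps": 2400, "cpu_pct": 70, "disk_iops": 4500}

resource, capacity = binding_constraint(1000, usage, red_lines)
print(f"{resource} binds first, at about {capacity:.0f} requests per second")
```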
---
*Source: [[The Art of Capacity Planning]] (John Allspaw, O'Reilly 2008) — Ch 1 — Goals, Issues, and Processes in Capacity Planning*