Knowing when each piece of infrastructure will fail is mandatory, not optional. For every component, find its capacity ceiling: for example, how many queries per second a specific database configuration sustains before performance degradation reaches the end user. Adjust for periodic spikes, subtract a comfortable percentage of headroom (the safety factor), and you get a single **"red line" number** that characterizes that component in its role.

That number tells you three things:

- where to set alert thresholds,
- what adding or removing a similar node actually buys you,
- when to start sizing the next order of capacity.

Finding it requires an **easily segmented architecture**: you can only measure a limit you can isolate.

This is the operational, per-component form of the binding-constraint idea: capacity is set by the first resource to hit its red line.

---

*Source: [[The Art of Capacity Planning]] (John Allspaw, O'Reilly 2008), Ch 1: Goals, Issues, and Processes in Capacity Planning*
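
The red-line arithmetic described in this note can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a formula taken from the book: the note does not say how spikes and headroom combine, so the sketch assumes headroom is a fraction of the measured ceiling and that spikes are summarized by a single peak-to-average ratio. The names `Component`, `red_line`, `nodes_needed`, and `binding_constraint`, and all the numbers, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    ceiling_qps: float    # measured load at which users start to see degradation
    peak_to_avg: float    # assumption: periodic spike load divided by typical load
    safety_factor: float  # assumption: fraction of the ceiling held back as headroom


def red_line(c: Component) -> float:
    """Typical load at which the component should be treated as full."""
    usable = c.ceiling_qps * (1.0 - c.safety_factor)  # subtract headroom
    return usable / c.peak_to_avg                     # leave room for spikes


def nodes_needed(forecast_qps: float, c: Component) -> int:
    """How many similar nodes keep each one at or below its red line."""
    return math.ceil(forecast_qps / red_line(c))


def binding_constraint(components, current_load):
    """Component closest to (or furthest past) its red line."""
    return max(components, key=lambda c: current_load[c.name] / red_line(c))


# Hypothetical database node: degrades at 4000 qps, spikes reach 1.5x typical
# load, and 25% of the ceiling is reserved as headroom.
db = Component("db-primary", ceiling_qps=4000.0, peak_to_avg=1.5, safety_factor=0.25)
print(red_line(db))              # 2000.0
print(nodes_needed(9000.0, db))  # 5
```

Under these assumptions, one number per component answers the three questions in the list: alert somewhere below `red_line(db)`, treat each added node as roughly one more `red_line(db)` of typical load (ignoring coordination overhead), and start planning the next tier when the forecast pushes `nodes_needed` past what the current architecture can hold.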