The error budget is more than a concept: it is a control loop that governs release velocity. This mechanism is what makes the [[Error Budget Model]] operational.

## The Control Loop

1. Product management defines the SLO (a quarterly availability target)
2. Monitoring, acting as a neutral third party, measures actual availability
3. The margin by which measured availability exceeds the SLO = remaining error budget
4. Budget remaining → releases continue
5. Budget depleted → releases halt, invest in resilience

## Why It Resolves Politics

Before error budgets, dev/ops risk negotiation was decided by who argued louder. The error budget replaces negotiation with data:

- Product devs see the budget and police their own risk
- When the budget is large, they take more risks (skip some testing, push faster)
- When the budget is nearly drained, they push for more testing themselves
- The SRE team must have authority to stop launches if the SLO is broken

## Subtler Than On/Off

Simple version: binary halt/go. Better version: slow down releases or roll back when the budget approaches exhaustion. The simple version is "bang/bang control"; the better version is closer to proportional control.

## Shared Pain

Network outages and datacenter failures consume the same budget as bad pushes. This means:

- Everyone shares responsibility for uptime
- External failures reduce push capacity for the rest of the quarter
- No finger-pointing: same budget, same team

## The Escape Valve

If the team can't ship features fast enough, they can loosen the SLO, which increases the budget. Doing so exposes the cost of overly high reliability targets: inflexibility and slow innovation. The budget makes this trade-off visible and explicit.

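
## Worked Sketch

The loop and the escape valve can be made concrete with a little arithmetic. The following is a minimal sketch, not a method taken from the source: it assumes a request-based SLO, and the names `budget_status`, `bang_bang_rate`, `proportional_rate`, and the `full_speed_above` threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetStatus:
    allowed_failures: float    # total error budget for the window
    consumed_failures: int     # failures so far, whatever their cause
    remaining_fraction: float  # unspent share of the budget; negative if overspent


def budget_status(slo: float, expected_requests: int, failed_requests: int) -> BudgetStatus:
    """Steps 1-3 of the loop: turn an SLO and measured failures into remaining budget."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if expected_requests <= 0:
        raise ValueError("expected_requests must be positive")
    allowed = (1.0 - slo) * expected_requests
    remaining = 1.0 - failed_requests / allowed
    return BudgetStatus(allowed, failed_requests, remaining)


def bang_bang_rate(remaining_fraction: float) -> float:
    """Simple version: full speed while any budget remains, otherwise halt."""
    return 1.0 if remaining_fraction > 0.0 else 0.0


def proportional_rate(remaining_fraction: float, full_speed_above: float = 0.5) -> float:
    """Subtler version: throttle releases as the budget drains.

    full_speed_above is a hypothetical tuning knob, not a value from the source.
    """
    return max(0.0, min(1.0, remaining_fraction / full_speed_above))


if __name__ == "__main__":
    # 250 failures so far, whether from bad pushes or a network outage.
    strict = budget_status(slo=0.9999, expected_requests=1_000_000, failed_requests=250)
    loose = budget_status(slo=0.999, expected_requests=1_000_000, failed_requests=250)
    for name, status in (("99.99%", strict), ("99.9%", loose)):
        print(
            name,
            round(status.allowed_failures),
            round(status.remaining_fraction, 2),
            proportional_rate(status.remaining_fraction),
        )
```

Under these assumptions, a 99.99% SLO over one million requests allows about 100 failures, so 250 failures overspends the budget and both policies halt releases. Loosening the SLO to 99.9% raises the allowance to about 1,000 failures, leaving roughly three quarters of the budget and restoring full release speed. That is the escape valve expressed as numbers: the same incident history yields a very different release capacity depending on the target.
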
## Cross-Domain Connections

- [[Error Budget Model]]: the concept this mechanism operationalizes
- [[SRE Change Management Seventy Percent Rule]]: 70% of outages from changes; the budget absorbs this reality
- [[Risk as a Reliability Continuum]]: the budget is how you navigate the continuum

---

*Source: Site Reliability Engineering, Chapter 3 (Alvidrez/Roth, 2016)*

*Extracted: 2026-03-26*