Pre-Build Load-Shedding Switches - Nestor G Pestelos Jr (ngpestelos)

The cheapest capacity under duress is capacity you stop spending. Build **on/off switches** for heavy features *in advance*: a one-line config flag beats hardcoded behavior you must hunt down mid-incident. Flickr kept **195 disable-able features** (photo uploads, all search, inter-user mail), with the list agreed across product, dev, design, and ops so that the degrade list isn't improvised.

Example: when Flickr launched its localized site in seven languages, an IP-based geo-lookup overloaded immediately. Flipping it off bought time to find the real cause (an over-conservative request throttle), fix it, and re-enable the feature. Had the feature been hardcoded, the site would have been degraded or down for the whole diagnosis.

The principle extends to *any* non-essential subsystem. In the 1996 U.S. election, a news site drowning in traffic with no spare servers chose to **stop logging**, sacrificing traffic metrics for hours, which relieved the disks enough to keep serving. Degrading to a reduced feature set beats a full outage; pre-build the switches that let you choose.

---

*Source: [[The Art of Capacity Planning]] (John Allspaw, O'Reilly 2008), App B, Dealing with Instantaneous Growth*
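
To make the "one-line config flag" idea concrete, here is a minimal Python sketch. It is not how Flickr implemented its switches (the source does not say); the `FeatureSwitches` class, the `switches.json` file, the `geo_lookup` flag name, and the `pick_language` helper are all hypothetical names chosen for illustration. The assumption is that operators can edit a small JSON file on the host and that re-reading it on each check is cheap enough.

```python
import json
from pathlib import Path


class FeatureSwitches:
    """Hypothetical on/off switches backed by a small JSON file."""

    def __init__(self, path, defaults):
        self.path = Path(path)
        # Every switchable feature is declared up front with its normal state.
        self.defaults = dict(defaults)

    def enabled(self, name):
        # Unknown names raise KeyError, so a typo in calling code fails loudly
        # instead of silently disabling a feature.
        default = self.defaults[name]
        # Re-read the file on every call so an operator's edit takes effect
        # without a deploy or restart.
        try:
            overrides = json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            # Missing or half-written file: fall back to the declared default.
            return default
        if not isinstance(overrides, dict):
            return default
        return bool(overrides.get(name, default))


def pick_language(switches, ip_address, geo_lookup, fallback="en"):
    """Use the expensive geo-lookup only while its switch is on."""
    if switches.enabled("geo_lookup"):
        return geo_lookup(ip_address)
    # Degraded path: skip the lookup and serve the fallback language.
    return fallback


if __name__ == "__main__":
    switches = FeatureSwitches("switches.json", {"geo_lookup": True})
    # Stand-in for a real geo-lookup service (assumption for the demo).
    print(pick_language(switches, "203.0.113.7", lambda ip: "fr"))
```

With this shape, shedding the geo-lookup during an incident means writing `{"geo_lookup": false}` to the file, and re-enabling it means flipping the value back. The same pattern would cover a switch for access logging or any other non-essential subsystem, as long as the degraded path is written and agreed on before it is needed.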