**Parent Topic**: [[Software/README]]

## Test Live, in the Wild

dealnews believed "the only way to make sure our production systems could handle the load was to test them, live, during the day when traffic was flowing". They used Keynote, which hits a site from multiple datacenters and ramps up requests, with a kill switch if things go bad. People look shocked when they hear this, and the answer is "but how would we know if it worked otherwise?" Benchmarks along the way are "no substitute for a real test."

## The CDN-Masking Pitfall

Their first test "did not tell us much": configured to fetch pages *and all objects*, it just load-tested the CDN. Because one page request generated **10–200 requests to the CDN**, the origin servers "never felt any load." Fix: exclude CDN objects so the test focuses on your own servers.

## Ramp and Probe the Floor

They ramped to **600 concurrent connections** (the Yahoo!-event peak) and nothing moved. Ramping to **3,000** surfaced a real problem: the proprietary ad-server software, long their most solid component, had become the slowest piece. The cause was file-I/O bottlenecks, which they addressed by moving to memcached.

Then, at the midpoint, they **turned the cache off** to measure the floor: response time doubled to ~1.2 seconds, but the system held 3,000 connections straight from the database, proving [[Architect to Survive Cache Failure]].

Production load testing is how you find the bottleneck before your users do.

---

*Source: [[Web Operations]] (Allspaw & Robbins, O'Reilly 2010), Ch 9, Dealing with Unexpected Traffic Spikes*
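The CDN-exclusion fix and the 600-to-3,000 ramp described above can be sketched in a few lines. This is a minimal illustration, not dealnews's tooling or Keynote's configuration, neither of which the source shows. The hostnames, `origin_only`, and `ramp_schedule` are hypothetical names chosen for the example.

```python
from urllib.parse import urlparse

# Hypothetical CDN hostnames; substitute the hosts your pages actually reference.
CDN_HOSTS = {"cdn.example.com", "static.example.net"}


def origin_only(urls, cdn_hosts=CDN_HOSTS):
    """Keep only URLs whose host is not a CDN host, so load lands on the origin."""
    return [u for u in urls if urlparse(u).hostname not in cdn_hosts]


def ramp_schedule(start, peak, steps):
    """Return evenly spaced concurrency levels from start up to peak (inclusive)."""
    if steps < 2:
        return [peak]
    span = peak - start
    return [start + round(span * i / (steps - 1)) for i in range(steps)]


if __name__ == "__main__":
    page_objects = [
        "https://www.example.com/deals",
        "https://cdn.example.com/img/logo.png",
        "https://static.example.net/css/site.css",
    ]
    print(origin_only(page_objects))
    print(ramp_schedule(600, 3000, 5))
```

Filtering by hostname keeps the test pointed at the servers you operate. A real load-testing tool would then drive each concurrency level in turn, with a kill switch between steps.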