Services, APIs, and websites "have zero control over their demand" — at any moment a billion-plus devices could call you. "No matter how strong your load balancers or how fast you can scale, the world can always make more load than you can handle."
At the network level, TCP already sheds load: incomplete connections sit in a per-port listen queue, and once that queue is full, new connection attempts are refused with a TCP RST (reset). But services usually fall over *before* the queue fills, almost always from **contention for a pooled resource**: threads slow down waiting for the resource, the extra waiting threads burn RAM and CPU, and response times lengthen until callers time out. At that point, "to an outside observer, there's no difference between 'really, really slow' and 'down.'"
So **model TCP's approach**: when load gets too high, start refusing new work (closely related to Fail Fast). Define "too high" ideally by monitoring your own performance against the SLA; failing that, hold a **semaphore** that caps the number of concurrent requests. When a load balancer sits in front of the service, an overloaded instance can return **503** so the balancer backs off and recycles connections.
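
The semaphore option can be sketched as follows. This is a minimal illustration, not code from the book: the class name `LoadShedder`, the `handle` method, and the `(status, body)` return shape are all hypothetical, and a real service would wire the same idea into its web framework's request pipeline.

```python
import threading


class LoadShedder:
    """Caps concurrent requests; extra requests are refused at once instead of queued.

    Hypothetical helper for illustration only.
    """

    def __init__(self, max_concurrent):
        # One semaphore slot per request allowed in flight at the same time.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def handle(self, work):
        # Non-blocking acquire: if no slot is free, shed the request immediately
        # rather than letting it wait and add to the contention.
        if not self._slots.acquire(blocking=False):
            return 503, "Service Unavailable"
        try:
            return 200, work()
        finally:
            # Always free the slot, even if work() raises.
            self._slots.release()
```

The key design choice is `acquire(blocking=False)`: a blocking acquire would turn the semaphore into one more pooled resource to contend for, which is the failure mode the pattern is meant to avoid.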
Scope matters: **shed load at the *edges* of your system** (where the uncontrolled Internet is your user base); *inside* a boundary, prefer back pressure to balance throughput, using load shedding only as a secondary measure. Creating slow responses is "being a bad citizen" — shedding load keeps you a good one.
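
The difference between the two scopes can be sketched with a bounded queue. This is an assumption-laden illustration: the function names `submit_internal` and `submit_edge` are hypothetical, and the standard-library `queue.Queue` stands in for whatever buffer separates a producer from a consumer. Inside the boundary the producer waits for room (back pressure) and sheds only after a timeout; at the edge the caller is refused immediately.

```python
import queue


def submit_internal(q, item, timeout=1.0):
    """Inside the boundary: block the producer until the consumer frees space.

    Shedding is only the secondary measure, used once the wait times out.
    """
    try:
        q.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False


def submit_edge(q, item):
    """At the edge: never make the caller wait; refuse when the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False
```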
---
*Source: [[Release It Second Edition]] (Michael T. Nygard, Pragmatic Bookshelf 2018) — Ch 5 — Stability Patterns*