"Every performance problem starts with a queue backing up somewhere" — a socket listen queue, the OS run queue, a database I/O queue. An **unbounded** queue consumes all memory, and by Little's law, as its length heads toward infinity so does response time. So queues must be **finite** for response times to be finite.
But a bounded queue that is full forces a choice when a producer pushes one more item. The only options: pretend to accept it but drop it, accept it and drop something older, refuse it, or **block the producer**. Blocking is a form of flow control: it applies **back pressure** that propagates upstream, throttling the ultimate client until the queue drains. TCP does exactly this with its receive window: once the receiver's window is full, the sender may not send, its transmit buffers fill, and subsequent `write()` calls on the socket block.
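The four full-queue choices can be sketched with Python's standard `queue.Queue`. This is a minimal illustration, not code from the book: the `push` helper, its policy names, and `time_blocked_push` are hypothetical, and the `drop_old` branch assumes a single producer.

```python
import queue
import threading
import time


def push(q, item, policy):
    """Push onto a bounded queue and report what happened to the item.

    Policy names are illustrative labels for the four choices.
    """
    if policy == "block":
        q.put(item)  # blocks the producer until a slot frees: back pressure
        return "queued"
    try:
        q.put_nowait(item)
        return "queued"
    except queue.Full:
        if policy == "refuse":
            return "refused"  # the producer is told no
        if policy == "drop_new":
            return "dropped"  # a real system would report success here
        if policy == "drop_old":
            q.get_nowait()  # evict the oldest item (single-producer assumption)
            q.put_nowait(item)
            return "queued"
        raise ValueError(f"unknown policy: {policy}")


def time_blocked_push(drain_after_s=0.2):
    """Fill a queue, then measure how long a blocking push waits."""
    q = queue.Queue(maxsize=2)
    q.put("a")
    q.put("b")

    def slow_consumer():
        time.sleep(drain_after_s)
        q.get()  # frees one slot

    start = time.monotonic()
    consumer = threading.Thread(target=slow_consumer)
    consumer.start()
    push(q, "c", "block")  # the producer stalls here until the consumer drains
    waited = time.monotonic() - start
    consumer.join()
    return waited
```

With the `block` policy the producer runs no faster than the consumer drains, which is the throttling effect described above. The other three policies keep the producer running at the cost of lost or rejected work.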
Nygard's example: an API server is allowed 100 simultaneous calls to its storage engine. When the 101st call arrives, its thread blocks until an outstanding call finishes. That blocking *is* the back pressure, so the API server can't make calls any faster than the engine can process them.
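One way to picture the 100-call limit is a counting semaphore in front of the storage engine. The sketch below is an assumption about how such a limit might be built, not the book's implementation. `LimitedEngineClient`, `fake_engine_call`, and `run_demo` are hypothetical names, and the demo uses a limit of 3 so it finishes quickly.

```python
import threading
import time


class LimitedEngineClient:
    """Allows at most `max_concurrent` calls to the engine at once."""

    def __init__(self, max_concurrent=100):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak_in_flight = 0

    def call(self, work):
        # The caller over the limit blocks here until a slot frees.
        with self._slots:
            with self._lock:
                self.in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                return work()
            finally:
                with self._lock:
                    self.in_flight -= 1


def fake_engine_call():
    """Stand-in for a storage engine call that takes some time."""
    time.sleep(0.05)


def run_demo(limit=3, callers=10):
    """Start more callers than slots and return the peak concurrency seen."""
    client = LimitedEngineClient(max_concurrent=limit)
    threads = [
        threading.Thread(target=client.call, args=(fake_engine_call,))
        for _ in range(callers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return client.peak_in_flight
```

However many callers arrive, `peak_in_flight` never exceeds the limit. The excess callers wait at the semaphore, so the caller side slows to the engine's pace.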
Constraints: back pressure works **within a system boundary** and only when the pool of consumers is **finite**. When the "upstream" is the diverse Internet, slowing one caller has no systemic throttling effect. Because back pressure inevitably blocks threads ("a quick path to downtime"), at the edges you need load shedding and asynchronous calls instead: accept requests on one thread pool and call out on another, so that a blocked outbound call can time out and return a 503, or queue the work and return a 202. Distinguish temporary back pressure from a genuinely broken consumer, and make sure monitoring is alerted when back pressure kicks in. "The only alternative is to let them crash the provider."
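The edge-of-system advice (call out on a separate pool, time out to a 503, shed load when saturated) might look roughly like the sketch below. `EdgeHandler`, its parameters, and the response bodies are hypothetical, the status codes are returned as plain tuples rather than through a real HTTP framework, and the 202 variant is not shown.

```python
import concurrent.futures
import threading
import time


class EdgeHandler:
    """Request handler that never blocks indefinitely on the backend."""

    def __init__(self, max_outstanding=4):
        # Outbound calls run on their own pool, separate from request threads.
        self._outbound = concurrent.futures.ThreadPoolExecutor(
            max_workers=max_outstanding
        )
        self._permits = threading.BoundedSemaphore(max_outstanding)

    def handle(self, backend_call, timeout_s=0.1):
        """Return a (status, body) tuple for one request."""
        # Load shedding: refuse immediately instead of waiting for a slot.
        if not self._permits.acquire(blocking=False):
            return 503, "busy"
        try:
            future = self._outbound.submit(backend_call)
        except Exception:
            self._permits.release()
            raise
        # The permit is held until the backend call really finishes,
        # even if the request below has already timed out.
        future.add_done_callback(lambda _f: self._permits.release())
        try:
            return 200, future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return 503, "timed out"
```

A request thread waits at most `timeout_s`. While a slow backend call still occupies its slot, new requests are shed with a 503 rather than piling up behind it.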
---
*Source: [[Release It Second Edition]] (Michael T. Nygard, Pragmatic Bookshelf 2018) — Ch 5 — Stability Patterns*