When backends start rejecting requests, clients self-regulate using a local probability formula:
```
rejection_probability = max(0, (requests - K * accepts) / (requests + 1))
```
where `requests` is the number of requests the client has attempted in the trailing two-minute window (counted locally), `accepts` is the number of those the backend accepted, and `K` is a tunable multiplier (default 2).
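A minimal sketch of this throttle, assuming a sliding-window counter kept per client. The class and method names here are illustrative, not from any particular library:

```python
import random
import time
from collections import deque


class AdaptiveThrottler:
    """Client-side adaptive throttling over a sliding window.

    Illustrative sketch: tracks (timestamp, accepted) pairs for the
    trailing window and applies the local rejection-probability formula.
    """

    def __init__(self, k=2.0, window_seconds=120.0):
        self.k = k
        self.window = window_seconds
        self.events = deque()  # each entry: (timestamp, accepted: bool)

    def _prune(self, now):
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def rejection_probability(self, now=None):
        now = time.monotonic() if now is None else now
        self._prune(now)
        requests = len(self.events)
        accepts = sum(1 for _, ok in self.events if ok)
        return max(0.0, (requests - self.k * accepts) / (requests + 1))

    def should_send(self, now=None):
        """Return False to drop the request locally (self-rejection)."""
        return random.random() >= self.rejection_probability(now)

    def record(self, accepted, now=None):
        """Record the backend's verdict for a request that was sent."""
        now = time.monotonic() if now is None else now
        self.events.append((now, accepted))
        self._prune(now)
```

Callers check `should_send()` before each request and call `record()` with the backend's verdict; requests shed locally are not recorded, since they were never attempted against the backend.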
## Why Clients, Not Servers
Backends can become overloaded just from *rejecting* requests if rejection cost is non-trivial. Client-side throttling prevents rejected requests from even reaching the network. The decision is entirely local — no coordination overhead.
## The K Multiplier Trade-off
- **K=2** (default): allows roughly one rejected request per accepted request at steady state. This wastes some backend capacity on rejections, but keeps enough probe traffic flowing that clients notice quickly when the backend starts accepting again.
- **K=1.1**: allows only about one rejected request per ten accepted. This protects backends where the cost of rejecting a request is close to the cost of processing it, at the price of reacting more slowly when the backend recovers.
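The arithmetic behind those ratios: the throttle drives `requests` toward `K * accepts`, so at equilibrium the backend still rejects about `K - 1` requests for every one it accepts. A quick check (the helper name is ours):

```python
def steady_state_rejects_per_accept(k):
    # At equilibrium the throttle holds requests ≈ K * accepts, so the
    # backend-rejected count per accepted request is (K*a - a) / a = K - 1.
    return k - 1


print(steady_state_rejects_per_accept(2.0))  # K=2   -> 1 rejected per accepted
print(steady_state_rejects_per_accept(1.1))  # K=1.1 -> ~0.1, i.e. ~1 per 10
```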
## Cross-Domain Application
The same pattern — local probabilistic shedding based on recent accept/reject ratio — applies to:
- **Message queues**: Consumer backpressure when processing falls behind
- **API rate limiting**: Client libraries that back off based on 429 response ratio
- **Circuit breakers**: the half-open state in Hystrix and Resilience4j admits a limited number of trial requests to test recovery, a related (though deterministic rather than probabilistic) form of probing
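To illustrate the second case, here is a hypothetical HTTP client wrapper that treats a 429 response as a "reject" and applies the same local formula before sending. The class name, `send_fn` callable, and plain counters (a real implementation would use windowed counters, as above) are all our own assumptions:

```python
import random


class RateLimitAwareClient:
    """Hypothetical sketch: gate outgoing requests on the recent 429 ratio,
    using the same local formula, independent of any server-provided hints.
    """

    def __init__(self, send_fn, k=1.5):
        self.send_fn = send_fn  # callable returning an HTTP status code
        self.k = k
        self.requests = 0  # unwindowed for brevity; use a sliding window in practice
        self.accepts = 0

    def request(self):
        p = max(0.0, (self.requests - self.k * self.accepts) / (self.requests + 1))
        if random.random() < p:
            return None  # shed locally, never hits the network
        status = self.send_fn()
        self.requests += 1
        if status != 429:  # anything but "Too Many Requests" counts as accepted
            self.accepts += 1
        return status
```

The same gate works for queue-consumer NACK ratios: only the definition of "reject" changes.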