When backends start rejecting requests, clients self-regulate using a local probability formula:

```
rejection_probability = max(0, (requests - K * accepts) / (requests + 1))
```

Where `requests` is the number of requests attempted in the last 2 minutes, `accepts` is the number the backend accepted, and K is a tunable multiplier (default 2).

## Why Clients, Not Servers

Backends can become overloaded just from *rejecting* requests if the cost of a rejection is non-trivial. Client-side throttling prevents rejected requests from even reaching the network, and the decision is entirely local, with no coordination overhead.

## The K Multiplier Trade-off

- **K=2** (default): allows roughly one rejected request per accepted request. Wastes some backend resources, but propagates state changes faster (backends can signal "accepting again" sooner).
- **K=1.1**: allows only one rejected request per ten accepted. Better protects backends where rejection cost ≈ processing cost.

## Cross-Domain Application

The same pattern (local probabilistic shedding based on the recent accept/reject ratio) applies to:

- **Message queues**: consumer backpressure when processing falls behind
- **API rate limiting**: client libraries that back off based on the ratio of 429 responses
- **Circuit breakers**: the Hystrix/Resilience4j half-open state uses similar probabilistic probing
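The throttling formula above can be sketched as a small client-side component. This is a minimal illustration, not a reference implementation: the class name `AdaptiveThrottle`, the method names, and the event-deque bookkeeping are all assumptions; only the formula and the 2-minute window come from the text.

```python
import random
import time
from collections import deque


class AdaptiveThrottle:
    """Hypothetical sketch of client-side adaptive throttling.

    Tracks requests and accepts over a sliding window and drops
    requests locally with probability:
        max(0, (requests - K * accepts) / (requests + 1))
    """

    def __init__(self, k=2.0, window_seconds=120):
        self.k = k
        self.window = window_seconds
        self.events = deque()  # (timestamp, accepted: bool)

    def _prune(self, now):
        # Discard events older than the sliding window (2 minutes by default).
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def rejection_probability(self, now=None):
        now = time.monotonic() if now is None else now
        self._prune(now)
        requests = len(self.events)
        accepts = sum(1 for _, ok in self.events if ok)
        return max(0.0, (requests - self.k * accepts) / (requests + 1))

    def allow(self, now=None):
        """Return False to drop the request locally, before it hits the network."""
        return random.random() >= self.rejection_probability(now)

    def record(self, accepted, now=None):
        """Record the backend's verdict (True = accepted) after each attempt."""
        now = time.monotonic() if now is None else now
        self.events.append((now, accepted))
```

Note the behavior at the extremes: while the backend accepts everything, `requests <= K * accepts` holds and the probability stays at zero, so a healthy system pays nothing; once accepts drop toward zero, the probability climbs toward `requests / (requests + 1)` and the client sheds almost all traffic locally.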