Nonlinear Scaling Over Ops Mode - Nestor G Pestelos Jr (ngpestelos)

**Parent Topic**: [[Software/README]] **"Ops mode"** is keeping a service running by adding *people* as it grows: more VMs → more administrators to manage them. The defining SRE alternative is to **write software or eliminate scalability concerns so headcount doesn't grow as a function of load** — humans are added only as *complexity* increases, never just because traffic did. The most important reframe for an embedded SRE: **more tickets should not require more SREs.** A team sliding into ops mode optimizes how to *address* emergencies faster, rather than how to *reduce the number* of emergencies. Escaping ops mode means shifting focus from response to prevention. Two practical moves: - **Test the "my service is tiny, scale doesn't matter" claim** by shadowing an on-call session. If the service is genuinely small but important, focus on what blocks reliability improvements; if it's just starting, prepare for explosive growth — a 100 req/s service can become 10k req/s in a year. - **Prioritize outages by their stress impact on the team**, not just technical severity. Due to history and perspective, some very small problems generate inordinate stress; relieving them frees the team to do prevention work.