**Parent Topic**: [[Software/README]]
## The "Holy Grail"
Huff calls auto-scaling "the Holy Grail of cloud operations." Picnik's version can be a simple control loop because everything flows through a render queue. Users are blocked while they wait for results, so the goal reduces to **keeping the queue empty**.
## The Loop
Every minute a ServerManager thread wakes, polls queue stats (averaged over the last minute), and computes the capacity change needed to maintain a **target ratio of free to busy workers**. Three refinements make it stable:
- **Hysteresis**, so that small fluctuations in traffic and latency do not cause oscillation around the target.
- **Startup-lag awareness** — an EC2 instance can take several minutes to boot, so the loop accounts for in-flight launches.
- **Empirical tuning** of the loop's parameters over a week or two, rather than deriving them from theory.
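
A minimal sketch of such a loop, to make the free-to-busy target and the first two refinements concrete. This is not Picnik's ServerManager code: `QueueStats`, `desired_change`, `run_loop`, the callables passed in, and the `target_ratio` and `deadband` values are all hypothetical names and numbers chosen for illustration.

```python
import math
import time
from dataclasses import dataclass


@dataclass
class QueueStats:
    """One-minute averages from the render queue (hypothetical shape)."""
    busy_workers: float  # workers currently rendering
    free_workers: float  # workers idle and waiting for jobs


def desired_change(stats, pending_launches, target_ratio=0.5, deadband=0.2):
    """Return workers to launch (> 0) or terminate (< 0); 0 means hold.

    target_ratio and deadband are assumed values; in practice they would
    be tuned empirically rather than derived.
    """
    # Startup-lag awareness: instances that are still booting count as
    # free capacity that is about to arrive, so they are not requested twice.
    effective_free = stats.free_workers + pending_launches
    # Avoid dividing by zero when nothing is rendering.
    busy = max(stats.busy_workers, 1.0)
    ratio = effective_free / busy

    # Hysteresis: take no action while the ratio sits inside a band
    # around the target.
    low = target_ratio * (1 - deadband)
    high = target_ratio * (1 + deadband)
    if low <= ratio <= high:
        return 0

    # Outside the band: move back to the target ratio. math.ceil rounds
    # launches up and terminations toward zero, which errs on the side
    # of spare capacity.
    return math.ceil(target_ratio * busy - effective_free)


def run_loop(poll_stats, count_pending, apply_change,
             interval_s=60.0, sleep=time.sleep, iterations=None):
    """Wake every interval_s seconds, poll, and apply any needed change.

    poll_stats, count_pending and apply_change are caller-supplied
    callables (hypothetical); iterations=None runs until interrupted.
    """
    done = 0
    while iterations is None or done < iterations:
        change = desired_change(poll_stats(), count_pending())
        if change != 0:
            apply_change(change)
        sleep(interval_s)
        done += 1
```

With 20 busy and 4 free workers and nothing booting, this sketch asks for 6 more workers; once those 6 are counted as pending launches, the same stats produce no further request, which is the point of tracking in-flight launches. Rounding toward spare capacity is a design choice of this sketch, made because users are blocked on the queue.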
## Beyond Capacity
Auto-scaling also absorbs *problems*: when latency rose or a release slowed rendering, Picnik "auto-scaled out of" the issue until the root cause was fixed. One example was a bug fix that raised render load by 20% right before Christmas. Built on [[Decoupled Stateless Components Are Cloud-Ready]].
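
A small worked example of why a ratio-based loop absorbs a slowdown without knowing its cause. It assumes, for illustration only, that a 20% rise in render load shows up as 20% more busy workers at unchanged traffic; the fleet size and the `target_ratio` of 0.5 are made-up numbers, and `fleet_size` is a hypothetical helper.

```python
def fleet_size(busy_workers, target_ratio):
    """Total workers needed to hold free/busy at target_ratio."""
    free_workers = busy_workers * target_ratio
    return busy_workers + free_workers


# Hypothetical fleet before and after a release that raises render load.
before = fleet_size(busy_workers=100, target_ratio=0.5)
after = fleet_size(busy_workers=120, target_ratio=0.5)

# Fractional growth of the whole fleet needed to restore the ratio.
growth = after / before - 1
print(f"fleet grows from {before:.0f} to {after:.0f} workers ({growth:.0%})")
```

Because the loop only looks at busy and free counts, the extra load is met by proportionally more workers, at proportionally higher cost, until the underlying problem is fixed.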
---
*Source: [[Web Operations]] (Allspaw & Robbins, O'Reilly 2010) — Ch 2 — How Picnik Uses Cloud Computing: Lessons Learned*