Local Visibility Causes Local Optimization - Nestor G Pestelos Jr (ngpestelos)

"Strictly local visibility leads to strictly local optimization." Seeing inside one application or server is not enough; system-wide transparency is what reveals the problems that matter. Two vivid examples: - **The batch-job optimization that did nothing.** A retailer wanted items to appear on the site sooner — the nightly update ran until 5 or 6 a.m. when it needed to finish near midnight. The project optimized the string of batch jobs and met its goal: the jobs finished *two hours earlier*. But items still didn't appear until a *long-running parallel process* finished at 5 or 6 a.m. The local optimization had **no global effect** because the team only had visibility into the batch jobs. - **The cache-flush storm.** Watching cache flushes on *one* application server wouldn't reveal that *each* server was knocking items out of *every other server's* cache — every display accidentally updated the item, firing an invalidation to all servers. The problem was invisible per-instance; "as soon as all the caches' statistics appeared on one page, the problem was obvious." Without that view, they'd have *added servers* to reach capacity — and each new server would have made it worse. The lesson: optimize against a **system-wide** picture. Per-instance dashboards both hide cross-instance scaling effects and tempt you into improving a component that isn't the global bottleneck. --- *Source: [[Release It Second Edition]] (Michael T. Nygard, Pragmatic Bookshelf 2018) — Ch 8 — Processes on Machines*