## Core Insight
The four golden signals — latency, traffic, errors, saturation — are the minimum viable monitoring for any user-facing system. If you can only measure four things, measure these. Together they cover user experience (latency), demand (traffic), correctness (errors), and capacity (saturation).
## The Four Signals
1. **Latency** — time to service a request. Must separate successful from failed request latency (a slow error is worse than a fast error)
2. **Traffic** — demand on the system. HTTP requests/sec for web services, network I/O for streaming, transactions/sec for storage
3. **Errors** — rate of failed requests. Three types: explicit (HTTP 500), implicit (200 with wrong content), policy-based (response exceeding SLO threshold)
4. **Saturation** — how "full" the service is. Systems degrade before 100% utilization. Latency increases are often a **leading indicator** of saturation
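The four definitions above can be sketched as a single aggregation over one observation window. This is an illustrative sketch, not the book's code: the `Request` record, the function name, and the `in_flight`/`capacity` saturation proxy are all my own assumptions; a real system would use a metrics library (e.g. a Prometheus client) rather than hand-rolled aggregation.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code

def golden_signals(window: list[Request], window_secs: float,
                   in_flight: int, capacity: int) -> dict:
    """Summarize the four golden signals for one observation window."""
    ok = [r.latency_ms for r in window if r.status < 500]
    failed = [r.latency_ms for r in window if r.status >= 500]
    return {
        # Latency: track success and failure separately; a flood of
        # fast 500s would otherwise make the service look faster.
        "latency_ok_ms": median(ok) if ok else None,
        "latency_err_ms": median(failed) if failed else None,
        # Traffic: demand on the system, here requests per second.
        "traffic_rps": len(window) / window_secs,
        # Errors: fraction of requests that failed explicitly.
        # (Implicit and policy-based errors need deeper checks.)
        "error_rate": len(failed) / len(window) if window else 0.0,
        # Saturation: how "full" the service is, as a utilization ratio.
        "saturation": in_flight / capacity,
    }

reqs = [Request(20, 200), Request(35, 200), Request(5, 500), Request(40, 200)]
sig = golden_signals(reqs, window_secs=2.0, in_flight=30, capacity=100)
# traffic_rps = 2.0, error_rate = 0.25, saturation = 0.3
```

Note that only explicit errors (status >= 500) are counted here; implicit and policy-based errors require content checks or latency thresholds that this sketch omits.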
## Key Nuances
- **Tail latency matters more than averages**: if 1% of requests take 50x the average, that tail can dominate user experience in multi-service architectures. With wide fan-out, nearly every frontend request touches some backend's slow tail, so the 99th percentile of one backend can become the median of the frontend.
- **Use histogram buckets, not averages**: distribute boundaries exponentially (e.g., 0-10ms, 10-30ms, 30-100ms, 100-300ms) to visualize request distribution
- **Saturation includes predictions**: "database fills hard drive in 4 hours" is a saturation signal
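The first two nuances above can be made concrete with a little arithmetic. This is my own illustrative sketch: the function names are invented, the 1% slow-request figure comes from the tail-latency bullet, and the bucket boundaries are the exponential example given above.

```python
import bisect

def p_slow_frontend(n_backends: int, p_slow_backend: float = 0.01) -> float:
    """Probability a frontend request hits at least one slow backend,
    assuming the frontend waits on n independent backends and each
    serves a fraction p_slow_backend of its requests slowly."""
    return 1 - (1 - p_slow_backend) ** n_backends

# At ~70 backends, more than half of frontend requests touch a
# backend's 99th-percentile tail: the backend p99 becomes the
# frontend median.
# p_slow_frontend(70) ≈ 0.505

# Exponentially distributed histogram bucket upper bounds (ms),
# extending the example boundaries from the bullet above.
BUCKETS_MS = [10, 30, 100, 300, 1000, 3000]

def bucket_index(latency_ms: float) -> int:
    """Index of the first bucket whose upper bound is >= latency_ms."""
    return bisect.bisect_left(BUCKETS_MS, latency_ms)

# bucket_index(15) -> 1  (the 10-30ms bucket)
```

Counting requests per bucket, rather than averaging, preserves the shape of the distribution and makes bimodal or heavy-tailed latency visible.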
## Cross-Domain Applications
- **Any service**: these four signals transfer to non-Google systems — they're fundamental, not Google-specific
- **Capacity planning**: saturation signals feed directly into provisioning decisions
- **Incident response**: start diagnosis by checking which golden signal is anomalous
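The "database fills hard drive in 4 hours" saturation prediction, which feeds the capacity-planning application above, can be sketched as a naive linear extrapolation. Names and units are illustrative assumptions; real forecasting would account for nonlinear growth.

```python
def hours_until_full(capacity_gb: float, used_gb: float,
                     growth_gb_per_hour: float) -> float:
    """Naive linear extrapolation of time until a resource is exhausted.
    Assumes the recent growth rate holds; illustrative only."""
    if growth_gb_per_hour <= 0:
        return float("inf")  # not growing: no predicted exhaustion
    return (capacity_gb - used_gb) / growth_gb_per_hour

# 80 GB of headroom filling at 20 GB/hour -> alert: full in 4 hours
hours_until_full(500, 420, 20)  # -> 4.0
```

Alerting on the predicted time-to-exhaustion, rather than on current utilization, is what turns saturation from a lagging signal into an early warning.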
## Source
- [[Site Reliability Engineering - Chapter 6 - Monitoring Distributed Systems|SRE Ch 6: Monitoring Distributed Systems]] by Rob Ewaschuk
## Related Concepts
- [[Symptom vs Cause Monitoring Distinction]]
- [[Service Level Objectives as Reliability Framework]]
- [[Strategic Short-Term Availability Trade-offs]]