## Core Insight

Blameless postmortems focus on systemic causes, not individual blame. The fundamental principle: you can't fix people, but you can fix systems and processes to better support people making the right choices. If a culture of finger-pointing prevails, people won't surface issues for fear of punishment, which increases organizational risk.

## What a Postmortem Contains

A written record of the incident: its description, its impact, the actions taken to mitigate or resolve it, its root cause(s), and the follow-up actions to prevent recurrence.

## When to Write One

Define triggers before incidents occur:

- User-visible downtime or degradation beyond a threshold
- Data loss of any kind
- On-call intervention (rollback, rerouting)
- Resolution time above a threshold
- Monitoring failure (the incident was discovered manually)
- Any stakeholder request

## The Blameless Principle

- Assumes everyone had good intentions and did their best with the information available
- Originated in healthcare and avionics, where mistakes can be fatal
- Shifts the question from "who did wrong" to "why did this person have incomplete or incorrect information"
- Writing a postmortem is not punishment; it's a learning opportunity

## Cultivating the Culture

- **Postmortem of the month** newsletter
- **Reading clubs**: review old postmortems with open dialogue
- **Wheel of Misfortune**: role-play reenactments of past incidents
- **Visible rewards**: peer bonuses, public recognition from leadership
- **No postmortem left unreviewed**: regular review sessions to close out discussions
- Survey teams on effectiveness; iterate on the process itself

## Review Criteria

- Was incident data collected for posterity?
- Are the impact assessments complete?
- Is the root cause sufficiently deep?
- Is the action plan appropriate, with proper priority?
- Was the outcome shared with relevant stakeholders?
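## Sketch: Encoding the Triggers

The triggers listed under "When to Write One" are easier to apply consistently if they are written down as an explicit check before any incident happens. The following is a minimal sketch of that idea, not a prescribed implementation: the `Incident` and `TriggerPolicy` types, their field names, and the threshold values (5 minutes, 1 hour) are all hypothetical and would be agreed on by each team.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class TriggerPolicy:
    # Hypothetical thresholds; each team sets its own values in advance.
    max_user_visible_impact: timedelta = timedelta(minutes=5)
    max_resolution_time: timedelta = timedelta(hours=1)


@dataclass(frozen=True)
class Incident:
    # Hypothetical fields mirroring the trigger list; defaults describe a non-event.
    user_visible_impact: timedelta = timedelta(0)
    data_lost: bool = False
    on_call_intervened: bool = False
    resolution_time: timedelta = timedelta(0)
    detected_by_monitoring: bool = True
    stakeholder_requested: bool = False


def postmortem_triggers(incident: Incident, policy: TriggerPolicy) -> list[str]:
    """Return every trigger the incident fired; an empty list means none did."""
    checks = [
        (incident.user_visible_impact > policy.max_user_visible_impact,
         "user-visible downtime or degradation beyond threshold"),
        (incident.data_lost,
         "data loss of any kind"),
        (incident.on_call_intervened,
         "on-call intervention (rollback or rerouting)"),
        (incident.resolution_time > policy.max_resolution_time,
         "resolution time above threshold"),
        (not incident.detected_by_monitoring,
         "monitoring failure (incident discovered manually)"),
        (incident.stakeholder_requested,
         "stakeholder request"),
    ]
    return [reason for fired, reason in checks if fired]


if __name__ == "__main__":
    incident = Incident(on_call_intervened=True, detected_by_monitoring=False)
    for reason in postmortem_triggers(incident, TriggerPolicy()):
        print(reason)
```

Returning the list of fired triggers, rather than a single boolean, keeps the reason for writing the postmortem on record, and any single entry is enough to require one.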
## Source

- [[Site Reliability Engineering - Chapter 15 - Postmortem Culture|SRE Ch 15: Postmortem Culture]] by John Lunney and Sue Lueder

## Related Concepts

- [[Incident Command System for SRE]]
- [[Proactive Failure Testing Culture]]
- [[Hypothetico-Deductive Troubleshooting Method]]
- [[Learning from Spectacular Failures - Kpaxs 20250404]]: Aviation's public failures forced rigorous learning