This resource first appeared in issue #97 on 22 Oct 2021 and has tags Technical Leadership: Systems: Incident Handling, Technical Leadership: Systems: Other
Incident Review and Postmortem Best Practices - Gergely Orosz
If your team is thinking of starting incident reviews & postmortems - which I recommend if relevant to your work - this is a good place to start. Orosz reports on a survey and discussions with 60+ teams doing incident response, and finds that most follow a pretty common pattern:
Current best practices seem to be:
He then goes into some detail on conversations with teams that are going beyond best practices, including companies like Honeycomb, who provide tracing for other teams' stacks and so have very high uptime requirements (they publicly released an outage report for a 5-minute outage!).
A long article, but worth a read.