OOPS writeups - Lorin Hochstein

This resource first appeared in issue #103 on 04 Dec 2021 and has tags Technical Leadership: Systems: Incident Handling, Technical Leadership: Systems: Other

Hochstein outlines and explains how his team at Netflix writes up “OOPS” reports, which cover what are essentially incidents that didn’t rise to the level of Incident Response, as a way of learning and sharing knowledge about things that can go wrong in their systems. It’s a nice article, and it provides a lightweight model you could adopt.

His outline, pasted verbatim from the article, is below. I particularly like the sections on contributors/enablers and mitigators, which cover things that didn’t cause the issue but made it worse or better than it would otherwise have been. If this is of interest to you, or you’re thinking of starting some way of formally writing up “events” that happen in your systems, the article is a short and interesting read.

Title
Executive summary
Background
Narrative description
- Prologue
- The trigger
- Impact
- Epilogue
Contributors/enablers
Mitigators
Risks
Challenges in handling