How to learn after an incident

This resource first appeared in issue #107 on 29 Jan 2022 and has tags Technical Leadership: Systems: Incident Handling

Howie: The Post-Incident Guide - Jeli

How to Write Meaningful Retrospectives - Emily Arnott, Blameless

The key to getting better, individually or as a team, is to pay attention to how things go, and continue doing the things that lead to good results, while changing things that lead to bad results.

Pretty simple, right? And yet we really don’t like to do this.

Whether your teams run systems, develop software, curate data resources, or combinations of the three, sometimes things are going to go really badly, in a way that affects researchers. That is always going to happen. The key to reducing the frequency and severity of those bad outcomes while continuing to build trust with researchers is to learn from what happened (post-incident analysis) and communicate to the user community (sending out retrospectives).

In the first article, the Jeli team recommends running the post-incident analysis by:

  • Assigning a specific named person to lead the investigation
  • Identifying and analyzing incident data
  • Interviewing participants
  • Calibrating analysis
  • Consolidating into an analysis
  • Meeting to review what’s been learned, and
  • Reporting back and distributing the findings

It’s a good long read going into each of those steps in great detail.

In the second article, Arnott focusses especially on the writing and communication to stakeholders. Your stakeholders are researchers - they’re smart, they know things go wrong on the cutting edge sometimes, and they have extremely finely-tuned BS detectors. They deserve to know what’s gone wrong and what you’re doing to reduce the chances of that going wrong again.
