Making operational work more visible

This resource first appeared in issue #120 on 30 Apr 2022 and has tags Technical Leadership: Systems: Other, Managing A Team: Other, Managing A Team: Documentation/Writing

Making operational work more visible - Lorin Hochstein

In the f-string failure article in software development, I pointed out that log and error handling code was under-reviewed and tested. There’s probably a bigger lesson one can take from that on the undervaluing of supporting or glue or infrastructure work compared to “core” work.

And sure enough, one of the huge downsides of operations work is that when everything goes well, it’s invisible.

Above, Granda walks us through writing up an incident report and sharing it so that users (in our case, researchers) know what’s going on, and so that we build a pool of knowledge internally with the team. Hochstein suggests doing the same, for the same reasons, with incidents that don’t happen, and with routine debugging and improving of performance behind the scene:

Here’s the standing agenda for each issue: Brief recap, How did you figure out what the problem was, How did you resolve it? Anything notable/challenging? (e.g., diagnosing, resolving)

Getting in the habit of writing up and sharing — internally and externally — all the routine but notable work of Operations teams has the same benefits of writing up big incidents. It improves transparency, is generally pretty interesting to a small subset of users, it builds a knowledge base, and it builds trust.