This resource first appeared in issue #113 on 12 Mar 2022 and has tags Technical Leadership: Systems: Other
Understanding wait time versus utilization - from reading Phoenix Project - Zhiqiang Qiao
Every so often I see technologists rediscover a very widely known result in operations research - introductory textbook stuff, really. Wait times (or other bad behaviour) start rocketing upwards once we get to high (somewhere between 80% - 90%) utilization. You see this in equipment, and teams, of course, too. Teams, whether they’re cash registers or software developers, start getting into trouble at sustained high “utilization rates”, e.g. overwork.
And yet, a typical metric for the systems we run for researchers includes utilization, with an understanding that higher is better. After all, if we have one system at 75% utilization and another at 90%, haven’t we wasted money on the 75% one by over-building?
Of course, Qiao points out that even if you discard utilization as a metric, wait time isn’t the only metric we might care about either. Getting the most important tasks through the queue, whether that’s software features or compute jobs, is what’s important.
Metrics matter! We bake in all kinds of pathological incentives when we choose KPIs based on what’s easily measured (typically technology or input based) instead of what actually matters (supporting new and high-impact research outputs).