#125 - 11 Jun 2022

Strategy, Strategic Plans, and You; When Feedback doesn't Land; Cash-strapped Tech Hiring; WorkflowHub; Engineering Effectiveness; @NotOnlyFlops' RISC-V Cluster

Hi, all!

I want to write a little bit more in the coming year about strategy for leaders of research computing and data teams: setting priorities; deciding what and what not to do; and deciding what success looks like. It’s an important topic - and yet, as with so many things in our line of work, no one teaches us how to do it. It’s also much more ambiguous in our discipline than it might have been as a researcher, where you can keep score pretty simply with papers and grants.

Those of us who have gone through University strategic planning processes come out even more confused, because these “Strategic Plan” documents aren’t a great introduction to strategy at the best of times, and they’re particularly odd ducks in the context of a University.

There was a great article by Alex Usher and team at Higher Education Strategy Associates last week. I want to use it to set the stage for what a strategy is, what a good overall framework does for teams or orgs, why Strategic Plans for universities are so constrained, and what that means for us. Usher has written quite a bit about strategic plans in the past, some of it quite cutting, and all of it insightful - I recommend the articles, and the blog in general, if this topic interests you.

(This is a long one, and won’t be of interest to all readers - feel free to skip to the roundup.)

A strategy, ultimately, is a means to solving a problem. You have a problem: say, how to reach sustainability, or how to hire and keep good team members, or how to position your team against a number of other local and remote teams. A strategy is a choice, a decision, about how you will solve that problem, given the existing situation and the tools at your disposal. Remember the story of how Pauli dismissed a paper as “not even wrong”? A good strategy, like a good theory, must possibly be wrong. If it couldn’t be wrong, it isn’t a strategy. If someone else couldn’t defensibly make a different choice, it isn’t a strategy. If a choice doesn’t address a well-defined problem, it’s not a strategy. Aspirations to “excellence” or “high community engagement” and the like aren’t strategies. (The canonical book to cite here is Good Strategy, Bad Strategy. I’ll warn you, though, especially in academia: getting into this is like learning to recognize bad kerning. The badly done stuff is everywhere, and seeing it constantly gets really annoying.)

Strategic Plans are a poorly named genre of documents. (Naming things is hard). It isn’t a plan - it doesn’t (and shouldn’t) say “we’ll do X, then Y, then Z”. It also isn’t a strategy, though it should implicitly be the documented outcome of deciding on one or more strategies.

Let’s say you woke up one morning and found yourself in charge of a research organization that faced new opportunities and challenges. A good strategic plan is something you could use to guide you away from some otherwise plausible choices and towards others. It’s a document which gives a community clarity about how big picture decisions will be guided - about the principles around which those choices will be assessed. It gives stakeholders clarity about what the organization is for, what it will be doing even when challenges arise, and if it can help them meet their goals.

Any document providing such clarity to leadership and stakeholders about what the organization will do when faced with opportunities and challenges is a perfectly fine Strategic Plan. Sometimes a smaller version of the document, for a smaller organization, is called a Strategic Framework. That’s a better name, because it is a framework - or at least guardrails - for making strategic decisions (for the leadership, but also for stakeholders).

Plan or Framework, there’s no fixed structure for such documents. The most common approach is to have a section describing what the organization is for (with enough clarity that it’s easy to understand what it’s not for), the broad classes of activities it will undertake (often with some prioritization), and how it will measure whether it is accomplishing its purpose. But the essence of the document is the shared understanding of what the organization is and how it will be guided, not the section headings. For a modest-sized team, a well-written Strategic Framework type document could be a page or less in length, and would likely be called something less pretentious, but it would serve as one just fine.

Bad documents, on the other hand, cite lofty goals, too big to be achievable and too fuzzily defined for anyone to know for sure if they had been achieved anyway. They list grab-bags of individually worthy but collectively incoherent activities. The measures for success are just counts of things they do and not measures of what they’re trying to achieve by doing those things. They leave stakeholders uncertain as to what’s in and out of scope if new opportunities arise, or what the true priorities are if activities have to be scaled back. They leave leadership with no concrete guideposts to inform challenging decisions, and stakeholders no hint as to how those decisions will be made.

The problem with Strategic Plans/Frameworks, and why I think there are so many crummy ones, is that the good ones are unremarkable and boring. They are the documentation of clear decisions made about what the organization is for and how it will measure its success. Read after the fact, you’d be forgiven for mostly forgetting having ever laid eyes on it. “Ok, the Institute for Theoretical Applied Translational Studies is for X, Y, and Z, and for the next couple of years it’ll be doing this, that, and those, and there was some list of KPIs. I’d work there, or work with them, if I wanted to do such-and-such. If I needed so-and-so, I’d have to look elsewhere. Yada yada. I don’t see what the big deal is.”

But, that’s huge. Getting an organization, a team, and stakeholders aligned to the point that they can clearly lay out what the organization is for and how it will measure whether it’s doing a good job is an absolutely foundational success.

Our job, as leaders, is to reduce uncertainty (#34). We are life-sized Maxwell’s Daemons, manually reducing entropy within and at the boundaries of our organization, so our teams can help our stakeholders in the way only they can. In research, and in computing, the range of possible things we could do is almost infinite. Discussing the purpose of an organization or team with stakeholders, and building enough consensus to agree on a relatively quotidian document, while having that document as an artifact to continue basing decisions around for the next few years, is a real accomplishment.

And it’s doable! Like most of management, the successful outcome is boring and mostly invisible, but it’s doable. Most well-run medium-to-large nonprofits, and many large-ish multi-institute research centers, have exactly such boring, clear, and useful documents. Getting to the point of having that document is almost never boring, mind you. The discussions can be quite… vigorous. But it is absolutely doable. (Disclaimer: while I have succeeded in doing this a couple of times in the past, I have also notably and pretty comprehensively failed to do it once, for a large national organization.)

University strategic plans aren’t the worst to be found out there, but they aren’t the best, either. One reason we find them awkward is that they aren’t written for us. They’re written for donors and (for public institutions) governments, for reasons the HESA article discusses, so they make for dry, if not cringey, reading for internal technical folks. Another reason is that universities are very constrained in the directions they can set. Universities can’t directly steer teaching or research activities, because of academic freedom. So these are hard documents to write for universities.

But! The exercise leading up to a strategic document is a way to get the whole University on the same page as to where things are now, and to choose two or three priority areas to push forward in the next years. This is no small thing.

Those new areas are where the University leadership, including your VPRs and CIOs and Deans, have committed to try to make advances, based on input from the community. (If it’s not clear to you what those new priorities are, you can read the current strat plan along with the previous couple to watch the evolution). Leadership will be pushing very hard on those priorities, because it takes constant effort over years to make any change in as large and fractious an organization as a university. Nudging things in new directions takes long term suasion and building of necessary supporting infrastructure.

And that’s where we come in - we are an essential part of that infrastructure. We’ve discussed already that research computing and data can help lead research by making existing things easier, or making new things possible [#104]. What we haven’t discussed, and the HESA article does, is that our teams and other kinds of centres are amongst the few levers that institutional leaders have within Universities. Building new capabilities - the HESA article talks about it in terms of faculty hires, but I’d add creating centres of all types, including our own - creates new possibilities, new paths that people wander down:

In fact, one of the very few ways in which institutions or faculties can in fact regain actorship over things like teaching and research is in the act of hiring. […] However, in the act of hiring, it is possible to shape entire institutional futures (since hires can stay at an institution for 30-40 years) by choosing to build strength in clusters of related academic topics (eg. Water, China studies, Poverty), either within or preferably across disciplines. Given this, it is a bit remarkable how little time is spent in universities thinking specifically about hiring as a strategic activity. It’s not as though we don’t know that it works: pretty much all of Caltech’s strength as an institution, for instance, can be traced to a very calculated set of about a dozen or so strategic hires between 1910 and 1930. We just don’t talk about it or engage in it very much or – and this is the key part – linking it to strategic goals very often, either at the institutional or (maybe more appropriately) the faculty level.

Especially when our teams are part of a pillar which will help the institution go where it needs to go, our teams’ work can make a push into a new research area more or less successful.

Leadership too often doesn’t have the bandwidth to be checking in with us and nudging us towards institutional priorities, especially since they typically don’t know what’s technically possible on the ground. (I’ve talked before about why the relationship between us and academic leadership is like that between a nonprofit leader and their board - this is part of the reason why.) But they do have priorities, and they are able and enthusiastic to talk about them at length given the opportunity. Teams that demonstrate they are strategically important get resourced strategically. Teams that help leadership grow capabilities important to the institution, in those priority areas, are more likely to get opportunities and resources when they arise.

Our research computing and data teams have a real role in helping our institutions grow to meet new challenges. Developing an effective professional relationship with leadership is hard in our institutions, because they’re quite flat and leadership has a lot of balls in the air. But that’s what it takes to make sure the work we and our teams are doing has as much impact as possible.

Send me an email if you have questions about university or research group strategic plans or their processes, or any particularly good (or terrible!) stories to share. Also let me know if there are topics you want me to cover in coming months on issues of strategy for research computing and data groups. Just hit reply or email me at jonathan@researchcomputingteams.org.

Sorry, that was long - but it’s an important primer on strategic planning documents for people who have been discombobulated by what goes on in universities. We’ll talk more (and at less length) later - for now, the roundup! It’s a short one this week - there’s always a bit of a lull in relevant articles towards the start of summer.


Managing Teams

What To Do When Your Feedback Doesn’t Land - Lara Hogan

Feedback often doesn’t require much of a conversation. Done well, feedback is lots of tiny little nudges (mostly positive!). If any one of those nudges doesn’t really register with the team member, or is misunderstood, well, it’s not a big deal.

Sometimes, though, it does take some conversation. Maybe you just want to find out what happened in a situation so you can make some adjustments elsewhere. Sometimes it’s because the feedback is more serious, or even just more complex. Hogan’s Feedback Equation is good for those; it ends with an open-ended question. (She has a bunch of good open-ended questions here.)

Sometimes when these conversations happen and it’s more serious or more complex, the feedback doesn’t seem to land, or register, or… something. That might be because the team member doesn’t understand, or is getting defensive. If the conversation is awkward, it’s pretty natural for the manager to just want to wrap things up and get out of there, but we know that’s only putting off the problem.

Hogan has five steps for dealing with these situations:

  • Acknowledge the awkwardness
  • Create spaces and pauses for them to fill in
  • Ask them to reflect back what they heard
  • Ask open coaching questions
  • Set next steps

What Good, Cash-Strapped Hiring Looks Like - Cedric Chin
Let’s talk a bit about giving interviews - Randy Au

The HESA article above highlights the importance of hiring, a topic that comes up repeatedly here.

Our teams have some of the same problems with hiring technical staff as everyone else does, but a lot of the articles I see from tech aren’t super relevant - they assume a lot of money, or that you’re constantly hiring multiple people at a time. Our teams are cash constrained, and hiring is only occasional. That makes hiring the right people even more important, and it means we have to take advantage of other mechanisms (like working with co-op students or other interns) to keep our hiring processes sharp.

Chin’s article talks about cash-strapped hiring with a few case studies. (We’ll probably see more articles about cash-strapped hiring in the coming year, as the market correction hits tech and startups that were used to being able to raise money left, right, and center suddenly have to tighten their belts.) The principles Chin sees in common across successful attempts are:

  • You have a differentiated set of hiring criteria, and you are able to evaluate it during the hiring process.
  • You have some method for onboarding the talent you find in this way.
  • Because of 1 and 2, you are able to focus your hiring efforts on specific pools of talent.
  • You are willing and able to iterate on 1 through 3 above.

(The points about actively recruiting candidates that are likely to match the criteria are very relevant, too. Always be on the lookout for people in the community who might be good team members if the timing worked out.)

Point two, successful onboarding, is vital. I’m increasingly certain that hiring and onboarding are the same thing. In fact, starting the hiring process from picturing the end of the onboarding process - say four months in, the person is now a successful part of the team doing the job - is the way to go. Then back out the onboarding process. Then start putting together a list of hiring criteria for the job ad.

And for hiring criteria - I can’t stress enough the importance of point 1, tuning the hiring criteria to the actual work and the team. I have seen too many research computing and data teams hire against a laundry list of specific technical knowledge that might well be handy (but can easily be learned), while not actually evaluating how well the candidate would likely do the real, day-to-day job.

Au talks about something sort of skimmed over in Chin’s article - the interview process, where you actually evaluate the candidate against the hiring criteria:

  • Make sure you’ve got your objective hiring bars and grading rubrics articulated and agreed upon by everyone
  • Remember that interviews are high stress, high stakes environments
  • Take-home work is probably biased, as is whiteboard coding
  • Basing questions off the real job is often a good idea
  • But don’t ask people to do free work for you
  • Please test your questions out on someone first
  • Finally, you’re going to be bad the first dozen times. Practice.

Getting the hiring bars and rubrics agreed to by everyone who’s going to interview is a fantastic opportunity to make sure the whole team is on board with what the new hire’s job will actually be, and to help them develop their interviewing skills.

Also, we’re in science - calibrate your devices! Test the questions you’re going to ask on people who you’re confident would be good team members. If they struggle to get the question answered adequately in the time allotted, then adjust accordingly.


Research Software Development

WorkflowHub - EOSC-Life and ELIXIR

It’s fascinating to watch the unit of scientific software grow from the individual code to the pipeline. It’s become necessary not just because data analysis pipelines have become a bigger part of research computing and data; even simulation science workflows are growing more complex.

Europe’s WorkflowHub is a home for workflows (in CWL, Nextflow, Snakemake, and other formats). It pointed out a tool I hadn’t heard of before, WfCommons, for analyzing or simulating the runs of complex workflows - good for testing workflow runners and systems. Interesting stuff!


Rands Leadership Slack engineering-effectiveness: A curated summary of shared tips & resources - Curated by Jasmine Robinson

I’ve mentioned the Rands Leadership Slack here a few times, where we’re slowly growing a research-computing-and-data channel. That channel doesn’t have enough of a critical mass yet to keep conversations going naturally, but luckily there’s so much else going on in the slack that there’s always something to read or a conversation to be a part of.

One new Rands’er has gone through the past five years of conversation on the Engineering Effectiveness channel and distilled the discussions, advice, and suggested resources into best practices, tips, resources, FAQs, and more: it’s well worth a look if you’re wondering how to make technical teams more productive and effective.

Table of Contents for the Rands Leadership Slack #Engineering-Effectiveness channel distillation

(The Rands slack operates under the Chatham House Rule, so distilling discussions from there is explicitly allowed as long as they’re not attributed.)


It continues to impress me how a new, younger community is rebooting the Fortran ecosystem - here’s an example of dusting off quadpack from netlib, refactoring it in modern Fortran, getting it up on GitHub with GitHub Actions for CI/CD "to restart Quadpack development where it left off 40 years ago", and putting it into package managers.

Getting the code into modern Fortran isn’t some stylistic or aesthetic thing, either - intelligent refactoring support and other tooling in IDEs can’t make heads or tails of F77 constructs. How is an automated tool supposed to know what to do with a computed goto? By moving the code into F2008 or even F90, it becomes much easier and faster to improve further.

I’m not sure if this effort comes too late for Fortran or not, but certainly the demand for number crunching on large rectangular multi-dimensional arrays has never been larger…
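It’s worth remembering just how load-bearing this code still is - SciPy’s scipy.integrate.quad, for instance, is a wrapper around the Fortran QUADPACK routines, so an enormous amount of Python ends up calling this forty-year-old code every day. A quick illustration (the integrand is just an example I’ve made up, not from the article):

```python
# SciPy's quad() wraps the original Fortran QUADPACK routines, so a call like
# this one still exercises that netlib code under the hood.
import numpy as np
from scipy.integrate import quad

value, abserr = quad(lambda x: np.exp(-x**2), 0.0, np.inf)
print(value, abserr)   # ~ 0.8862 (sqrt(pi)/2), plus an error estimate
```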


Research Data Management and Analysis

Revisiting data query speed with DuckDB - Jacob Matsen, DataDuel

The influx of new database types is a huge boon to data analysis/data science workflows. If, as is often the case in our line of work, the workflow is more about analytics than about row-by-row data mutations, columnar or OLAP databases are the way to go. Matsen here is very impressed by the 80x(!) speedup he got using DuckDB (an embedded columnar database) on a CSV file off disk vs. running the same query in Postgres with the data in a table. Any of a number of other columnar databases would have done as well (although apparently not SQL Server columnar tables), especially if the data were converted to something more machine-friendly like Parquet files.
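If you haven’t tried it, the "embedded" part is a big piece of the appeal - there’s no server to stand up. Here’s a minimal sketch using DuckDB’s Python API; the file and column names are made up for illustration and aren’t from Matsen’s post:

```python
import duckdb

con = duckdb.connect()  # in-memory database: nothing to install or administer

# DuckDB can scan a CSV directly in the FROM clause, column-wise
# (hypothetical file and column names, just for illustration)
rows = con.execute("""
    SELECT region, AVG(amount) AS avg_amount
    FROM 'measurements.csv'
    GROUP BY region
    ORDER BY avg_amount DESC
""").fetchall()
print(rows)

# Converting once to Parquet (a columnar on-disk format) makes
# repeated analytical queries faster still
con.execute("COPY (SELECT * FROM 'measurements.csv') TO 'measurements.parquet' (FORMAT PARQUET)")
```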


Research Computing Systems

AMD Technology Roadmap from AMD Financial Analyst Day 2022 - Cliff Robinson, Serve The Home
The Increasingly Graphic Nature of Intel Datacenter Compute - Timothy Prickett Morgan, The Next Platform

Both Intel and AMD announced what’s coming next on their technical roadmaps over the past couple of weeks. I’m not going to comment on any specifics, because we’ve decided this isn’t a speeds-and-feeds newsletter and because I have a conflict. But there are two things we can safely take away:

  • CPUs are in fact going to continue getting weirder - a rapidly growing range of xPUs tailored to specific kinds of workloads. Even traditionally staid, all-the-world’s-an-x86 Intel is getting in on the game. Gone is the brief window when system design for academic research computing was “just buy a whack load of 2-socket servers and rack ’em”. This is exciting, but it’s going to be a lot of work, too. Systems and software teams are going to take a while to adjust. More automation, CI/CD, etc. is going to be key.
  • AMD’s turnaround in the past decade, with Lisa Su at the helm, is just astounding. It’ll be fun to watch the roadmap unfold.

Emerging Technologies and Practices

Running cost-effective GROMACS simulations using Amazon EC2 Spot Instances with AWS ParallelCluster - Santhan Pamulapati & Sean Smith, AWS HPC Blog

A big issue for using cloud for typical HPC workloads is relative cost. The more we learn to take advantage of much cheaper (often by 90%) preemptible spot instances for workflows that can make effective use of them, the cheaper those workflows become. What’s more, those same approaches can also help us make better use of on-premises clusters, where preemption is possible but not widely used.

Here the authors demonstrate ParallelCluster + Slurm restarting preempted GROMACS jobs with no additional work required. For the (not uncommon) case of running large numbers of relatively few-particle jobs, checkpointing frequently is pretty lightweight on a high-performance filesystem, so the additional cost of this is pretty modest.
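The underlying pattern is worth having in your toolbox even off the cloud. Below is a minimal, generic restart-from-checkpoint sketch in Python - all the names and the toy “simulation” are made up for illustration; it’s not what ParallelCluster, Slurm, or GROMACS do internally, just the shape of the idea:

```python
import os
import pickle

# Everything here is illustrative: a toy "simulation" showing the generic
# checkpoint-and-restart pattern, not GROMACS or ParallelCluster internals.
CHECKPOINT = "state.pkl"
TOTAL_STEPS = 1000
CHECKPOINT_EVERY = 100

def advance_one_step(state):
    # stand-in for the real per-step computation
    state["step"] += 1
    state["value"] += 0.001
    return state

def load_or_init_state():
    """Resume from the last checkpoint if one exists (e.g. after a spot preemption)."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "value": 0.0}

def save_state(state):
    """Write the checkpoint atomically, so a preemption mid-write can't corrupt it."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_or_init_state()
while state["step"] < TOTAL_STEPS:
    state = advance_one_step(state)
    if state["step"] % CHECKPOINT_EVERY == 0:
        save_state(state)   # cheap at a sensible frequency, even on shared filesystems
```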


Strong Showing for First Experimental RISC-V Supercomputer - Nicole Hemsoth, The Next Platform

Here Hemsoth writes about the ISC student cluster competition team @NotOnlyFlops out of the BSC, and their success competing with a cluster built out of the same SiFive motherboards we talked about in #122. Putting something like this together under the time constraints of a competition, where you don’t necessarily know what you’ll be asked to run on it, is pretty impressive! These systems (and more importantly, the software for them) are much more mature than I would have thought a couple of months ago.


Random

A bash parser for command-line arguments - bashjazz.

I’ve wanted something like this for years - why don’t we have it in IDEs? A prototype text editor which allows drawing and rendering line diagrams in SVG alongside text. It’s 2022; why do we still have to use ASCII art to draw our in-code diagrams?

Speaking of editors - Atom is being EOLed.

Web apps are cool, I guess, and so are CLI tools, but what about tools that run over DNS?

As time goes on, the implementations of "Wordle for X" stretch further and further into the past - Wordle in Pascal for Multics (an OS from the late 60s).

Web browsers are extremely complicated pieces of software - here’s a book walking through building a simple one in 1,000 lines of Python.

For those working with trainees new to git and GitHub, GitHub is putting together a series of interactive, self-paced tutorials for beginners - GitHub Skills.

Deep Learning is taking over a lot of things, but not yet in-terminal games of incomplete information - symbolic methods were still comfortably ahead in the 2021 NetHack challenge.

Automated parallelization of POSIX shell scripts with PaSh.

An old school BASIC interpreter + DOS environment, reimagined as web app - EndBASIC.

The Mamba project (a faster drop-in replacement for conda) is pretty mature now - here’s a quick overview.


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.