Hi, all!
This is the last part of the stop doing things challenge - the change management of actually stopping doing something. Or you can skip straight to the roundup.
The hardest thing about stopping doing things, of course, isn’t identifying what to stop doing, but actually stopping - and staying stopped.
There are really only three major steps to managing any substantial change, whether to something you’re doing or something your team is doing - communicating the change, implementing the change, and monitoring the change. For stopping doing something, implementing the change is mostly trivial - just don’t do the things! The other two steps are things you have to keep doing, until well past the time you feel confident that the change has “taken”.
So for stopping something, the steps look like this: talk individually with everyone affected, set up a simple way of monitoring the change, and then announce it publicly and keep communicating and tracking.
The first phase is to speak individually with each of the people who will be affected by the change - your boss, stakeholders who were making use of what you were doing, and if applicable, team members who were part of doing that thing. The framing is that you have a plan for the coming year, and you’re getting input on the plan. Walk them through a pretty thorough proposal of the change - why you’re doing it, how it will happen, and alternatives for them.
Throughout these conversations, genuinely seek input and feedback. They are a pre-emptive way of saying no to stakeholders, and an opportunity to align you and them around the same set of goals. The input doesn’t necessarily have to be positive, and that’s ok. You’ll find stakeholder problems you hadn’t anticipated, which you can then make plans to mitigate; you’ll find things you thought would be problems that in fact aren’t, so you can make your plan a little simpler. You might also find some people who are just dead-set against what you’re proposing, which you’ll likely have to discuss with your boss.
Having these individual conversations serves three extremely important purposes: it communicates your intent clearly, it builds trust with the people affected, and it gathers feedback you can use to improve the plan.
You can greatly improve the effectiveness of these conversations by sending each person a follow-up email fairly promptly afterwards, with your notes on what you heard from them in your own words and the steps you’re planning to take based on their input, and asking them to let you know if you’ve missed or misstated anything, or if they have anything to add. This significantly strengthens all three purposes of the conversation - intent, trust-building, and feedback-collection.
Depending on how serious some of the objections are, you might have to make nontrivial changes to your plan, in which case a second round of individual communications can be useful to keep everyone up to date. That also has the huge advantage of showing that you take the feedback you’re gathering seriously.
(I’m embarrassed to say that I used to skip this step entirely and leap straight into detailed planning and public announcement of the proposal, which is a completely unnecessary and unnervingly efficient trust-shredding exercise, to say nothing of being, frankly, arrogant. Pre-wiring the group communication is an old idea that I first heard from Manager-Tools, and although it sounds a bit obvious in retrospect, it was a revelation.)
Once you’ve gathered the input and updated your plan, the next step is to choose a simple way to monitor that you are in fact stopping doing the thing. The biggest thing you have to worry about is backsliding. People who used to do certain tasks will tend to revert back to what they know, even if they know intellectually your team isn’t doing that any more. “Just this last time”, “I wasn’t doing anything this afternoon anyway”, etc. That goes for you as well as your team members!
A simple way to keep on track is to have some kind of monitoring system, which both communicates that the change is something that matters and makes backslides visible, so that people (maybe you!) are accountable. Metrics don’t have to be hard or technically complicated; you can start by just counting, even by hand, the things you care about. You can count the number of data science projects the team is currently working on and the number of “off-topic” projects, publish both, and watch the numbers track up and down, respectively. Or you can show the number of jobs on the system to be decommissioned, and watch it trend downwards as decommissioning nears and alternatives are found.
The data can be collected manually, and communicated manually - at your weekly team meetings, say. The point isn’t to have a flashy dashboard; the point is to have a simple, visible indicator that shows whether things are heading the right way or the wrong way.
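Even a few lines of code, run by hand once a week, are enough to get started. Here’s a minimal sketch in Python - the categories and file name are hypothetical, so adapt them to whatever you’re actually tracking:

```python
# Tally this week's projects by category and append the counts to a CSV
# you can eyeball at the team meeting. Categories and file names here
# are purely illustrative.
import csv
import datetime

# Maintained by hand, or exported from whatever project tracker you use
projects = [
    {"name": "genomics pipeline",       "category": "data science"},
    {"name": "legacy report generator", "category": "off-topic"},
    {"name": "imaging collaboration",   "category": "data science"},
]

counts = {}
for p in projects:
    counts[p["category"]] = counts.get(p["category"], 0) + 1

# One row per category per week: date, category, count
with open("weekly_counts.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for category, n in sorted(counts.items()):
        writer.writerow([datetime.date.today().isoformat(), category, n])

print(counts)
```

The mechanism genuinely doesn’t matter - a spreadsheet updated by hand works just as well - as long as the numbers get updated and shown regularly.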
Once those things are lined up, public announcements and tracking can start. You’ll have to keep communicating, and you’ll have to keep tracking, but with the foundations of individual communication and simple indicators in place, the likelihood of the change taking root and persisting is much greater.
And that’s it! Now off to the roundup.
Your Star Employee Just Quit. Will Others Follow? - Art Markman, HBR
Maintaining a strong team isn’t an activity that ever stops. We need to be actively, constantly building the team - by supporting team members’ development and career goals, by giving them new challenges, and by bringing in new team members or developing and keeping an eye on a “bench” of possible candidates.
A team member leaving isn’t necessarily indicative of a problem by itself - it’s good and healthy for people to move on to other roles. We should be actively helping people prepare to take on more responsibilities, including new roles!
But when a key team member leaves, it can have cascading consequences. Workload goes up, a trusted peer is gone, and remaining team members now feel uncertain about a workplace that previously felt pretty solid.
Markman recommends doing a thoughtful exit interview with the outgoing team member. It’s a valuable opportunity to get perspective on what you and the team could do better, from someone who can afford an extra bit of candour as they leave. It’s also important to assess whether they were “shields down” to other opportunities for reasons that we could have influenced.
Then he recommends doubling down on things we should be doing at some level all the time anyway:
Key people leaving is one of many bad things that it’s best to have some kind of contingency plan for. I’m thinking of building a list of scenarios like this and emailing/tweeting them out weekly as a desktop planning exercise - helping research computing managers ensure they have plans and playbooks laid out for various contingencies. Catastrophe-as-a-Service, sort of thing. Is that something you’d be interested in?
How to Manage Multiple Projects at the Same Time - Elizabeth Harrin, Girl’s Guide to PM
Many of us in research computing are managing multiple projects or efforts simultaneously; Harrin’s post reminds us we’re not alone - most project managers (59%) lead 2-5 projects. Typically our projects are part of a program - a portfolio of projects that complement each other - which makes some of the people side of things easier, in that the teams overlap and share common goals.
The four big areas Harrin sees to keep an eye on are:
Harrin doesn’t offer any silver bullets for these areas, because there aren’t any, and what works best for you will depend on your preferences and situation.
For managing your own tasks: I keep a weekly priority list by my desk of things to keep an eye on, with categories for “issues” (efforts I have to actively manage) and “monitor” (efforts that are going well and that I want to make sure don’t become issues), to keep myself honest and not neglect tending to the “monitor” areas.
Similarly, moving people’s effort across projects is tricky, for us and for them, and again there’s no real silver bullet here, but it helps that our projects usually complement each other.
Job Interviews Don’t Work - Farnam Street Blog
Traditional interviews are pretty crummy at identifying good candidates for jobs - we miss too many potentially good matches and let too many potentially poor matches too far along the pipeline, wasting their time and ours. The blog post goes into some detail on why that is:
We actually know a lot about how to make interviews better; the article suggests:
5 Tips for Saying No To Stakeholders - Roman Pichler
User power, not power users: htop and its design philosophy - Hisham H. Muhammad
You can’t keep focus on your goals and priorities without saying no to requests. We’ve covered articles on this in the roundup before, and alluded to it in the stopping-things advice at the beginning of the newsletter, but it’s an important topic! Pichler emphasizes:
All of these things are important in both individual and group communication of changes, whether preemptive (“We won’t be doing this any more”) or reactive (“No, I’m afraid we can’t do that.”)
A nice example of this in a software development context is Muhammad’s article on saying no to a feature request for htop (which is a lovely package). htop has a very specific vision, and a user request contrary to that vision helps clarify it if it’s taken as an opportunity to say no, rather than muddying and compromising the vision through indiscriminate yeses. The story in the request is a pretty good illustration of all five of Pichler’s points.
Illuminated equations - Matt Hall
We often have to communicate pretty complicated material to an audience that may be extremely capable and knowledgeable, but not necessarily in exactly what we’re talking about. Here Hall gives several examples of ways of explaining equations - from heavily annotated equations, to dynamic explanations, to multimedia explanations.
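As a tiny illustration of the “heavily annotated equation” approach - this example is mine, not one of Hall’s - labelling each term of Bayes’ theorem already does a lot of the explanatory work:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Label each factor with its role, so the reader sees the structure and
% the meaning at the same time.
\[
\underbrace{P(\theta \mid d)}_{\text{posterior}}
  = \frac{\overbrace{P(d \mid \theta)}^{\text{likelihood}}
          \;\overbrace{P(\theta)}^{\text{prior}}}
         {\underbrace{P(d)}_{\text{evidence}}}
\]
\end{document}
```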
Minigraph as a multi-assembly SV caller - Heng Li
One of the reasons why technological readiness, or maturity, is a much better way to think about most research software development than technical debt or even sustainability is that most research software development is in the early, discovery stages of readiness:
Most research software development projects never really get past the early stages, because in research software development, as in any other aspect of research, most attempts aren’t as fruitful as hoped. And there’s nothing wrong with that; that’s just how discovery works. This article by Heng Li about the evolution of the use cases for minigraph, a bioinformatics tool he wrote for mapping sequencing reads to graph genomes, is a good case in point. It was pitched as a fast structural variant (SV) caller, but:
My exploration took a turn when one anonymous reviewer asked me to check the LPA gene. It was not in the graph because the gene was collapsed or missed in all input assemblies. Fortunately, I had several phased hifiasm assemblies at hand. LPA is there and minigraph generates a complex subgraph (figure below) far beyond the capability of VCF. Then I realized what minigraph is truly good for: complex SVs.
Now, I’m not sure the author would agree with me on this next point - he’s a very careful software developer who only releases solid, performant code. But in general it’s difficult to make sound decisions about optimization, about what is and isn’t technical debt, or about what’s necessary to turn a prototype into a sustainable, maintainable body of code, when even the code’s use cases aren’t clear yet. And for all software, but especially research software, users will find use cases you haven’t dreamed of.
Between each of those rungs on the ladder of research software technology readiness is more user input, leading to more potential pivots and more data about what is needed and how to weigh tradeoffs. Sustainability and technical debt only become considerations at the very top rungs of the ladder.
New codebase, who dis? (How to Join a Team and Learn a Codebase) - Samuel Taylor
Whether we as managers or team leads are moving to a new team, or welcoming someone new to our project, it’s good to have a plan for how to get familiar with the new code. First is understanding the big picture:
And then once that high-level picture is understood, get into the code and start doing something:
I continue to think that, for those of us who came up through the research side of research computing, our research experience gives us something of an unfair advantage when we’re thrown into situations where we have to develop an understanding of a new body of knowledge. Absorbing lots of new information, forming hypotheses, and testing them plays pretty strongly to our training.
Doubling down on open, Part II - Shay Banon, Elastic
A lot of infrastructure software, like databases, has a business model where the core product is FOSS and the company makes money by selling licences to a premium version and by selling hosted services. This worked pretty well up until recently, but now companies like AWS are also running and hosting these products as services for a fee, while making changes that aren’t contributed back. They can do this because running the software as a service isn’t “distributing” the software, so changes can be made and kept proprietary while still profiting from the software - and cutting into the revenue that funds the original development.
As a result, late last year Elastic announced that Elasticsearch and Kibana are moving from Apache 2.0 to dual licensing under their own Elastic License and MongoDB’s Server Side Public License (SSPL). This gives users most of the same rights, but adds the restriction that if you run the software as a service you must also release your code changes back to the community.
(Elastic is getting a lot of heat for this, with claims that it’s “not an open source licence”. A lot of that is unjustified - this isn’t the Commons Clause, which had bigger issues.)
For some - not all, but some - research software it may make sense to offer the software hosted, fee-for-service; researchers could pay to have the software run for them rather than pay (directly or indirectly) for the hardware and run it themselves. For such tools it may make sense to consider these modified licenses, to ensure that contributions make their way back to the research community that could benefit from them.
Profiling GPU code with NSIGHT systems - Jorge L. Galvez Vallejo
NVIDIA is in the process of replacing the previous nvprof tools with a unified set of Nsight tools - Nsight Systems, Nsight Compute, and Nsight Graphics. This article by Galvez Vallejo shows how to use Nsight Systems (nsys) to get a high-level overview, and Nsight Compute (ncu) to dig into the performance of a particular kernel.
Our Dumb Security Questionnaire - Jacob Kaplan-Moss
A “10 questions, no diagrams” questionnaire, intended for small early stage companies looking at vendors, but really a pretty good short set of questions to consider about any security practice.
Announcing the Security Chaos Engineering Report - Aaron Rinehart
An introduction to the new area of Security Chaos Engineering - injecting faults and failure modes to test your system’s or application’s security. If your security depends on having no failures, then you have a security problem.
Shifting Cloud Security Left — Scanning Infrastructure as Code for Security Issues - Christophe Tafani-Dereeper
Tafani-Dereeper evaluates several tools for automated security scanning of Terraform deployments. The tools are set up to look for typical misconfigurations for AWS, GCP, and Azure, but most support custom checks that could be used for, e.g., local OpenStack deployments.
The article evaluates checkov, regula, terraform-compliance, terrascan, and tfsec on project activity, support for custom checks, and usability, and also shows how to set up a static-analysis workflow for Terraform deployments. Tafani-Dereeper recommends checkov or tfsec for getting something working quickly; terrascan or regula for checks that correlate multiple resources; and terraform-compliance for its behaviour-driven-development language, which the author particularly likes for writing complex custom checks.
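To give a flavour of what a custom check looks like, here’s a minimal sketch following checkov’s documented Python custom-check interface - the check ID and rule are purely illustrative, not taken from the article:

```python
# Custom checkov check (sketch): flag aws_s3_bucket resources that don't
# declare server-side encryption. Drop this in a directory of custom
# checks and point checkov at it.
from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck


class S3BucketEncrypted(BaseResourceCheck):
    def __init__(self):
        super().__init__(
            name="Ensure S3 buckets declare server-side encryption",
            id="CKV_CUSTOM_1",  # illustrative ID
            categories=[CheckCategories.ENCRYPTION],
            supported_resources=["aws_s3_bucket"],
        )

    def scan_resource_conf(self, conf):
        # 'conf' is the parsed Terraform resource block, as a dict of lists
        if conf.get("server_side_encryption_configuration"):
            return CheckResult.PASSED
        return CheckResult.FAILED


# Instantiating the class registers the check with checkov
check = S3BucketEncrypted()
```

You’d then run something like `checkov -d . --external-checks-dir ./custom_checks` to apply it alongside the built-in rules.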
Searching for RH Counterexamples — Deploying with Docker - Jeremy Kun
This is the fifth in an interesting series of tutorial articles, using the search for counterexamples to the Riemann Hypothesis as a fun but likely not very fruitful use case, touching on test-driven Python development, using a database, adjusting search strategies, and unbounded integers in databases. This article goes through the steps of dockerizing the database and search program and deploying them, which is a good model if you want to get started with such things.
CrossHair is a contract-based testing tool for Python - like Ada’s contracts! It’s closer to property-based testing than to ordinary unit testing, but not quite the same. It also has really interesting functionality like diffbehavior, which hunts for inputs on which two versions of a function behave differently.
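For a sense of what it looks like, here’s a minimal sketch using the PEP-316-style docstring contracts that CrossHair understands; `crosshair check` then tries to find inputs that violate the postcondition:

```python
# A small contract-checked function. Run `crosshair check thisfile.py`
# and CrossHair will search for counterexamples to the post-condition.
def clamp(x: int, lo: int, hi: int) -> int:
    """
    pre: lo <= hi
    post: lo <= __return__ <= hi
    """
    return max(lo, min(x, hi))
```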
For those who, like me, think IDEs more or less peaked with Turbo Pascal/Turbo C++, Turbo Vision is a text-based user interface library with a very different API than e.g. ncurses, and works on Windows as well as POSIX-y operating systems.
Interesting to see the growing HPC-as-a-Service options out there as HPE signs up with a Swedish facility for hosted HPC (a new Greenlake region?)
An interesting look at how TACC works and how the pandemic affected their 170-staff-strong centre.
A nice walkthrough on setting up a Jekyll website from scratch with simple.css and using Netlify to host.
Airtable is a nice, easy database-y backend for side projects; here’s a developer’s guide to developing with the Airtable API.
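For the curious, talking to Airtable from code is about as simple as REST APIs get; here’s a quick sketch in Python (the base ID, table name, and environment variable are made up):

```python
# List a few records from an Airtable table via its REST API.
# BASE_ID and TABLE are placeholders; the API key comes from an env var.
import os

import requests

BASE_ID = "appXXXXXXXXXXXXXX"   # hypothetical base ID
TABLE = "Projects"              # hypothetical table name

resp = requests.get(
    f"https://api.airtable.com/v0/{BASE_ID}/{TABLE}",
    headers={"Authorization": f"Bearer {os.environ['AIRTABLE_API_KEY']}"},
    params={"maxRecords": 10},
)
resp.raise_for_status()
for record in resp.json()["records"]:
    print(record["id"], record["fields"])
```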
A more functional line editor for bash - ble.sh
PowerShell on Linux. I know this is heresy, but: future Linux shells could learn meaningful things from PowerShell.
A data-aware shell that turns complex piped command lines into DAG workflows, and then migrates individual processes to be closer to the data(!!).