Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software - Paul P. Gardner *et al.*, Genome Biology

This resource first appeared in issue #111 on 26 Feb 2022 and has tags Strategy: Advocacy Resources

Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software - Paul P. Gardner et al., Genome Biology

The quote from the Results section sort of says it all:

We find that software speed, author reputation, journal impact, number of citations and age are unreliable predictors of software accuracy. This is unfortunate because these are frequently cited reasons for selecting software tools. However, GitHub-derived statistics and high version numbers show that accurate bioinformatic software tools are generally the product of many improvements over time.

I’m fond of saying that software isn’t “sustainable”; software is sustained, or it isn’t. This paper points out that software that is sustained tends to be more accurate than software that isn’t, even if that other software is highly cited.

Figure 1 from the paper.  Of particular interest is panel B, looking at correlation between various measures (H-index, M-index, Year, Speed, Citations, Contributors, Commits, PRs, Forks, number of issues, amongst others) and accuracy, with the red asterisks indicating mean values.  The only items significantly correlated with accuracy were the those associated with active development - contributors, commits, pull requests, forks, and number of issues, with %open issues negatively correlated with accuracy.

A second suggestion is that there may be a positive correlation between accuracy and speed of the software, probably because continued maintenance will tend to improve both:

We also find an excess of slow and inaccurate bioinformatic software tools, and this is consistent across many sub-disciplines. There are few tools that are middle-of-road in terms of accuracy and speed trade-offs.

<<<<<<< HEAD
======= >>>>>>> c1d069a... First pass at category pages