
Category: Managing A Team: Data Teams


Ten simple rules for starting (and sustaining) an academic data science initiative - Micaela S. Parker, Arlyn E. Burgess, Philip E. Bourne, PLOS Computational Biology

Many research computing centres are trying to figure out how to launch or scale up a data science core facility or research institute. Creating anything new within an organization is a challenge, even when the winds are in your favour. Parker, Burgess, and Bourne offer sage advice on not just starting up a data science effort in particular,...


Structuring Data Teams and Data Projects

Data project checklist
How should I structure my data team? A look inside HubSpot, Away, M.M. LaFleur, and more

A lot of organizations are setting up data science teams. These teams are analogous to other sorts of R&D computing teams, but these efforts (a) are everywhere and (b) involve both far more money on the line and far less institutional inertia than when we’re setting up or managing an HPC centre. While the answers some organizations come up with...


How to Grow Neat Software Architecture out of Jupyter Notebooks - Guillaume Chevalier

This is an older blog post that recently became a talk. I’m coming around to the point of view that computational notebooks have real problems: obvious ones like hidden state, and maybe less obvious ones like the way the structure of notebooks actively discourages reasonable software development practices such as unit testing or even version control. People even study this. But in research computing lots of things have problems and we are kind of...


Bioinformatics challenges in multidisciplinary research - Mina Ali

The reason I prefer to talk about Research Computing as a whole, rather than research software development/systems/curated databases/… or breaking things out into bioinformatics/data science/simulations/…, is that the same issues come up over and over again. We’ve had articles in the roundup before about setting up a data science team in an organization and the challenges of having it be its own thing (and thus isolated) vs having team members scattered and individually embedded (and so...


Collaborating on Research Data Support - Christina Maimone

This is a short and useful “what worked well/what was challenging” overview of three initiatives at Northwestern where Research Computing and the Libraries collaborated on research data support. Both entities have a lot of experience and a lot of resources around research data management, and have greater or lesser amounts of reach with different parts of the University community. Even though your research computing team and your library may be quite different, I think there’ll be a...


Data Cleaning IS Analysis, Not Grunt Work - Randy Au, Counting Things

Au’s article can be summed up in one pull quote: “The act of cleaning data is the act of preferentially transforming data so that your chosen analysis algorithm produces interpretable results. That is also the act of data analysis.” Again, professionalism is doing things deliberately. I think we tend to get sloppy about things that are “just” data cleaning or “just” having decent uptime or “just” putting together a script - but these...


Don’t Make Data Scientists Do Scrum - Sophia Yang, Towards Data Science

On the one hand, research computing and data projects, especially the intermediate parts between “will this even work” and “put this into production”, often map pretty well to agile approaches - you can’t waterfall your way to research and discovery. On the other hand, both the most uncertain (“Will this approach even work?”) and the most certain (“Let’s install this new cluster”) components are awkward fits for most agile frameworks, even if in...
