Software Development for Engineering Research

Open Science, Software Citation, and Reproducibility Best Practices


Kyle Niemeyer. 19 May 2020

ME 599, Corvallis, OR

Sharing software (and data) openly has clear benefits to others and yourself

A paper that isn’t accompanied by the software or data produced is just advertising. (Claerbout & Karrenbach, 1992)

People find reproducible results more trustworthy…

…and cite you more! (Piwowar & Vision, 2013)

Reduce duplicated effort and increase impact.

Open-Source Software

We've talked about:

  • Version Control
  • Licensing
  • Documentation
  • Testing
  • Packaging & Distribution

Great! All done?

[Image: stop sign]

For research, we need one more step: archival of software and/or data.

Consider: what if you cite a GitHub repository, and someone later modifies or deletes it?

Archiving

[Diagram: archiving process]

Live Demo: Connect GitHub to Zenodo

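
The GitHub integration shown in the demo is point-and-click, but Zenodo also exposes a REST API for archiving artifacts directly. A minimal sketch, assuming (not shown in these slides) a personal access token in the ZENODO_TOKEN environment variable and a local release.zip to archive:

```python
# Sketch: archive a release artifact on Zenodo via its REST API.
# ZENODO_TOKEN and release.zip are hypothetical placeholders.
import os
import requests

TOKEN = os.environ["ZENODO_TOKEN"]
BASE = "https://zenodo.org/api"

# 1. Create an empty deposition (a draft record).
r = requests.post(f"{BASE}/deposit/depositions",
                  params={"access_token": TOKEN}, json={})
r.raise_for_status()
deposition = r.json()

# 2. Upload the archive into the deposition's file bucket.
bucket = deposition["links"]["bucket"]
with open("release.zip", "rb") as fp:
    requests.put(f"{bucket}/release.zip", data=fp,
                 params={"access_token": TOKEN}).raise_for_status()

# 3. Attach minimal metadata, then publish to mint a DOI.
metadata = {"metadata": {
    "title": "my-research-code v1.0",       # hypothetical title
    "upload_type": "software",
    "description": "Archived release of my-research-code.",
    "creators": [{"name": "Doe, Jane"}],    # hypothetical author
}}
dep_id = deposition["id"]
requests.put(f"{BASE}/deposit/depositions/{dep_id}",
             params={"access_token": TOKEN}, json=metadata).raise_for_status()
requests.post(f"{BASE}/deposit/depositions/{dep_id}/actions/publish",
              params={"access_token": TOKEN}).raise_for_status()
```

The published record gets its own DOI, which is what you cite; the GitHub webhook integration automates these same steps on every tagged release.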

Modern science and engineering research depends on software.

2009 survey: 91% of scientists consider software “important” or “very important” to research. (Hannay et al., 2009)

But 40–70% of the software used is not cited. (Pan et al., 2015; Howison et al., 2016)

Citing software & data is important.

Our research results depend on software and data: different versions of that software and data can change our answers.

Without proper citations, your work is not reproducible.

Also, academia relies on citations for credit. (for better or worse)

Software citation principles

[Image: snapshot of the software citation principles paper]

Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. (2016) Software citation principles. PeerJ Computer Science 2:e86 https://doi.org/10.7717/peerj-cs.86

[Infographic: software citation principles]

How to cite?

  • Name/description
  • Authors/developers
  • DOI or other unique/persistent identifier
  • Version number/commit hash
  • Location (e.g., GitHub repository)

(If there’s a paper describing the software, cite that too.)
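
To make the checklist concrete, here is a small sketch (in Python, matching the plotting scripts later in these slides) that gathers those elements and formats a plain-text reference. Every value below is made up purely for illustration:

```python
# Sketch: the citation elements above, collected as structured metadata.
# All names, DOIs, and URLs here are hypothetical.
software = {
    "name": "examplepkg",                           # name/description
    "authors": ["Doe, Jane", "Roe, Richard"],       # authors/developers
    "doi": "10.5281/zenodo.0000000",                # persistent identifier
    "version": "1.0.2",                             # version or commit hash
    "url": "https://github.com/example/examplepkg", # location
}

# Assemble a reference string from the required elements.
citation = (f"{' and '.join(software['authors'])}. "
            f"{software['name']} (version {software['version']}). "
            f"{software['url']}. doi:{software['doi']}")
print(citation)
```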

Where to cite?

In the text with the references/bibliography.

JOSS: Journal of Open Source Software

  • https://joss.theoj.org
  • Developer-friendly journal for research software packages
  • Affiliate of the Open Source Initiative
  • Open access, no fees
[Image: JOSS logo]
“If you've already licensed your code and have good documentation then we expect that it should take less than an hour to prepare and submit your paper to JOSS.”
[Screenshots: JOSS workflow, paper submission, and review process]

Reproducibility best practices: "Repro-packs"

Lorena Barba describes “reproducibility packages” associated with papers, sharing figures under CC-BY:

“For every figure that presents some result, we bundle the files needed to reproduce it — input or configuration files used to run the simulation(s) behind the result; code to process raw data into derived data; and scripts to create output graphs — and deposit them together with the figure into an open-data repository, such as Figshare. Figshare assigns the bundle a DOI, which we then include in the figure caption so readers can easily find the data and re-create the result. Our lab uses these packages as test beds for our in-house software, to verify that the results haven’t been compromised by software modifications. And because we maintain a public history of all changes, we achieve what one of my students calls ‘unimpeachable provenance’.”

My practice

  1. Produce a single “repro-pack” for an entire paper, which contains:
    • Python plotting scripts and associated results data (a minimal sketch follows this list)
    • Figures (PDFs for plots, always)
    • Any other relevant data: input files, configuration files, etc.
  2. Upload to Zenodo under CC-BY license
  3. Cite using the resulting DOI in the associated paper(s)
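
A minimal sketch of what one such plotting script might look like; the file and column names are hypothetical placeholders, not taken from an actual repro-pack. It reads the archived derived data and regenerates a single figure as a PDF:

```python
# Sketch of a repro-pack plotting script (hypothetical file and column
# names): read archived derived data, regenerate one figure as a PDF.
import csv
import os

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Derived data ships alongside this script in the repro-pack.
threads, speedups = [], []
with open("results/speedup.csv") as f:
    for row in csv.DictReader(f):
        threads.append(int(row["num_threads"]))
        speedups.append(float(row["speedup"]))

fig, ax = plt.subplots()
ax.plot(threads, speedups, marker="o")
ax.set_xlabel("Number of threads")
ax.set_ylabel("Speedup")

# Save as PDF, the vector format archived with the paper.
os.makedirs("figures", exist_ok=True)
fig.savefig("figures/speedup.pdf", bbox_inches="tight")
```

Because the script, its input data, and the resulting figure travel together, anyone (including your future self) can rerun it and check that the published figure still comes out.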

How to cite/mention

[Images: example citation and reference entry for a repro-pack]

CP Stone, AT Alferman, & KE Niemeyer. 2018. “Accelerating finite-rate chemical kinetics with coprocessors: comparing vectorization methods on GPUs, MICs, and CPUs.” Computer Physics Communications, 226:18–29.

[Image: file layout of a repro-pack]

Example: pyJac and papers

Questions?