Communications of the ACM

ACM Careers

Why Researchers Should Share Computer Code



For years, scientists have discussed whether and how to share data from painstaking research and costly experiments. Some are further along in their efforts toward "open science" than others: Fields such as astronomy and oceanography, for example, involve such expensive and large-scale equipment and logistical challenges to data collection that collaboration among institutions has become the norm.

Meanwhile, a variety of academic journals, including several in the Nature Research family, are turning their attention to another aspect of the research process: computer programming code. Code is becoming increasingly important in research because scientists are often writing their own computer programs to interpret their data, rather than using commercial software packages. Some journals now include scientific data and code as part of the peer-review process.

And now, with the online publication of "Toward Standard Practices for Sharing Computer Code and Programs in Neuroscience" in the journal Nature Neuroscience, there are conventions and tools that researchers can use to make code sharing easier and more efficient. The paper, by Ben Marwick, University of Washington associate professor of anthropology, and 13 other colleagues at universities across the United States and Europe, advocates the sharing of code, while an editorial in the journal, "Extending Transparency to Code," announces a pilot project to ask future authors to make their code available for review.

Making the programs behind the research accessible allows other scientists to test the code and reproduce the computations in an experiment — in other words, to reproduce results and solidify findings. It's the "how the sausage is made" part of research, Marwick says. It also allows the code to be used by other researchers in new studies, making it easier for scientists to build on the work of their colleagues.

"What we're missing is the convention of sharing code or the tools for turning data into useful discoveries or information," Marwick says. "Researchers say it's great to have the data available in a paper — increasingly raw data are available in supplementary files or specialized online repositories — but the code for performing the clever analyses in between the raw data and the published figures and tables are still inaccessible."

Since 2014, Nature Research has encouraged authors to make their code available upon request.

The Nature Neuroscience pilot focuses on three elements: whether the code supporting an author's main claims is publicly accessible; whether the code functions without mistakes; and whether it produces the results cited.
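As a rough illustration only, the second and third elements of such a review could be sketched in Python as below. This is not the journal's actual procedure: the function `review_analysis`, the stand-in `example_analysis`, and the sample numbers are all hypothetical, and the first element (public accessibility) is assumed to be checked separately by a reviewer.

```python
import math


def review_analysis(analysis, reported, rel_tol=1e-9):
    """Run a hypothetical analysis and compare its output with reported values.

    `analysis` is a callable that returns a dict of named numeric results;
    `reported` is a dict of the values cited in the paper.
    """
    report = {"runs_without_error": False, "matches_reported": False}
    try:
        computed = analysis()
    except Exception:
        # The code did not run cleanly, so nothing can be compared.
        return report
    report["runs_without_error"] = True
    # Every reported quantity must be present and numerically close.
    report["matches_reported"] = computed.keys() == reported.keys() and all(
        math.isclose(computed[name], reported[name], rel_tol=rel_tol)
        for name in reported
    )
    return report


def example_analysis():
    # Hypothetical stand-in for an author's shared analysis script.
    samples = [2.0, 4.0, 6.0]
    return {"mean": sum(samples) / len(samples)}


print(review_analysis(example_analysis, {"mean": 4.0}))
```

Comparing with a tolerance rather than exact equality is a design choice in this sketch, since floating-point results can differ slightly from one machine to another.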

"This is a commitment from a high-impact journal to raise software to the status of a regular research product, that it's not just a tool that gets discarded along the way, or hidden on a researcher's computer where no-one else can benefit from it," Marwick says. "In the future, scientific disciplines will be shifting to a position where you need to share your code as well as your data. It will be easier to reproduce someone's new discovery, and incorporate their discoveries into your own work."

Imagine this scenario, Marwick says: A neuroscientist is trying to find new ways to identify early-stage tumors using 3-D brain imagery. She comes up with an algorithm that can pick out specific pixel values in an image, which helps lead to early tumor detection. By sharing the computer code and its mathematical algorithm, the scientist could facilitate a breakthrough.

The Nature Neuroscience paper resulted from a two-day workshop held in 2014 in the United Kingdom, to which Marwick, an archaeologist, was invited because of his efforts in using code and promoting open science in archaeology. A Senior Data Science Fellow at the UW eScience Institute, Marwick is active in the institute's Reproducibility and Open Science Group, which works on tools and practices to enhance data sharing, preservation, and reproducibility.

Bill Howe, associate director of the eScience Institute, says code sharing is part of the future. "Reproducibility is literally the definition of science, and as science moves from the lab to the computer, code sharing must be at the core of how we conduct research and train students."

An open science approach to sharing code has its critics, and some scientists raise legal and ethical questions about the repercussions. How do researchers get proper credit for the code they share? How should code be cited in the scholarly literature? How will it count toward tenure and promotion applications? How is sharing code compatible with patents and commercialization of software technology?

Marwick, who specializes in prehistoric human evolutionary ecology in Southeast Asia and Australia, has been advocating for code-sharing and related open science initiatives in archaeology through the Society for American Archaeology.

"I'm just trying to shift the needle in my discipline to a practice that benefits everyone — researchers and the public," he says.

The Nature Neuroscience article is authored by Stephen J Eglen, Ben Marwick, Yaroslav O. Halchenko, Michael Hanke, Shoaib Sufi, Padraig Gleeson, R Angus Silver, Andrew P Davison, Linda Lanyon, Mathew Abrams, Thomas Wachtler, David J Willshaw, Christophe Pouzat, and Jean-Baptiste Poline.


 
