Limits...
Tools and techniques for computational reproducibility.

Piccolo SR, Frampton MB - Gigascience (2016)

Bottom Line: When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible.However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed-and because of limitations associated with how scientists document analysis steps.No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

View Article: PubMed Central - PubMed

Affiliation: Department of Biology, Brigham Young University, Provo, UT, 84602, USA. stephen_piccolo@byu.edu.

ABSTRACT
When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed-and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

No MeSH data available.


Example of a Docker container for genomics research. This container would enable researchers to preprocess various types of molecular data, using tools from Bioconductor and Galaxy, and to analyze the resulting data within a Jupyter notebook. Each box within the container represents a distinct Docker image. These images are layered such that some images depend on others (for example, the Bioconductor image depends on R). At its base, the container includes operating system libraries, which may not be present (or may be configured differently) on the computer’s main operating system
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4940747&req=5

Fig7: Example of a Docker container for genomics research. This container would enable researchers to preprocess various types of molecular data, using tools from Bioconductor and Galaxy, and to analyze the resulting data within a Jupyter notebook. Each box within the container represents a distinct Docker image. These images are layered such that some images depend on others (for example, the Bioconductor image depends on R). At its base, the container includes operating system libraries, which may not be present (or may be configured differently) on the computer’s main operating system

Mentions: We have described seven tools and techniques for facilitating computational reproducibility. None of these approaches is sufficient for every scenario in isolation; rather, scientists will often find value in combining approaches. For example, a researcher who uses a literate programming notebook (that combines narratives with code) might incorporate the notebook into a software container so that others can execute it without needing to install specific software dependencies. The container might also include a workflow management system to ease the process of integrating multiple tools and incorporating best practices for the analysis. This container could be packaged within a virtual machine or cloud-computing environment to ensure that it can be executed consistently (see Fig. 7). Binder [99] and Everware [100] are two services that allow researchers to execute Jupyter notebooks within a Web browser, using a Docker container to package the underlying software, and a cloud-computing environment to execute it. Although still under active development, such services may be harbingers of the future for computationally reproducible science.Fig. 7


Tools and techniques for computational reproducibility.

Piccolo SR, Frampton MB - Gigascience (2016)

Example of a Docker container for genomics research. This container would enable researchers to preprocess various types of molecular data, using tools from Bioconductor and Galaxy, and to analyze the resulting data within a Jupyter notebook. Each box within the container represents a distinct Docker image. These images are layered such that some images depend on others (for example, the Bioconductor image depends on R). At its base, the container includes operating system libraries, which may not be present (or may be configured differently) on the computer’s main operating system
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4940747&req=5

Fig7: Example of a Docker container for genomics research. This container would enable researchers to preprocess various types of molecular data, using tools from Bioconductor and Galaxy, and to analyze the resulting data within a Jupyter notebook. Each box within the container represents a distinct Docker image. These images are layered such that some images depend on others (for example, the Bioconductor image depends on R). At its base, the container includes operating system libraries, which may not be present (or may be configured differently) on the computer’s main operating system
Mentions: We have described seven tools and techniques for facilitating computational reproducibility. None of these approaches is sufficient for every scenario in isolation; rather, scientists will often find value in combining approaches. For example, a researcher who uses a literate programming notebook (that combines narratives with code) might incorporate the notebook into a software container so that others can execute it without needing to install specific software dependencies. The container might also include a workflow management system to ease the process of integrating multiple tools and incorporating best practices for the analysis. This container could be packaged within a virtual machine or cloud-computing environment to ensure that it can be executed consistently (see Fig. 7). Binder [99] and Everware [100] are two services that allow researchers to execute Jupyter notebooks within a Web browser, using a Docker container to package the underlying software, and a cloud-computing environment to execute it. Although still under active development, such services may be harbingers of the future for computationally reproducible science.Fig. 7

Bottom Line: When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible.However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed-and because of limitations associated with how scientists document analysis steps.No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

View Article: PubMed Central - PubMed

Affiliation: Department of Biology, Brigham Young University, Provo, UT, 84602, USA. stephen_piccolo@byu.edu.

ABSTRACT
When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed-and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

No MeSH data available.