Limits...
Tools and techniques for computational reproducibility.

Piccolo SR, Frampton MB - Gigascience (2016)

Bottom Line: When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible.However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed-and because of limitations associated with how scientists document analysis steps.No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

View Article: PubMed Central - PubMed

Affiliation: Department of Biology, Brigham Young University, Provo, UT, 84602, USA. stephen_piccolo@byu.edu.

ABSTRACT
When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed-and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

No MeSH data available.


Related in: MedlinePlus

Example of a command line script. This script can be used to align DNA sequence data to a reference genome. First, it downloads the software and data files necessary for the analysis. Then, it extracts (“unzips”) these files, and aligns the data to a reference genome for Ebola virus. Finally, it converts, sorts, and indexes the aligned data. See Additional file 1 for an executable version of this script
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4940747&req=5

Fig1: Example of a command line script. This script can be used to align DNA sequence data to a reference genome. First, it downloads the software and data files necessary for the analysis. Then, it extracts (“unzips”) these files, and aligns the data to a reference genome for Ebola virus. Finally, it converts, sorts, and indexes the aligned data. See Additional file 1 for an executable version of this script

Mentions: Scientific software can often be executed in an automated manner via text-based commands. Using such commands—via a command-line interface—scientists can indicate the software program(s) to be executed and which parameter(s) should be used. When multiple commands must be executed, they can be compiled into scripts specifying the order in which the commands should be executed (Fig. 1; Additional file 1). In many cases, scripts also include commands for installing and configuring software. Such scripts serve as valuable documentation not only for individuals who wish to re-execute the analysis, but also for the researcher who performed the original analysis [47]. In these cases, no amount of narrative is an adequate substitute for providing the actual commands that were used.Fig. 1


Tools and techniques for computational reproducibility.

Piccolo SR, Frampton MB - Gigascience (2016)

Example of a command line script. This script can be used to align DNA sequence data to a reference genome. First, it downloads the software and data files necessary for the analysis. Then, it extracts (“unzips”) these files, and aligns the data to a reference genome for Ebola virus. Finally, it converts, sorts, and indexes the aligned data. See Additional file 1 for an executable version of this script
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4940747&req=5

Fig1: Example of a command line script. This script can be used to align DNA sequence data to a reference genome. First, it downloads the software and data files necessary for the analysis. Then, it extracts (“unzips”) these files, and aligns the data to a reference genome for Ebola virus. Finally, it converts, sorts, and indexes the aligned data. See Additional file 1 for an executable version of this script
Mentions: Scientific software can often be executed in an automated manner via text-based commands. Using such commands—via a command-line interface—scientists can indicate the software program(s) to be executed and which parameter(s) should be used. When multiple commands must be executed, they can be compiled into scripts specifying the order in which the commands should be executed (Fig. 1; Additional file 1). In many cases, scripts also include commands for installing and configuring software. Such scripts serve as valuable documentation not only for individuals who wish to re-execute the analysis, but also for the researcher who performed the original analysis [47]. In these cases, no amount of narrative is an adequate substitute for providing the actual commands that were used.Fig. 1

Bottom Line: When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible.However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed-and because of limitations associated with how scientists document analysis steps.No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

View Article: PubMed Central - PubMed

Affiliation: Department of Biology, Brigham Young University, Provo, UT, 84602, USA. stephen_piccolo@byu.edu.

ABSTRACT
When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed-and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

No MeSH data available.


Related in: MedlinePlus