Tools and techniques for computational reproducibility.

Piccolo SR, Frampton MB - Gigascience (2016)

Bottom Line: When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed, and because of limitations associated with how scientists document analysis steps. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.


Affiliation: Department of Biology, Brigham Young University, Provo, UT, 84602, USA. stephen_piccolo@byu.edu.

ABSTRACT
When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed, and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.




Fig. 5: Architecture of virtual machines. Virtual machines encapsulate analytical software and dependencies within a “guest” operating system, which may be different from the main (“host”) operating system. A virtual machine executes in the context of virtualization software, which runs alongside other software installed on the computer.

Mentions: One solution is to use virtual machines, which can encapsulate an entire operating system and all software, scripts, code, and data necessary to execute a computational analysis [84, 85] (Fig. 5). Using virtualization software such as VirtualBox or VMware (see Table 3), a virtual machine can be executed on practically any desktop, laptop, or server, irrespective of the main (“host”) operating system on the computer. For example, even though a scientist’s computer may be running a Windows operating system, they may perform an analysis on a Linux operating system that is running concurrently, within a virtual machine, on the same computer. The scientist has full control over the virtual (“guest”) operating system and can therefore install software and modify configuration settings as necessary. In addition, a virtual machine can be constrained to use specific amounts of computational resources (e.g., computer memory, processing power), enabling system administrators to ensure that multiple virtual machines can run simultaneously on the same computer without affecting each other’s performance. After executing an analysis, the scientist can export the entire virtual machine to a single binary file. Other scientists can then use this file to reconstitute the same computational environment that was used for the original analysis. With a few exceptions (see Discussion), these scientists will obtain exactly the same results as the original scientist. This approach provides the added benefits that 1) the scientist must document the installation and configuration steps for only a single operating system, 2) other scientists need only install the virtualization software, not the individual software components, and 3) analyses can be re-executed indefinitely, so long as the virtualization software remains compatible with current computer systems [86]. Virtual machines are also useful for research teams: because team members may have different configurations on their host operating systems, a shared virtual machine ensures that every member works in the same computational environment.
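
To make the resource-constraint and export steps concrete, the following is a minimal sketch in Python that scripts VirtualBox's VBoxManage command-line tool. It assumes VirtualBox is installed with VBoxManage on the system PATH; the virtual machine name "analysis-vm" and the memory and CPU values are illustrative assumptions, not settings from the original analysis.

    # Sketch: constrain a VirtualBox guest's resources and export it to a
    # single portable file, so others can reconstitute the same environment.
    # Assumes VirtualBox is installed and VBoxManage is on the PATH; the VM
    # name "analysis-vm" is hypothetical.
    import subprocess

    VM_NAME = "analysis-vm"          # hypothetical name of the guest VM
    EXPORT_FILE = "analysis-vm.ova"  # single binary file to share with others

    def run(args):
        """Run a VBoxManage subcommand and raise an error if it fails."""
        subprocess.run(["VBoxManage"] + args, check=True)

    # Constrain the guest to a fixed amount of memory (MB) and number of CPU
    # cores, so several virtual machines can share the same host computer.
    run(["modifyvm", VM_NAME, "--memory", "4096", "--cpus", "2"])

    # Export the entire virtual machine (guest OS, software, scripts, data)
    # to one OVA file. A recipient who has only VirtualBox installed can
    # reconstitute the environment with: VBoxManage import analysis-vm.ova
    run(["export", VM_NAME, "--output", EXPORT_FILE])

Recipients need only the virtualization software itself; importing the exported file recreates the guest environment without installing any of the individual software components.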

