CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.

Chung WC, Chen CC, Ho JM, Lin CY, Hsu WL, Wang YC, Lee DT, Lai F, Huang CW, Chang YJ - PLoS ONE (2014)

Bottom Line: Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. CloudDOE's smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management.


Affiliation: Institute of Information Science, Academia Sinica, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan.

ABSTRACT

Background: Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce.

Results: We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards: Deploy, Operate, and Extend. The Deploy wizard helps the system administrator deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and starts the service automatically. The Operate wizard allows the user to run a MapReduce application from the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using the Extend wizard.

Conclusions: CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate on the CloudDOE source code to incorporate more MapReduce bioinformatics tools and to support next-generation big data open source tools, e.g., Hadoop BigTop and Spark.

Availability: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.


Figure 4 (pone-0098146-g004): A structured XML configuration file and the generated wizard. The configuration file contains a metadata section with general program information, a set of parameters and their default values that are necessary to execute the program, and sections on log files and result download methods. CloudDOE loads a configuration file and generates the specific wizard required.
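
The caption's idea of loading a configuration file and deriving a wizard from it can be illustrated with a minimal sketch. CloudDOE itself is written in Java and its real schema is not reproduced here: every element name, attribute name, and value in `CONFIG_XML`, as well as the helper functions `load_wizard_spec` and `build_arguments`, are hypothetical and chosen only to mirror the four sections the caption lists (metadata, parameters with defaults, log files, result download).

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file. The element and attribute names are
# illustrative assumptions, not CloudDOE's actual schema.
CONFIG_XML = """
<program>
  <metadata>
    <name>ExampleTool</name>
    <version>1.0</version>
  </metadata>
  <parameters>
    <parameter name="input" label="Input reads" type="file" default="reads.fa"/>
    <parameter name="kmer" label="K-mer size" type="int" default="21"/>
  </parameters>
  <logs>
    <log path="logs/run.log"/>
  </logs>
  <downloads>
    <download method="copy" path="output/"/>
  </downloads>
</program>
"""


def load_wizard_spec(xml_text):
    """Parse the configuration into a plain description of the wizard."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("metadata/name"),
        "version": root.findtext("metadata/version"),
        # One form field per declared parameter, carrying its default value.
        "fields": [dict(p.attrib) for p in root.iterfind("parameters/parameter")],
        "logs": [log.get("path") for log in root.iterfind("logs/log")],
        "downloads": [dict(d.attrib) for d in root.iterfind("downloads/download")],
    }


def build_arguments(spec, overrides=None):
    """Merge user input with defaults and emit a command-line argument list."""
    overrides = overrides or {}
    args = []
    for field in spec["fields"]:
        value = overrides.get(field["name"], field["default"])
        args += ["--" + field["name"], str(value)]
    return args


spec = load_wizard_spec(CONFIG_XML)
args = build_arguments(spec, {"kmer": 31})
```

In this sketch a graphical front end would render one input control per entry in `spec["fields"]`, and `build_arguments` stands in for the step that turns the user's choices into the program invocation. Keeping the defaults in the configuration file means a tool can be added without changing the front-end code.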

Mentions: Several NGS data analysis tools have been implemented on the MapReduce framework. To overcome the hurdle of operating a MapReduce program through a complicated command-line interface, we proposed a graphical wizard dubbed Operate. Users can operate a program through a customized interface generated from the necessary information in a configuration file, which is written by the program’s author or an advanced user (Figure 4). An isolation method is also introduced to create a dedicated workspace for storing the experimental data of each execution, i.e., programs, input files, and experimental results. With the Operate wizard, users benefit from (1) a graphical interface for the MapReduce program, (2) a streamlined method for manipulating input/output data and setting up program parameters, and (3) a status tracker and progress monitor for execution.
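
The isolation method described above, a dedicated workspace per execution, can be sketched as follows. This is only an illustration on a local filesystem: the source does not say where or how CloudDOE creates its workspaces, and the function name `create_workspace`, the subdirectory names, and the run-identifier scheme are all assumptions.

```python
import tempfile
import uuid
from pathlib import Path


def create_workspace(base_dir, program_name):
    """Create a dedicated directory tree for a single execution.

    The layout (program/input/results) is a hypothetical one that mirrors
    the three kinds of experimental data named in the text.
    """
    run_id = uuid.uuid4().hex[:8]  # short random identifier for this run
    root = Path(base_dir) / f"{program_name}-{run_id}"
    for sub in ("program", "input", "results"):
        # exist_ok=False makes an accidental reuse of a workspace fail loudly.
        (root / sub).mkdir(parents=True, exist_ok=False)
    return root


base = tempfile.mkdtemp()
ws_a = create_workspace(base, "ExampleTool")
ws_b = create_workspace(base, "ExampleTool")
```

Because each call yields a fresh directory, two runs of the same program cannot overwrite each other's inputs or results, which is the property the isolation method is meant to provide.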

