Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.

Ragothaman A, Boddu SC, Kim N, Feinstein W, Brylinski M, Jha S, Kim J - Biomed Res Int (2014)

Bottom Line: To leverage the utility of threading, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that uses computational resources efficiently and effectively. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.


Affiliation: RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901, USA.

ABSTRACT
While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes, such as those of prokaryotes, is large, typically many thousands, which prohibits their application as a genome-wide structural systems biology tool. To leverage the utility of threading, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that uses computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present a runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, and it is particularly amenable to small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.


Algorithm 3: Proposed algorithm combining task-level parallelism and dynamic scheduling for eThread on EC2.

Mentions: Ideally, the best strategy is dynamic scheduling, illustrated in Algorithm 3, which exploits both task-level parallelism and data parallelization by dynamically identifying the best resource mapping for upcoming tasks and data transfers. Once such an algorithm for dynamic resource mapping exists, it can be integrated into the pipeline through SAGA-Pilot in a straightforward fashion.
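
The idea of dynamically mapping upcoming tasks to resources can be illustrated with a small sketch. Algorithm 3 itself is not reproduced in this text, so the code below is not the authors' algorithm; it swaps in a simple greedy earliest-finish-time heuristic as a stand-in. The `Resource` class, the `schedule` function, the relative `speed` values, and the task work estimates are all hypothetical, data transfer costs are ignored, and no SAGA-Pilot API is used.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A hypothetical compute resource, e.g. one EC2 instance."""
    name: str
    speed: float            # assumed relative speed; runtime = work / speed
    free_at: float = 0.0    # time at which the resource next becomes idle
    assigned: list = field(default_factory=list)


def schedule(tasks, resources):
    """Greedy mapping: take tasks largest first and place each one on the
    resource where it would finish earliest.

    tasks: dict mapping task name -> estimated work (arbitrary units)
    Returns the makespan, i.e. time-to-solution under this toy model.
    """
    for name, work in sorted(tasks.items(), key=lambda kv: -kv[1]):
        # Pick the resource with the earliest projected finish time.
        best = min(resources, key=lambda r: r.free_at + work / r.speed)
        best.free_at += work / best.speed
        best.assigned.append(name)
    return max(r.free_at for r in resources)


if __name__ == "__main__":
    # Hypothetical workload: four sequences with different work estimates.
    pool = [Resource("fast", speed=2.0), Resource("slow", speed=1.0)]
    makespan = schedule({"seq_a": 8, "seq_b": 4, "seq_c": 4, "seq_d": 2}, pool)
    for r in pool:
        print(r.name, r.assigned, r.free_at)
    print("makespan:", makespan)
```

In a real pipeline the work estimates would have to come from runtime measurements, and the mapping would be recomputed as tasks finish rather than fixed up front, which is what makes the scheduling dynamic.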

