Limits...
Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars.

Sükösd Z, Knudsen B, Vaerum M, Kjems J, Andersen ES - BMC Bioinformatics (2011)

Bottom Line: We have also improved the user interface and portability: alongside standalone executable and Java source code of the program, PPfold is also available as a free plugin to the CLC Workbenches.We have evaluated the accuracy of PPfold using BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine and identifying several known structural elements in the prediction.Based on the pfold model, PPfold is capable of fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts.

View Article: PubMed Central - HTML - PubMed

Affiliation: Interdisciplinary Nanoscience Center, Aarhus University, Denmark. zs@mb.au.dk

ABSTRACT

Background: The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The pfold model couples a stochastic context-free grammar to phylogenetic analysis for a high accuracy in predictions, but the time complexity of the algorithm and underflow errors have prevented its use for long alignments. Here we present PPfold, a multithreaded version of pfold, which is capable of predicting the structure of large RNA alignments accurately on practical timescales.

Results: We have distributed both the phylogenetic calculations and the inside-outside algorithm in PPfold, resulting in a significant reduction of runtime on multicore machines. We have addressed the floating-point underflow problems of pfold by implementing an extended-exponent datatype, enabling PPfold to be used for large-scale RNA structure predictions. We have also improved the user interface and portability: alongside standalone executable and Java source code of the program, PPfold is also available as a free plugin to the CLC Workbenches. We have evaluated the accuracy of PPfold using BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine and identifying several known structural elements in the prediction.

Conclusions: PPfold is the first parallelized comparative RNA structure prediction algorithm to date. Based on the pfold model, PPfold is capable of fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts. The techniques used in the parallelization of this algorithm may be of general applicability to other bioinformatics algorithms.

Show MeSH

Related in: MedlinePlus

Dependencies of sectors in the inside-outside algorithm. The geometry for job divisions is inspired by the dependency structure of the (a) inside and (b) outside algorithms. For illustration purposes, here the number of divisions of the sequence is 11, giving rise to 66 sectors in total. (Sectors are numbered starting from 0, in accordance with the algorithm implementation.) In order to calculate values in the sector marked red, the indicated values in the coloured sectors must be known.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3102635&req=5

Figure 4: Dependencies of sectors in the inside-outside algorithm. The geometry for job divisions is inspired by the dependency structure of the (a) inside and (b) outside algorithms. For illustration purposes, here the number of divisions of the sequence is 11, giving rise to 66 sectors in total. (Sectors are numbered starting from 0, in accordance with the algorithm implementation.) In order to calculate values in the sector marked red, the indicated values in the coloured sectors must be known.

Mentions: We divide the triangle into equal-sized parallelogram-shaped "sectors" (Figure 3). We will refer to the number of sectors in the first row of the triangle by N. The dependency of the sectors on each other in the inside and outside parts of the algorithm is illustrated by Figure 4; the values for all nonterminals in each sector can be evaluated once all dependencies are completed. A "job" thus entails the evaluation of either the inside or the outside variables corresponding to a sector for all nonterminal variables in the grammar. The workload in jobs is not equally distributed, as illustrated by Figure 5.


Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars.

Sükösd Z, Knudsen B, Vaerum M, Kjems J, Andersen ES - BMC Bioinformatics (2011)

Dependencies of sectors in the inside-outside algorithm. The geometry for job divisions is inspired by the dependency structure of the (a) inside and (b) outside algorithms. For illustration purposes, here the number of divisions of the sequence is 11, giving rise to 66 sectors in total. (Sectors are numbered starting from 0, in accordance with the algorithm implementation.) In order to calculate values in the sector marked red, the indicated values in the coloured sectors must be known.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3102635&req=5

Figure 4: Dependencies of sectors in the inside-outside algorithm. The geometry for job divisions is inspired by the dependency structure of the (a) inside and (b) outside algorithms. For illustration purposes, here the number of divisions of the sequence is 11, giving rise to 66 sectors in total. (Sectors are numbered starting from 0, in accordance with the algorithm implementation.) In order to calculate values in the sector marked red, the indicated values in the coloured sectors must be known.
Mentions: We divide the triangle into equal-sized parallelogram-shaped "sectors" (Figure 3). We will refer to the number of sectors in the first row of the triangle by N. The dependency of the sectors on each other in the inside and outside parts of the algorithm is illustrated by Figure 4; the values for all nonterminals in each sector can be evaluated once all dependencies are completed. A "job" thus entails the evaluation of either the inside or the outside variables corresponding to a sector for all nonterminal variables in the grammar. The workload in jobs is not equally distributed, as illustrated by Figure 5.

Bottom Line: We have also improved the user interface and portability: alongside standalone executable and Java source code of the program, PPfold is also available as a free plugin to the CLC Workbenches.We have evaluated the accuracy of PPfold using BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine and identifying several known structural elements in the prediction.Based on the pfold model, PPfold is capable of fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts.

View Article: PubMed Central - HTML - PubMed

Affiliation: Interdisciplinary Nanoscience Center, Aarhus University, Denmark. zs@mb.au.dk

ABSTRACT

Background: The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The pfold model couples a stochastic context-free grammar to phylogenetic analysis for a high accuracy in predictions, but the time complexity of the algorithm and underflow errors have prevented its use for long alignments. Here we present PPfold, a multithreaded version of pfold, which is capable of predicting the structure of large RNA alignments accurately on practical timescales.

Results: We have distributed both the phylogenetic calculations and the inside-outside algorithm in PPfold, resulting in a significant reduction of runtime on multicore machines. We have addressed the floating-point underflow problems of pfold by implementing an extended-exponent datatype, enabling PPfold to be used for large-scale RNA structure predictions. We have also improved the user interface and portability: alongside standalone executable and Java source code of the program, PPfold is also available as a free plugin to the CLC Workbenches. We have evaluated the accuracy of PPfold using BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine and identifying several known structural elements in the prediction.

Conclusions: PPfold is the first parallelized comparative RNA structure prediction algorithm to date. Based on the pfold model, PPfold is capable of fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts. The techniques used in the parallelization of this algorithm may be of general applicability to other bioinformatics algorithms.

Show MeSH
Related in: MedlinePlus