Ensemble-based prediction of RNA secondary structures.

Aghaeepour N, Hoos HH - BMC Bioinformatics (2013)

Bottom Line: Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future.


Affiliation: Department of Computer Science, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada. hoos@cs.ubc.ca

ABSTRACT

Background: Accurate structure prediction methods play an important role in understanding RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches.

Results: In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based, pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than those obtained from any of its component procedures.
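
As a rough illustration of what a resampling-based evaluation can look like, the sketch below computes a percentile bootstrap confidence interval for the mean per-RNA accuracy of one algorithm. This is a generic bootstrap and not necessarily the procedure used in the paper, which this abstract does not spell out; the function name `bootstrap_ci`, its parameters and the F-measure values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`.

    Generic sketch: resample the per-RNA scores with replacement,
    record the mean of each resample, and read off the percentiles.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(scores)
    means = []
    for _ in range(n_resamples):
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        means.append(mean(sample))
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-RNA F-measures for one algorithm (illustrative only).
f_measures = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.69]
low, high = bootstrap_ci(f_measures)
print(f"mean={mean(f_measures):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Comparing such intervals across algorithms is one simple way to judge whether an observed difference in mean accuracy is larger than the variation due to the choice of test sequences.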

Conclusions: Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.
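
To make the idea of an ensemble with a pairing threshold concrete, here is a minimal sketch in which each component procedure votes for the base pairs it predicts and a pair is kept when its weighted vote reaches a threshold θ. The combining rule, the weights and the name `ensemble_pairs` are assumptions made for illustration; the abstract does not describe how AveRNA actually computes or calibrates its combination.

```python
from collections import defaultdict


def ensemble_pairs(predictions, weights, theta):
    """Return the base pairs whose weighted vote is at least `theta`.

    predictions: list of sets of (i, j) base pairs, one set per algorithm
    weights:     one non-negative weight per algorithm (assumed to sum to 1)
    theta:       pairing threshold in [0, 1] (hypothetical parameterisation)
    """
    votes = defaultdict(float)
    for pairs, weight in zip(predictions, weights):
        for pair in pairs:
            votes[pair] += weight
    return {pair for pair, vote in votes.items() if vote >= theta}


# Toy predictions from three hypothetical component procedures.
preds = [
    {(1, 20), (2, 19), (3, 18)},
    {(1, 20), (2, 19), (5, 15)},
    {(1, 20), (3, 18), (6, 14)},
]
weights = [0.5, 0.25, 0.25]

for theta in (0.2, 0.5, 0.9):
    print(theta, sorted(ensemble_pairs(preds, weights, theta)))
```

Raising θ keeps only the pairs that most procedures agree on, which trades false positives for false negatives. Note that plain thresholding can leave a base assigned to more than one partner; this sketch does not resolve such conflicts, which a complete method would have to do.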


Figure 4: Sensitivity versus PPV. Sensitivity vs positive predictive value (PPV) for different prediction algorithms; for AveRNA, the points along the curve were obtained by adjusting the pairing threshold θ, and for CONTRAfold 1.1, CONTRAfold 2.0, Centroidfold and MaxExpect by adjusting the parameter γ.

Mentions: Figure 4 illustrates the trade-off between sensitivity and PPV for all of these algorithms and shows clearly that, overall, AveRNA dominates all previous methods and, in particular, gives much better results than CONTRAfold 2.0, the previous best algorithm that afforded control over this trade-off. We note that, in all cases, as a procedure becomes increasingly conservative in predicting base pairs, eventually both sensitivity and PPV drop (see Additional file 1: Figure S1); we believe this to be a result of the highly detrimental impact of even a small number of mispredicted base pairs when very few pairs are predicted overall.
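
The trade-off described above can be reproduced on a toy example. The sketch below uses the standard definitions, sensitivity = TP / (number of reference pairs) and PPV = TP / (number of predicted pairs), and sweeps a threshold over per-pair scores. The reference structure, the scores and the name `sensitivity_ppv` are made up for illustration and are chosen so that the single highest-scoring pair is wrong.

```python
def sensitivity_ppv(predicted, reference):
    """Sensitivity and PPV of a predicted set of base pairs.

    Both arguments are sets of (i, j) tuples. An empty prediction or an
    empty reference is scored as 0.0 here, which is a convention chosen
    for this sketch.
    """
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Toy reference structure and hypothetical per-pair scores.
reference = {(1, 20), (2, 19), (3, 18), (4, 17)}
pair_scores = {
    (1, 20): 0.5,
    (2, 19): 0.75,
    (3, 18): 0.625,
    (7, 12): 0.875,  # confidently predicted, but not in the reference
    (5, 15): 0.25,
}

for theta in (0.2, 0.6, 0.8):
    predicted = {pair for pair, score in pair_scores.items() if score >= theta}
    sens, ppv = sensitivity_ppv(predicted, reference)
    print(f"theta={theta}: sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

In this toy case, moving the threshold from 0.2 to 0.6 raises PPV at the cost of sensitivity, but at 0.8 only the single incorrect pair remains, so both measures fall to zero. This mirrors the observation that one mispredicted pair weighs heavily when very few pairs are predicted.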

