SimulFold: simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework.

Meyer IM, Miklós I - PLoS Comput. Biol. (2007)

Bottom Line: We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.


Affiliation: UBC Bioinformatics Centre, University of British Columbia, Vancouver, British Columbia, Canada. irmtraud.meyer@cantab.net

ABSTRACT
Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.


Figure 5 (pcbi-0030149-g005): Loglikelihood as a Function of the Steps in the MCMC Chain for the HDV set with and without Parallel Tempering. With parallel tempering (grey) and without parallel tempering (black). Without parallel tempering, the MCMC chain gets stuck in local minima.

Mentions: SimulFold is the only program that simultaneously co-estimates alignments, structures, and trees. It clearly outperforms all other programs in terms of overall performance for eight out of 16 sets: U5 (low), group II intron (low and high), tRNA (low), rRNA (low and high), entero, and HDV. It also shows a competitive performance for the sets U5 (high) and tRNA (high). These sets cover a wide range of average percent identities (pids), from 40% to 91%. The results for the two SSU sets show that SimulFold has problems analyzing these two sets, whose reference alignments span more than 1,500 nucleotides. However, the results for the RNase P8 set show that SimulFold can successfully predict structures with high sensitivity even for comparatively long sequences (the reference alignment of the RNase P8 set has a length of 472 nucleotides). The results for the RNase P8 and the HDV sets show the benefits of parallel tempering. When investigating the predictions for the HDV set, we concluded from the loglikelihood plot (see Figure 5) that the MCMC chain got stuck in local minima. We therefore implemented a more sophisticated version of SimulFold that employs the MCMC technique of parallel tempering [81] to address the problem. As the grey line in Figure 5 shows, parallel tempering solves the mixing problem for the HDV set and significantly improves the sensitivity, while at the same time reducing the number of incorrectly predicted base-pairs.
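
The effect of parallel tempering on a chain that fails to mix can be illustrated with a minimal sketch. This is not SimulFold's implementation: the one-dimensional bimodal target, the function names (`log_target`, `parallel_tempering`, `fraction_right`), the temperature ladder, and the step sizes are all assumptions chosen for illustration. Several Metropolis chains sample flattened versions of the same target, and adjacent chains occasionally exchange states, which lets the coldest chain escape a region it would otherwise stay trapped in.

```python
import math
import random


def log_target(x):
    """Toy log-density (an assumption): two narrow Gaussian modes at -4 and +4."""
    a = -0.5 * ((x + 4.0) / 0.5) ** 2
    b = -0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(rng, log_ratio):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def parallel_tempering(n_steps, temperatures, step_size=0.5, seed=0):
    """Return the samples of the coldest chain (temperatures[0], expected to be 1)."""
    rng = random.Random(seed)
    states = [-4.0] * len(temperatures)  # every chain starts in the left mode
    cold_samples = []
    for _ in range(n_steps):
        # Within-chain Metropolis update; chain i targets pi(x)^(1/T_i).
        for i, temp in enumerate(temperatures):
            proposal = states[i] + rng.gauss(0.0, step_size * math.sqrt(temp))
            log_ratio = (log_target(proposal) - log_target(states[i])) / temp
            if accept(rng, log_ratio):
                states[i] = proposal
        # Propose exchanging the states of one random pair of adjacent chains.
        if len(temperatures) > 1:
            i = rng.randrange(len(temperatures) - 1)
            beta_diff = 1.0 / temperatures[i] - 1.0 / temperatures[i + 1]
            log_swap = beta_diff * (log_target(states[i + 1]) - log_target(states[i]))
            if accept(rng, log_swap):
                states[i], states[i + 1] = states[i + 1], states[i]
        cold_samples.append(states[0])
    return cold_samples


def fraction_right(samples):
    """Fraction of samples that lie in the right-hand mode (x > 0)."""
    return sum(1 for x in samples if x > 0.0) / len(samples)


if __name__ == "__main__":
    plain = parallel_tempering(20000, [1.0])
    tempered = parallel_tempering(20000, [1.0, 4.0, 16.0, 64.0])
    print("without tempering:", fraction_right(plain))
    print("with tempering:   ", fraction_right(tempered))
```

With a single temperature the sketch reduces to an ordinary Metropolis chain, which in this toy setting stays in the mode where it started. The geometric temperature ladder is one common choice; how the temperatures and swap schedule are set in SimulFold is not stated in the text above.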

