SimulFold: simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework.

Meyer IM, Miklós I - PLoS Comput. Biol. (2007)

Bottom Line: We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. SimulFold is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.


Affiliation: UBC Bioinformatics Centre, University of British Columbia, Vancouver, British Columbia, Canada. irmtraud.meyer@cantab.net

ABSTRACT
Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.


pcbi-0030149-g008: Two Consensus Networks for the HDV Dataset Consisting of 15 Sequences of HDV Ribozymes. The name of each sequence indicates the NCBI accession number of the strain (see also Figure 6). The edge lengths in the left network correspond to the probability of the corresponding split in the posterior distribution, whereas the edge lengths in the right network correspond to the average length of the edge in the sampled trees.

Mentions: We calculated consensus networks from the evolutionary trees that the MCMC sampled from the posterior distribution, using the method of Holland and Moulton [82] as implemented in the SplitsTree4 program [83]. We set the threshold for splits to 0.1, i.e., we retained only splits that were present in at least 10% of the sampled trees, and generated the two networks shown in Figure 8. The two networks have the same topology but differ in their edge lengths, which represent different kinds of information. In the left network, the length of each edge is proportional to the posterior probability of the split that the edge represents (the unit in the top left corner shows 1,000 occurrences in 2,000 sampled trees). In the right network, the length of each edge equals the average length of that edge in the sampled trees that contain it. There are five groups of strains: the lone strain AJ309873; a group containing U81988, M28267, X77627, and M92448; another group containing U81989, AF104263, AF104264, and X85253; and finally two relatively close groups, one containing AB088679, AF018077, and AF309420, and the other containing L22063, AB03748, and AJ309880. There is not enough phylogenetic signal to infer the relationship between the union of the last two groups and the other three groups. As Figure 8 indicates, there are several plausible explanations for how strains in the first two groups could have evolved.
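
The split filtering and the two kinds of edge length described above can be illustrated with a minimal Python sketch. It is not the SplitsTree4 implementation, and it only computes the split statistics, not the network layout. The input format (each sampled tree given as a list of `(taxa_on_one_side, edge_length)` pairs) and the names `canonical_split` and `consensus_splits` are assumptions made for this example.

```python
from collections import defaultdict


def canonical_split(side, taxa):
    """Return a unique key for a bipartition of `taxa`.

    A split can be written from either side, so we always keep the side
    that does NOT contain a fixed reference taxon (the smallest name).
    """
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def consensus_splits(trees, taxa, threshold=0.1):
    """Summarise the splits found in a sample of trees.

    `trees` is a list of trees; each tree is a list of
    (taxa_on_one_side, edge_length) pairs (hypothetical format).
    Returns {split: (frequency, mean_edge_length)} for every split whose
    frequency among the sampled trees is at least `threshold`.
    """
    counts = defaultdict(int)
    length_sums = defaultdict(float)
    for tree in trees:
        # Collect each split once per tree, keyed by its canonical form.
        splits_in_tree = {canonical_split(s, taxa): l for s, l in tree}
        for split, length in splits_in_tree.items():
            counts[split] += 1
            length_sums[split] += length

    n_trees = len(trees)
    summary = {}
    for split, count in counts.items():
        frequency = count / n_trees            # edge length in the "left" network
        mean_length = length_sums[split] / count  # edge length in the "right" network
        if frequency >= threshold:
            summary[split] = (frequency, mean_length)
    return summary


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    # Toy sample of 20 trees on four taxa (made-up data, one internal edge each).
    sample = (
        [[({"A", "B"}, 0.25)]] * 15
        + [[({"A", "C"}, 0.5)]] * 4
        + [[({"A", "D"}, 0.75)]] * 1
    )
    for split, (freq, mean_len) in consensus_splits(sample, taxa).items():
        print(sorted(split), freq, mean_len)
```

In this toy sample the split separating A and D from B and C occurs in only 1 of 20 trees (5%), so it falls below the 0.1 threshold and is dropped. The other two splits are retained, each with its frequency and its average length over the trees that contain it.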

