SimulFold: simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework.

Meyer IM, Miklós I - PLoS Comput. Biol. (2007)

Bottom Line: We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. SimulFold is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.


Affiliation: UBC Bioinformatics Centre, University of British Columbia, Vancouver, British Columbia, Canada. irmtraud.meyer@cantab.net

ABSTRACT
Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.


Figure 6. MPD Alignment for the HDV Dataset Consisting of 15 Sequences of HDV Ribozymes. The name of each sequence indicates the NCBI accession number of the strain. The ribozyme contains one pseudoknot and a variable helix as shown in the line above the alignment, which denotes the known reference structure in dot-bracket, or Vienna, notation. The posterior probabilities for each alignment column were derived from the multiple sequence alignments that the MCMC method sampled and are indicated at the top of the figure.
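
To make the dot-bracket notation in the caption concrete, the following is a minimal Python sketch of how such a structure line can be read into base pairs. It assumes the widespread extended convention in which a second bracket type (here `[` and `]`) marks the pseudoknotted helix; the figure's exact symbols are not given in this text, and the function names and the toy structure string are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return the sorted 0-based base pairs (i, j) encoded by a dot-bracket string.

    Assumption: each bracket type is nested on its own, and a pseudoknot is
    written with a different bracket type than the helix it crosses.
    """
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks = {opener: [] for opener in closers.values()}
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol in stacks:                      # opening bracket
            stacks[symbol].append(pos)
        elif symbol in closers:                   # closing bracket
            stack = stacks[closers[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":                       # '.' is an unpaired position
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)


def has_pseudoknot(pairs):
    """True if any two base pairs cross, i.e. i < k < j < l."""
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


# Toy structure (not the HDV reference): one helix crossed by a second one.
toy = "((..[[..))..]]"
toy_pairs = parse_dot_bracket(toy)
print(toy_pairs, has_pseudoknot(toy_pairs))
```

Keeping one stack per bracket type is what lets crossing helices be parsed; a single shared stack would only handle nested structures.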

Mentions: The MPD alignment is shown in Figure 6, together with the estimated posterior probabilities for each column as well as the reference secondary structure that includes a pseudoknot. The figure clearly highlights three regions in the alignment where lower posterior probabilities are due to an ambiguity in the estimation. The first region overlaps a hairpin loop. Even though the MPD alignment contains no gaps in this region, the sequences vary considerably, and several plausible explanations in terms of evolutionary indel events could relate the sequences in this region. The remaining two regions overlap the two base-paired sides of a variable helix. The low posterior probabilities indicate that several plausible alignments exist for these regions. These observations are in line with the difficulty we had in correctly predicting these parts of this helix. We conjecture that this helix may be shorter or contain bulges in some of the sequences of the HDV set.
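
As an illustration of how per-column posterior probabilities can be estimated from sampled alignments, here is a minimal Python sketch. It assumes, and this is an assumption rather than the procedure stated in the paper, that a column's posterior is the fraction of sampled alignments containing exactly that column, where a column is identified by the residue index it holds for each sequence (or a gap). The function names and toy data are hypothetical.

```python
from collections import Counter


def alignment_columns(rows, gap="-"):
    """Describe each column by the residue index it holds per sequence.

    rows: equal-length gapped strings, one per sequence.
    A column becomes a tuple such as (3, None, 4): residue 3 of sequence 0,
    a gap in sequence 1, residue 4 of sequence 2.
    """
    next_index = [0] * len(rows)
    columns = []
    for chars in zip(*rows):
        column = []
        for seq, ch in enumerate(chars):
            if ch == gap:
                column.append(None)
            else:
                column.append(next_index[seq])
                next_index[seq] += 1
        columns.append(tuple(column))
    return columns


def column_posteriors(samples, reference):
    """Estimate, for each column of `reference`, the fraction of sampled
    alignments that contain exactly that column."""
    counts = Counter()
    for rows in samples:
        counts.update(set(alignment_columns(rows)))
    return [counts[col] / len(samples) for col in alignment_columns(reference)]


# Toy samples for two sequences "ACG" and "ACG" (not the HDV data).
samples = [
    ["AC-G", "A-CG"],
    ["AC-G", "A-CG"],
    ["ACG", "ACG"],
    ["AC-G", "A-CG"],
]
print(column_posteriors(samples, samples[0]))
```

In this toy case the two middle columns of the reference alignment receive lower values than the flanking ones, mirroring how ambiguous regions show up as dips in the per-column posterior.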

