SimulFold: simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework.

Meyer IM, Miklós I - PLoS Comput. Biol. (2007)

Bottom Line: We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. SimulFold is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.

Affiliation: UBC Bioinformatics Centre, University of British Columbia, Vancouver, British Columbia, Canada. irmtraud.meyer@cantab.net

ABSTRACT
Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.

Figure 1 (pcbi-0030149-g001): Alignment Prior. For calculating the gap contribution to the prior, F3(A), we decompose the alignment into homogeneous groups based only on the pattern of the gaps in the alignment. Each asterisk represents a nucleotide in the alignment, and each dash denotes a gap in the alignment.

Mentions: The alignment, A, is a result of an evolutionary indel process along the tree, T. We define F3(A) to model the gap contribution to the likelihood. We are not aware of any stochastic evolutionary model for indels that can handle an additional constraint on RNA secondary structure and that allows computationally efficient likelihood computations. We therefore use prior probabilities on alignments to incorporate indel events into our model. We choose F3(A) as the exponentiated penalty scores of gaps in the alignment, A. We decompose the alignment into homogeneous groups as shown in Figure 1. This decomposition considers only the location of gaps in the alignment and does not take the RNA secondary structure, the tree, or the different types of nucleotides in the alignment explicitly into account. Log(F3(A)) is the sum of terms for each column in the alignment. The contribution by each column is the sum of one or more of the following terms (which are not mutually exclusive): a gap opening penalty if at least one new gap is opened in the column, a gap closing penalty if at least one gap is closed in the column, and a gap extension penalty if there is at least one gap in the column. A gap opening gets a penalty of six, a gap closing gets a penalty of six, and a gap extension gets a penalty of three. Gap opening penalties are reduced by two if there are sequences where gaps have already been opened in other alignment columns. These penalty scores are similar to the standard gap penalties commonly used in alignment programs, e.g., Clustal-X [61]. Gap opening and closing penalties are omitted at the beginning and the end of the alignment.
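
The column-wise penalty scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SimulFold implementation: the function and constant names are hypothetical, the sign convention log F3(A) = -(total penalty) is assumed, the reduction of two is applied when another sequence carries a gap over from the previous column, and "omitted at the beginning and the end" is read as no opening penalty in the first column and no closing penalty in the last column.

```python
from typing import List

# Penalty values quoted in the text; the names are hypothetical.
GAP_OPEN = 6.0
GAP_CLOSE = 6.0
GAP_EXTEND = 3.0
OPEN_DISCOUNT = 2.0


def gap_penalty(alignment: List[str]) -> float:
    """Sum the per-column gap penalties of an alignment.

    `alignment` is a list of equal-length rows in which '-' marks a gap.
    Each column is charged at most once per penalty type, no matter how
    many sequences open, extend, or close a gap in that column.
    """
    if not alignment:
        return 0.0
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all alignment rows must have the same length")

    total = 0.0
    for c in range(n_cols):
        gapped = [row[c] == "-" for row in alignment]
        before = [c > 0 and row[c - 1] == "-" for row in alignment]
        after = [c < n_cols - 1 and row[c + 1] == "-" for row in alignment]

        # Charged once if the column contains at least one gap.
        if any(gapped):
            total += GAP_EXTEND

        # Opening: some sequence starts a gap here (assumed: not charged
        # in the first column of the alignment).
        opens = c > 0 and any(g and not b for g, b in zip(gapped, before))
        if opens:
            # Assumed reading of the reduction: another sequence already
            # has a gap running into this column.
            carried = any(g and b for g, b in zip(gapped, before))
            total += GAP_OPEN - (OPEN_DISCOUNT if carried else 0.0)

        # Closing: some sequence ends a gap here (assumed: not charged
        # in the last column of the alignment).
        closes = c < n_cols - 1 and any(g and not a for g, a in zip(gapped, after))
        if closes:
            total += GAP_CLOSE
    return total


def log_f3(alignment: List[str]) -> float:
    """Log of the gap prior, assuming F3(A) = exp(-total penalty)."""
    return -gap_penalty(alignment)
```

Under these assumptions, a single interior one-column gap, as in `gap_penalty(["AC-GU", "ACGGU"])`, is charged one extension, one opening, and one closing, for a total of 15. Other readings of the reduction rule or of the boundary handling would give different totals.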

