Can Clustal-style progressive pairwise alignment of multiple sequences be used in RNA secondary structure prediction?

Bellamy-Royds AB, Turcotte M - BMC Bioinformatics (2007)

Bottom Line: In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides. We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.


Affiliation: School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada. turcotte@site.uottawa.ca

ABSTRACT

Background: In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides.

Results: The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure.

Conclusion: We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.
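
As a rough illustration of why the guide tree matters, the sketch below lists the pairwise profile merges that a given tree implies. It is a minimal sketch, not the authors' implementation: the nested-tuple tree encoding, the function name `merge_order`, and the sequence labels are all hypothetical, and no alignment is actually computed.

```python
def merge_order(tree, steps=None):
    """Return (label, steps) for a guide tree given as nested 2-tuples.

    Leaves are sequence names (strings). Each internal node stands for one
    pairwise profile alignment. `steps` collects the merges in the order a
    progressive method would perform them (children before parents).
    The tree encoding and labels are assumptions made for illustration.
    """
    if steps is None:
        steps = []
    if isinstance(tree, str):
        # A leaf is a single sequence; nothing to merge yet.
        return tree, steps
    left, right = tree
    left_label, _ = merge_order(left, steps)
    right_label, _ = merge_order(right, steps)
    # Record the pairwise merge of the two child profiles.
    steps.append((left_label, right_label))
    return f"({left_label}+{right_label})", steps


# Two hypothetical guide trees over the same four sequences.
balanced = (("seqA", "seqB"), ("seqC", "seqD"))
ladder = ((("seqA", "seqB"), "seqC"), "seqD")

for name, tree in [("balanced", balanced), ("ladder", ladder)]:
    _, steps = merge_order(tree)
    print(name, steps)
```

The two trees cover the same sequences but produce different intermediate profiles, which is the sense in which the consensus structure can depend on the order of addition.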



Figure 12: The reference secondary structure predicted for 5S rRNA sequence V00336 (a), the structure predicted by mfold as the unconstrained optimum (b), the conserved structure predicted by the nearest-neighbour consensus algorithm with gap penalty 4 or 6 kcal/mol (c), and the structure predicted by mfold as the optimum, when it was forced to include the consensus basepairs (d). Images produced by the sir graph utility of the mfold program [24].

Mentions: Interestingly, the sequence with the best prediction under the constrained re-folding was also the sequence with the worst unconstrained prediction. Sequence V00336, from E. coli, was originally folded by mfold into a structure in which only the outermost helix matched the reference structure (0.26 MCC); a reported sub-optimal structure likewise matched only this one helix. However, with the 11 constrained basepairs from the consensus structure, mfold was able to detect the complete structure: all canonical basepairs were predicted, with no incorrect predictions. The reference structure, the original mfold optimum, the consensus structure, and the constrained mfold optimum are shown in Figure 12. A similar, if not quite as spectacular, improvement occurred for the sequence X02627, which went from 0.32 MCC for the original mfold prediction to 0.90 MCC for the refold.
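
The MCC values quoted above are Matthews correlation coefficients between predicted and reference basepairs. The sketch below shows one common way to compute such a score. It assumes that true negatives are counted over all n(n-1)/2 possible position pairs; the paper's exact counting convention is not given in this excerpt, and the function name `basepair_mcc` and the toy basepair sets are hypothetical.

```python
from math import sqrt


def basepair_mcc(predicted, reference, seq_len):
    """Matthews correlation coefficient between two sets of basepairs.

    `predicted` and `reference` are iterables of (i, j) position pairs.
    Assumption: true negatives are all position pairs i < j that are
    paired in neither structure, out of seq_len * (seq_len - 1) / 2.
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    tp = len(pred & ref)   # predicted and in the reference
    fp = len(pred - ref)   # predicted but not in the reference
    fn = len(ref - pred)   # in the reference but missed
    tn = seq_len * (seq_len - 1) // 2 - tp - fp - fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when a row or column of the confusion matrix is empty;
        # returning 0.0 here is a convention, not part of the definition.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy data (not from the paper): a 10-nt hairpin with two reference pairs.
reference = [(1, 10), (2, 9)]
print(basepair_mcc([(1, 10), (2, 9)], reference, 10))  # identical structures
print(basepair_mcc([(1, 10), (3, 8)], reference, 10))  # one pair shifted
```

Because almost all position pairs are true negatives, the score is driven mainly by how many predicted pairs are correct and how many reference pairs are recovered.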

