Predicting RNA secondary structure by the comparative approach: how to select the homologous sequences.

Engelen S, Tahi F - BMC Bioinformatics (2007)

Bottom Line: This problem of sequence selection is currently unsolved. We propose three models, based on different constraints on sequence alignments. SSCA enabled us to choose sets of homologous sequences that gave better predictions than arbitrarily chosen sets of homologous sequences.


Affiliation: Laboratoire IBISC - FRE CNRS 2873, CNRS, Université d'Evry Val-d'Essonne, Genopole, 523, place des Terrasses, 91000 Evry, France. stefan.engelen@ibisc.univ-evry.fr

ABSTRACT

Background: The secondary structure of an RNA must be known before the relationship between its structure and function can be determined. One way to predict the secondary structure of an RNA is to identify covarying residues that maintain the pairings (Watson-Crick, Wobble and non-canonical pairings). This "comparative approach" consists of identifying mutations from homologous sequence alignments. The sequences must covary enough for compensatory mutations to be revealed, but comparison is difficult if they are too different. Thus the choice of homologous sequences is critical. While many possible combinations of homologous sequences may be used for prediction, only a few will give good structure predictions. This can be due to poor quality alignment in stems or to the variability of certain sequences. This problem of sequence selection is currently unsolved.

Results: This paper describes an algorithm, SSCA, which measures the suitability of sequences for the comparative approach. It is based on evolutionary models with structure constraints, particularly those on sequence variations and stem alignment. We propose three models, based on different constraints on sequence alignments. We show the results of the SSCA algorithm for predicting the secondary structure of several RNAs. SSCA enabled us to choose sets of homologous sequences that gave better predictions than arbitrarily chosen sets of homologous sequences.

Conclusion: SSCA is an algorithm for selecting combinations of homologous RNA sequences suitable for secondary structure prediction with the comparative approach.

Figure 2: Theoretical stem substitution matrices. Left top: Stem deviation matrix due to influences of transitions/transversions and of GU intermediate state on stem substitution matrices. Left bottom: Stem deviation matrix due to influences of GC stability. Right: Stem deviation matrix due to all the influences.

Mentions: Let us consider all the possible substitutions between base pairs, without the less frequent GU base pairs. We can eliminate AU ↔ UA and CG ↔ GC substitutions since they are symmetrical and do not change the substitution matrices. For the four other substitutions, if GC pairs are preferred in stems, the balance between base pairs will tend towards GC base pairs. The result will be a deviation of the nucleotide substitution rates (Figure 2, left bottom). There must therefore be more A → C and A → G substitutions than A → U substitutions, and more U → C and U → G than U → A, and fewer C → A and C → U than C → G, and fewer G → A and G → U than G → C.
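
The reasoning above can be checked by simple enumeration. The following Python sketch is illustrative only and is not part of SSCA: the function name classify_nucleotide_substitutions and the "X>Y" notation for a nucleotide change are assumptions introduced here. It lists every substitution between the four Watson-Crick pairs (GU pairs excluded, as in the text) and sorts the single-nucleotide changes by whether the pair moves towards GC, away from GC, or neither.

```python
from itertools import permutations

# Watson-Crick base pairs only; GU wobble pairs are left out, as in the text.
PAIRS = ["AU", "UA", "GC", "CG"]
GC_PAIRS = {"GC", "CG"}


def classify_nucleotide_substitutions():
    """Return (favoured, disfavoured, neutral) sets of 'X>Y' nucleotide changes.

    A change is 'favoured' if it occurs in a pair substitution that turns an
    AU/UA pair into a GC/CG pair, 'disfavoured' if it occurs in the reverse
    direction, and 'neutral' if it only occurs in the symmetrical swaps
    AU <-> UA and GC <-> CG.
    """
    favoured, disfavoured, neutral = set(), set(), set()
    for src, dst in permutations(PAIRS, 2):
        # Nucleotide changes at the two paired positions (5' side, 3' side).
        changes = {f"{a}>{b}" for a, b in zip(src, dst) if a != b}
        src_is_gc, dst_is_gc = src in GC_PAIRS, dst in GC_PAIRS
        if src_is_gc == dst_is_gc:
            neutral |= changes      # symmetrical swap, no GC gain or loss
        elif dst_is_gc:
            favoured |= changes     # pair moves towards GC
        else:
            disfavoured |= changes  # pair moves away from GC
    return favoured, disfavoured, neutral


if __name__ == "__main__":
    fav, dis, neu = classify_nucleotide_substitutions()
    print("favoured:   ", sorted(fav))
    print("disfavoured:", sorted(dis))
    print("neutral:    ", sorted(neu))
```

Under these assumptions the favoured changes are A>C, A>G, U>C and U>G, the disfavoured ones are C>A, C>U, G>A and G>U, and A>U, U>A, C>G and G>C are unaffected, which is the ordering stated in the text. The sketch only gives the direction of each deviation; it says nothing about its magnitude.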

