Predicting RNA secondary structure by the comparative approach: how to select the homologous sequences.

Engelen S, Tahi F - BMC Bioinformatics (2007)

Bottom Line: This problem of sequence selection is currently unsolved. We propose three models, based on different constraints on sequence alignments. SSCA enabled us to choose sets of homologous sequences that gave better predictions than arbitrarily chosen sets of homologous sequences.


Affiliation: Laboratoire IBISC - FRE CNRS 2873, CNRS, Université d'Evry Val-d'Essonne, Genopole, 523, place des Terrasses, 91000 Evry, France. stefan.engelen@ibisc.univ-evry.fr

ABSTRACT

Background: The secondary structure of an RNA must be known before the relationship between its structure and function can be determined. One way to predict the secondary structure of an RNA is to identify covarying residues that maintain the pairings (Watson-Crick, Wobble and non-canonical pairings). This "comparative approach" consists of identifying mutations from homologous sequence alignments. The sequences must covary enough for compensatory mutations to be revealed, but comparison is difficult if they are too different. Thus the choice of homologous sequences is critical. While many possible combinations of homologous sequences may be used for prediction, only a few will give good structure predictions. This can be due to poor quality alignment in stems or to the variability of certain sequences. This problem of sequence selection is currently unsolved.
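The idea of covarying residues that maintain a pairing can be illustrated with a minimal sketch. It is not the SSCA algorithm: it only checks whether two alignment columns show compensatory mutations. The function names, the toy alignment and the restriction to Watson-Crick and G-U wobble pairs are assumptions made for illustration.

```python
# Pairs treated as compatible here: Watson-Crick plus G-U wobble.
# Assumption for this sketch: non-canonical pairings are ignored.
PAIRING = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pairs(alignment, i, j):
    """Return the set of (base_i, base_j) observed at columns i and j."""
    return {(seq[i], seq[j]) for seq in alignment}


def supports_pairing(alignment, i, j):
    """True if columns i and j covary while preserving a pairing.

    Requires every observed pair to be compatible, and at least two
    observed pairs that differ at both positions (a compensatory mutation).
    """
    pairs = column_pairs(alignment, i, j)
    if not pairs or not pairs <= PAIRING:
        return False
    return any(a[0] != b[0] and a[1] != b[1] for a in pairs for b in pairs)


# Hypothetical gap-free toy alignment of three homologous sequences.
alignment = [
    "GCAUAGC",
    "GGAUACC",
    "GAAUAUC",
]

print(supports_pairing(alignment, 1, 5))  # columns 1 and 5: C-G, G-C, A-U
print(supports_pairing(alignment, 0, 6))  # columns 0 and 6: always G-C
```

In this toy case, columns 0 and 6 are fully conserved, so they give no covariation signal even though they could pair. This is the situation the abstract describes: sequences that are too similar reveal no compensatory mutations.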

Results: This paper describes an algorithm, SSCA, which measures the suitability of sequences for the comparative approach. It is based on evolutionary models with structure constraints, particularly those on sequence variations and stem alignment. We propose three models, based on different constraints on sequence alignments. We show the results of the SSCA algorithm for predicting the secondary structure of several RNAs. SSCA enabled us to choose sets of homologous sequences that gave better predictions than arbitrarily chosen sets of homologous sequences.

Conclusion: SSCA is an algorithm for selecting combinations of RNA homologous sequences suitable for secondary structure predictions with the comparative approach.


Figure 4: Correlation between SSCA scores (using the model ) and average MCC scores of homologous sequences of tmRNA (left) and RNaseP (right) alignments. Homologous sequences with the lowest SSCA scores have the highest average MCC scores. The best correlation is for the low SSCA scores.

Mentions: We plotted the correlation between the average MCC score of each homologous sequence and the score SGC+GU attributed to this sequence with the model (Figure 4). Figure 4 shows that the homologous sequences with the lowest SSCA scores have the highest average MCC scores, validating our model and algorithm for selecting homologous sequences.
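For readers unfamiliar with the MCC (Matthews correlation coefficient) used above to score predictions, the following is a minimal sketch of how it can be computed for a predicted set of base pairs against a reference structure. The function name and the definition of true negatives (all position pairs i < j that appear in neither structure) are assumptions of this sketch; the paper may count them differently.

```python
from math import sqrt


def mcc(predicted, reference, length):
    """Matthews correlation coefficient between two sets of base pairs.

    predicted, reference: sets of (i, j) tuples with i < j.
    length: sequence length, used to count candidate pairs.
    Assumption: true negatives are all (i, j) pairs with i < j that
    are absent from both structures.
    """
    tp = len(predicted & reference)   # pairs predicted and present
    fp = len(predicted - reference)   # pairs predicted but absent
    fn = len(reference - predicted)   # pairs present but missed
    tn = length * (length - 1) // 2 - tp - fp - fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (zero denominator).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 10-nucleotide example with a two-pair reference stem.
reference = {(0, 9), (1, 8)}
perfect = {(0, 9), (1, 8)}
partial = {(0, 9), (2, 7)}

print(mcc(perfect, reference, 10))
print(mcc(partial, reference, 10))
```

The average MCC of a homologous sequence, as plotted in Figure 4, would then be the mean of such values over the predictions in which that sequence takes part.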

