RSEARCH: finding homologs of single structured RNA sequences.

Klein RJ, Eddy SR - BMC Bioinformatics (2003)

Bottom Line: RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website.


Affiliation: Howard Hughes Medical Institute & Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63110, USA. rjklein@linkage.rockefeller.edu

ABSTRACT

Background: For many RNA molecules, secondary structure rather than primary sequence is the evolutionarily conserved feature. No programs have yet been published that allow searching a sequence database for homologs of a single RNA molecule on the basis of secondary structure.

Results: We have developed a program, RSEARCH, that takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs. For this purpose, we have developed a series of base pair and single nucleotide substitution matrices for RNA sequences called RIBOSUM matrices. RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. We show several examples in which RSEARCH outperforms the primary sequence search programs BLAST and SSEARCH. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website.

Conclusion: RSEARCH outperforms primary sequence programs in finding homologs of structured RNA sequences.



Figure 3: The RIBOSUM85-60 matrix. The 16 × 16 matrix is used to get scores for aligning base pairs. The 4 × 4 matrix is used to get scores for aligning single-stranded regions. Positive scores are shaded.

Mentions: RIBOSUM85-60 has several characteristics typical of these matrices (Figure 3). It consists of two matrices: one 16 × 16 for base pair substitutions and the other 4 × 4 for single nucleotide substitutions. In the single nucleotide substitution matrix, the A-A identity has a score (2.22) much larger than the other single nucleotide identities. This suggests that conserved As are especially common in single stranded regions of 16S ribosomal RNA. The 16 × 16 matrix has two notable features. First, unlike typical nucleotide or amino acid substitution matrices, not all values on its identity diagonal are positive. This reflects the specificity of base pairing: canonical Watson-Crick and G-U pairs are observed much more often than non-canonical pairs. Since non-canonical pairs occur less often than expected on the basis of individual nucleotide probabilities, the log-odds score for these pairs aligned to themselves is negative. Second, substitution of one canonical pair for another usually gives a positive score (e.g. A-U to C-G has a score of 1.47). Therefore, the RIBOSUM matrices resemble what we intuitively assume a good base pairing substitution matrix would look like.
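
The sign argument above can be illustrated with a small log-odds calculation. This is only a sketch, not the RSEARCH code: the function name `pair_score`, the uniform background frequencies, and the observed frequencies are all hypothetical, and the background model (the product of the four individual nucleotide probabilities) is an assumption taken from the phrase "expected on the basis of individual nucleotide probabilities".

```python
import math

# Hypothetical background nucleotide frequencies (uniform, for illustration only).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}


def pair_score(observed, pair1, pair2, background=BACKGROUND):
    """Log-odds score (in bits) for base pair `pair1` aligned to base pair `pair2`.

    `observed` is the frequency with which the two pairs are seen aligned.
    The expected frequency is ASSUMED to be the product of the four
    individual nucleotide probabilities.
    """
    expected = 1.0
    for nucleotide in pair1 + pair2:
        expected *= background[nucleotide]
    return math.log2(observed / expected)


# Hypothetical observed frequencies, not values from the paper.
canonical = pair_score(0.1, "GC", "GC")         # seen far more often than expected
non_canonical = pair_score(0.0005, "AA", "AA")  # seen far less often than expected

print(f"G-C aligned to G-C: {canonical:+.2f}")
print(f"A-A aligned to A-A: {non_canonical:+.2f}")
```

With these made-up numbers the canonical pair's identity score is positive and the non-canonical pair's identity score is negative, which is the pattern the text describes on the diagonal of the 16 × 16 matrix.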

