RSEARCH: finding homologs of single structured RNA sequences.

Klein RJ, Eddy SR - BMC Bioinformatics (2003)

Bottom Line: RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website.


Affiliation: Howard Hughes Medical Institute & Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63110, USA. rjklein@linkage.rockefeller.edu

ABSTRACT

Background: For many RNA molecules, secondary structure rather than primary sequence is the evolutionarily conserved feature. No programs have yet been published that allow searching a sequence database for homologs of a single RNA molecule on the basis of secondary structure.

Results: We have developed a program, RSEARCH, that takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs. For this purpose, we have developed a series of base pair and single nucleotide substitution matrices for RNA sequences called RIBOSUM matrices. RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. We show several examples in which RSEARCH outperforms the primary sequence search programs BLAST and SSEARCH. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website.

Conclusion: RSEARCH outperforms primary sequence programs in finding homologs of structured RNA sequences.


Figure 2: The two classes of local alignment. Each example shows how the nodal guide tree best aligns to the target sequence. At the bottom is the RSEARCH output for the alignment. On the left is an example of begin locality, while on the right is an example of end locality. The numbers next to the query sequence represent positions relative to the entire query; the numbers next to the target sequence represent positions relative to the subsequence defined in the "Target =" line.

Mentions: The model as described above can only perform global alignment with respect to the query sequence. The model is modified slightly to allow for local alignment as well. Two types of locality are allowed. The first, which we call "begin locality," resembles local alignment as implemented in the Smith-Waterman algorithm [11]. Here a penalty, beginsc, is applied if the alignment begins inside the model, i.e. if the states representing the outermost parts of the RNA secondary structure are not included. This is analogous to the convention in the Smith-Waterman algorithm that there is no penalty (score of 0) for a local rather than a global alignment [11]; following that convention, beginsc is set to 0 by default. The second type is "end locality." Here a penalty, endsc, is applied to allow the subtree of the model below the current state to be ignored and replaced by an insertion of arbitrary size in the target sequence. There are many known examples of stems that change dramatically in length, or even disappear completely, in homologous RNA sequences. One such example is the P8 stem of Archaeal RNase P, which is not present in the RNase P RNA of Methanocaldococcus jannaschii and Archaeoglobus fulgidus, but is present in other Archaeal RNase P sequences [43]. Examples of both kinds of locality are shown in Figure 2.
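The effect of the two penalties can be illustrated with a toy calculation. The sketch below is not the RSEARCH algorithm: it skips the dynamic programming over the target sequence and assumes each node of a small guide tree already has a fixed score. The names Node, subtree_score and best_local_score, the example scores, and the endsc value of -5 are all hypothetical; only the beginsc default of 0 comes from the text. End locality is simplified to a per-child choice between the child's subtree score and endsc.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One node of a toy guide tree (hypothetical, for illustration only)."""
    name: str
    score: float  # assumed, precomputed score for aligning this node alone
    children: List["Node"] = field(default_factory=list)


def subtree_score(node: Node, endsc: float) -> float:
    """Score of the subtree rooted at node.

    End locality: each child subtree may be dropped and replaced by an
    insertion, in which case it contributes endsc instead of its own score.
    Passing endsc = -inf disables end locality.
    """
    total = node.score
    for child in node.children:
        total += max(subtree_score(child, endsc), endsc)
    return total


def all_nodes(node: Node) -> Iterator[Node]:
    """Yield every node in the tree, root first."""
    yield node
    for child in node.children:
        yield from all_nodes(child)


def best_local_score(root: Node, beginsc: float = 0.0, endsc: float = -5.0) -> float:
    """Best score when the alignment may start at the root or inside the model.

    Begin locality: starting at any non-root node costs beginsc.
    """
    best = subtree_score(root, endsc)
    for node in all_nodes(root):
        if node is not root:
            best = max(best, beginsc + subtree_score(node, endsc))
    return best


# Toy model: the outer stem and one inner stem are absent from the target.
hairpin = Node("hairpin", 4.0)
missing_stem = Node("missing_stem", -12.0)
inner = Node("inner_stem", 6.0, [hairpin, missing_stem])
root = Node("outer_stem", -8.0, [inner])

global_score = subtree_score(root, float("-inf"))  # no locality at all
local_score = best_local_score(root)               # both kinds of locality
print(global_score, local_score)
```

In this toy case the global score is dragged down by the two absent stems, while the local score starts at the inner stem (begin locality) and pays endsc in place of the missing stem (end locality).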

