RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules.

Horesh Y, Doniger T, Michaeli S, Unger R - BMC Bioinformatics (2007)

Bottom Line: We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. Three test datasets of Rfam families allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.


Affiliation: The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel. yair@biomodel.os.biu.ac.il

ABSTRACT

Background: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure.

Results: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in linear time in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. We show that this approach allows us to more deeply explore the suboptimal structure space.
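To make the layered-graph idea concrete, here is a minimal sketch, not RNAspa's actual implementation. It assumes that each molecule's RNAsubopt output has already been parsed into a list of dot-bracket strings (one layer per molecule), that edges connect only consecutive layers, and that each edge is weighted by a plain unit-cost Levenshtein distance between the two structure strings. The function names `edit_distance` and `layered_shortest_path` are hypothetical. Because each layer only interacts with the next one, a single dynamic-programming sweep finds the shortest path. With a fixed number of candidates per layer, this sweep runs in time linear in the number of molecules.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two dot-bracket strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def layered_shortest_path(layers: List[List[str]]) -> Tuple[int, List[int]]:
    """Choose one candidate structure per layer (molecule) so that the sum of
    edit distances between consecutive choices is minimal.

    Assumption (hypothetical simplification): edges exist only between
    consecutive layers, and edge weight is plain edit distance.
    Returns (total cost, index of the chosen candidate in each layer).
    """
    cost = [0] * len(layers[0])  # best cost of a path ending at each candidate
    back: List[List[int]] = []   # back-pointers for every layer after the first
    for prev_layer, layer in zip(layers, layers[1:]):
        new_cost, choice = [], []
        for s in layer:
            totals = [cost[k] + edit_distance(p, s) for k, p in enumerate(prev_layer)]
            best_k = min(range(len(totals)), key=totals.__getitem__)
            new_cost.append(totals[best_k])
            choice.append(best_k)
        cost = new_cost
        back.append(choice)
    # Trace the cheapest path back from the last layer to the first.
    end = min(range(len(cost)), key=cost.__getitem__)
    path = [end]
    for choice in reversed(back):
        path.append(choice[path[-1]])
    path.reverse()
    return cost[end], path


if __name__ == "__main__":
    layers = [
        ["((...))", "(.....)"],  # suboptimal candidates for molecule 1
        ["((...))", "......."],  # molecule 2
        ["(((.)))", "((...))"],  # molecule 3
    ]
    print(layered_shortest_path(layers))
```

In this toy example, the path selects the same structure `((...))` from every layer at zero total cost. This mirrors the intuition that, among several hundred suboptimal folds per molecule, the family's shared structure is likely to appear in each list.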

Conclusion: The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.


© Copyright Policy - open-access

Figure 11: An example with a contaminated dataset. The leftmost bar represents the MCC score of a set of ten Purine sequences. The rightmost bar represents a set of ten sequences from the Lysine family. Towards the middle of the graph, the sets become more and more mixed. Note that with increasing contamination, the results tend to deteriorate, but in general the method is robust to low levels of contamination.

Mentions: We further investigated the extent to which the performance of RNAspa can withstand the effects of contaminated data. Specifically, in our algorithm, a sequence that has a very different set of potential suboptimal structures would break the path into two detached components. As one would expect, a contaminated set reduces the performance of the algorithm. However, RNAsubopt's worst MCC score serves as a 'safety net' in these cases. Figure 11 shows the performance of the algorithm when two different datasets (Purine and Lysine) were mixed together in varying proportions, starting with a set of ten Lysine sequences, followed by a set of nine Lysine and one Purine, then eight Lysine and two Purine, and so on. The results show that our method is quite robust to this kind of contamination, although, as expected, performance declines as the number of sequences that do not belong to the family increases.
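Figure 11 reports accuracy as an MCC (Matthews correlation coefficient) score. The excerpt does not state exactly how MCC was computed, so the following is only a sketch of one common base-pair formulation. A predicted dot-bracket structure is compared with a reference one. Each base pair counts as a true positive, a false positive, or a false negative. The true negatives are taken to be all remaining position pairs i < j. The helper names `base_pairs` and `mcc` are hypothetical, and the sketch assumes well-formed, pseudoknot-free dot-bracket input.

```python
import math
from typing import Set, Tuple


def base_pairs(db: str) -> Set[Tuple[int, int]]:
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) pairs."""
    stack, pairs = [], set()
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.add((stack.pop(), i))
    return pairs


def mcc(predicted: str, reference: str) -> float:
    """Base-pair MCC; assumes TN = all i<j position pairs not counted elsewhere."""
    if len(predicted) != len(reference):
        raise ValueError("structures must have equal length")
    n = len(reference)
    p, r = base_pairs(predicted), base_pairs(reference)
    tp = len(p & r)  # correctly predicted pairs
    fp = len(p - r)  # predicted pairs absent from the reference
    fn = len(r - p)  # reference pairs that were missed
    tn = n * (n - 1) // 2 - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    print(mcc("((...))", "((...))"))  # identical structures
    print(mcc("(.....)", "((...))"))  # one of two reference pairs recovered
```

Other papers often use the approximation MCC ≈ sqrt(sensitivity × PPV) instead, so absolute values from this sketch need not match the figure.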


RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules.

Horesh Y, Doniger T, Michaeli S, Unger R - BMC Bioinformatics (2007)

An example with a contaminated dataset. The leftmost bar represents the MCC score of a set of ten Purine sequences. The rightmost bar represents a set of ten sequences from the Lysine family. Towards the middle of the graph, the sets become more and more mixed. Note that with increasing contamination, the results tend to deteriorate, but in general the method is robust to low levels of contamination.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2147038&req=5

Figure 11: An example with a contaminated dataset. The leftmost bar represents the MCC score of a set of ten Purine sequences. The rightmost bar represents a set of ten sequences from the Lysine family. Towards the middle of the graph, the sets become more and more mixed. Note that with increasing contamination, the results tend to deteriorate, but in general the method is robust to low levels of contamination.
Mentions: We further investigated the extent to which the performance of RNAspa can withstand the effects of contaminated data. Specifically in our algorithm, a sequence that has a very different set of potential suboptimal structures would break the path into two detached components. As one would expect, a contaminated set reduces the performance of the algorithm. However, RNAsubopt's worst MCC score serves as a 'safety net' in these cases. Figure 11 shows the performance of the algorithm when two different datasets (Purine and Lysine) were mixed together in varying proportions starting with a set of ten Lysine sequences, followed by a set of nine Lysine and one Purine, then eight Lysine and two Purine, and so on. The results show that our method is quite robust to this kind of contamination, although, as expected, as the number of sequences that do no belong to the family increases, there is a negative effect on the performance.

Bottom Line: We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy.These datasets allowed for comparison of the algorithm with other methods.In these tests, RNAspa performed better than four other programs.

View Article: PubMed Central - HTML - PubMed

Affiliation: The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel. yair@biomodel.os.biu.ac.il

ABSTRACT

Background: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure.

Results: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in linear time in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. We show that this approach allows us to more deeply explore the suboptimal structure space.

Conclusion: The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.

Show MeSH
Related in: MedlinePlus