RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules.

Horesh Y, Doniger T, Michaeli S, Unger R - BMC Bioinformatics (2007)

Bottom Line: RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. In tests on three datasets of ncRNA families taken from the Rfam database, RNAspa performed better than four other programs.


Affiliation: The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel. yair@biomodel.os.biu.ac.il

ABSTRACT

Background: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure.

Results: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in linear time in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. We show that this approach allows us to more deeply explore the suboptimal structure space.

Conclusion: The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.
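
The layered-graph idea described in the Results can be illustrated with a minimal Python sketch. It assumes that each molecule contributes one layer of candidate structures in dot-bracket notation, that edges connect only consecutive layers, and that each edge is weighted by the plain Levenshtein distance between the two strings. The actual RNAspa edge weights, layer ordering, and tie-breaking are not given on this page, and the names `edit_distance`, `shortest_layered_path`, and `toy_layers` are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two dot-bracket strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def shortest_layered_path(layers: List[List[str]]) -> Tuple[int, List[int]]:
    """Pick one structure per layer so that the summed edit distance
    between consecutive picks is minimal. Returns (cost, chosen indices)."""
    if not layers or any(not layer for layer in layers):
        raise ValueError("every layer needs at least one candidate")
    cost = [0] * len(layers[0])   # cheapest cost of reaching each vertex
    back: List[List[int]] = []    # back-pointers, one list per later layer
    for prev_layer, layer in zip(layers, layers[1:]):
        new_cost, pointers = [], []
        for s in layer:
            options = [cost[k] + edit_distance(p, s)
                       for k, p in enumerate(prev_layer)]
            best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    end = min(range(len(cost)), key=cost.__getitem__)
    total = cost[end]
    path = [end]
    for pointers in reversed(back):  # walk the back-pointers to the first layer
        path.append(pointers[path[-1]])
    path.reverse()
    return total, path


# Toy input (assumption): three molecules, two candidate structures each.
toy_layers = [
    ["((..))", "(....)"],
    ["((..))", "......"],
    ["(....)", "((..))"],
]
toy_cost, toy_picks = shortest_layered_path(toy_layers)
print(toy_cost, toy_picks)
```

With a fixed number of candidates per layer, this dynamic program processes each pair of consecutive layers once, so the work grows linearly with the number of molecules, which is consistent with the running time stated in the Results.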


Figure 8: The influence of sampling on accuracy. MCC (Matthews correlation coefficient) score over the 120 (5!) permutations of an FMN dataset. As the number of samples increases, the MCC score improves. The blue line is the same line as shown for FMN in Figure 3. Once the number of samplings exceeds four, the improvement diminishes.

Mentions: The second parameter used by RNAspa is the number of samplings performed on the space of permutations of the sequence order. As we illustrated above, comparing the sequences in an arbitrary order is likely to yield an average MCC score. However, our results suggest that a small number of samplings of the order permutation space is sufficient to ensure reliable results. Identifying the optimal order, on the other hand, is difficult: there are N! possible permutations, where N is the number of sequences, so an exhaustive search takes time that grows factorially with N. In practice, the optimal result offers only a small improvement over the typical performance of the algorithm. The contribution of the number of samplings to the accuracy level is illustrated in Figure 8. For small sample sizes, the MCC score improves as the number of samples increases, but it reaches diminishing returns once the number of samples exceeds four. The diminishing improvement can be explained by the fact that only a small fraction of the order permutations need to be re-sampled in order to significantly improve their score.
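
The trade-off between a few sampled orderings and the exhaustive N! search can be sketched as follows. This is a hedged illustration, not the RNAspa procedure: it assumes that each sampled ordering is scored by some cost function and that the cheapest ordering is kept, which this page does not confirm. The names `sample_orderings`, `toy_names`, `toy_lengths`, and `toy_cost` are hypothetical, and the toy cost merely stands in for a real path cost.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence, Tuple


def sample_orderings(
    names: Sequence[str],
    path_cost: Callable[[Sequence[str]], float],
    n_samples: int = 4,
    seed: int = 0,
) -> Tuple[float, List[str]]:
    """Score n_samples random orderings and return the cheapest one."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    rng = random.Random(seed)
    best_cost, best_order = math.inf, list(names)
    for _ in range(n_samples):
        order = list(names)
        rng.shuffle(order)          # one random permutation of the sequences
        cost = path_cost(order)
        if cost < best_cost:
            best_cost, best_order = cost, order
    return best_cost, best_order


# Toy stand-in (assumption): five sequences with made-up lengths; the cost of
# an ordering is the summed length difference between neighbouring sequences.
toy_names = ["seq1", "seq2", "seq3", "seq4", "seq5"]
toy_lengths = {"seq1": 120, "seq2": 135, "seq3": 128, "seq4": 150, "seq5": 122}


def toy_cost(order: Sequence[str]) -> float:
    return sum(abs(toy_lengths[a] - toy_lengths[b])
               for a, b in zip(order, order[1:]))


sampled_cost, sampled_order = sample_orderings(toy_names, toy_cost, n_samples=4)

# Exhaustive search over all N! orderings, feasible only for very small N.
n_orderings = math.factorial(len(toy_names))
exhaustive_cost = min(toy_cost(p) for p in itertools.permutations(toy_names))
print(n_orderings, sampled_cost, exhaustive_cost)
```

Four samples cost four evaluations, whereas the exhaustive search over five sequences already needs 120, and that count grows factorially with the number of sequences.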

