RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules.

Horesh Y, Doniger T, Michaeli S, Unger R - BMC Bioinformatics (2007)

Bottom Line: We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.

Affiliation: The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel. yair@biomodel.os.biu.ac.il

ABSTRACT

Background: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they have been shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNAs is predicting the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules taken from the same family, which therefore presumably share a similar structure.

Results: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in time linear in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, at least one suggested structure is likely to be similar to the true structure. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph, and the shortest path through this graph is the basis for the structure predicted for each molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with minimal loss of accuracy, and that this approach allows us to explore the suboptimal structure space more deeply.
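The Results paragraph describes the layered-graph idea only in outline. The Python sketch below illustrates one way it could be realized, under stated assumptions: each molecule's suboptimal structures are given as dot-bracket strings (hard-coded here, not produced by RNAsubopt), every structure in one layer is connected to every structure in the next, edge weights are the plain unit-cost string edit distance, and layers are visited in the order given. The names `edit_distance` and `layered_shortest_path` are hypothetical, and RNAspa's actual graph construction, edge weighting and post-processing of the path may differ.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two dot-bracket strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def layered_shortest_path(layers: Sequence[Sequence[str]]) -> List[int]:
    """Choose one candidate structure per layer (molecule) so that the summed
    edit distance between consecutive layers is minimal.

    Returns the index of the chosen candidate in each layer. Ties are broken
    in favor of the lower index.
    """
    if not layers:
        return []
    cost = [0] * len(layers[0])   # best path cost ending at each vertex
    back: List[List[int]] = []    # back[t][j]: best predecessor of vertex j in layer t+1
    for prev_layer, layer in zip(layers, layers[1:]):
        new_cost, choice = [], []
        for s in layer:
            options = [cost[p] + edit_distance(prev_layer[p], s)
                       for p in range(len(prev_layer))]
            best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best])
            choice.append(best)
        cost = new_cost
        back.append(choice)
    # Trace the cheapest path back from the last layer to the first.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [idx]
    for choice in reversed(back):
        idx = choice[idx]
        path.append(idx)
    return path[::-1]
```

Because each layer connects only to the next, the path is found by dynamic programming in O(m · k² · n²) time for m molecules with k candidates of length about n, i.e. linear in the number of molecules. Edit distance also copes with molecules of different lengths. For example, with layers `[["((...))", "......."], [".......", "((...))"], ["((...))", "(.....)"]]` the sketch selects indices `[0, 1, 0]`, the zero-cost path through the three identical structures.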

Conclusion: The algorithm was tested on three datasets that include several ncRNA families taken from the Rfam database. These datasets allowed the algorithm to be compared with other methods. In these tests, RNAspa performed better than four other programs.

Figure 10: The influence of length on run-time. Comparison of the five programs processing increasingly longer windows of a set of five SSU rRNA sequences. RNAspa was run in Boltzmann sampling mode for all sequence lengths. All programs except RNAspa failed to run on sequences longer than 450 bp due to memory constraints. StemLoc does not appear in the graph because it failed to process sequences of 100 bp or more. As expected, a cubic trendline (not shown) fits RNAspa's curve with R² = 0.9965. RNAspa achieved an MCC (Matthews correlation coefficient) score of 0.34 on the complete ~1,800 bp SSU family.
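The caption reports an MCC score without defining how it was computed. As a hedged illustration, the sketch below computes the standard Matthews correlation coefficient between a predicted and a reference structure in dot-bracket notation, treating every position pair (i, j) with i < j as a candidate base pair, so true negatives are all remaining candidate pairs. This is an assumption: some RNA benchmarks instead use the approximation √(PPV · sensitivity). Only `(` and `)` are read as pairing characters; any other symbol, including pseudoknot brackets, is treated as unpaired. The function names are hypothetical.

```python
import math
from typing import Set, Tuple


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) base pairs encoded by '(' and ')' characters."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def mcc(predicted: str, reference: str) -> float:
    """Matthews correlation coefficient over all candidate pairs (i < j)."""
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    pred, ref = base_pairs(predicted), base_pairs(reference)
    n = len(reference)
    tp = len(pred & ref)                    # correctly predicted pairs
    fp = len(pred - ref)                    # predicted but not in reference
    fn = len(ref - pred)                    # in reference but missed
    tn = n * (n - 1) // 2 - tp - fp - fn    # remaining candidate pairs
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

An identical prediction scores 1.0, while a prediction with no pairs (or a reference with no pairs) scores 0.0 by the convention chosen here for a zero denominator.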

Mentions: We measured the run-time of the programs. Table 3 shows their performance under the same configuration used in Table 1. We also wanted to explore the relation between run-time and sequence length. To do so, we measured the run-time on increasingly longer windows of a set of five SSU rRNA sequences. Figure 10 shows that RNAspa outperforms the other programs both in how its run-time scales with sequence length and in its ability to process long sequences. All computations were performed on an Intel Xeon 3.0 GHz CPU with 8 GB RAM running Linux.
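The Figure 10 caption reports that a cubic trendline fits RNAspa's run-time curve with R² = 0.9965, presumably reflecting the cubic-time dynamic programming used for MFE folding. The paper does not say which tool produced the fit; the sketch below is an assumed, dependency-free way to fit a least-squares cubic of run-time against window length and report R². It does not reproduce the paper's measurements, and the function names are hypothetical.

```python
from typing import List, Sequence


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients c0..c_degree of c0 + c1*x + ... (normal equations)."""
    m = degree + 1
    a = [[sum(x ** (i + k) for x in xs) for k in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= f * a[col][k]
            b[r] -= f * b[col]
    c = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        c[i] = (b[i] - sum(a[i][k] * c[k] for k in range(i + 1, m))) / a[i][i]
    return c


def cubic_r_squared(lengths: Sequence[float], runtimes: Sequence[float]) -> float:
    """R^2 of a cubic least-squares trendline of run-time against length."""
    scale = max(lengths)  # rescale x to (0, 1] for numerical stability
    xs = [x / scale for x in lengths]
    c = fit_polynomial(xs, runtimes, degree=3)
    pred = [sum(ci * x ** i for i, ci in enumerate(c)) for x in xs]
    mean = sum(runtimes) / len(runtimes)
    ss_res = sum((y - p) ** 2 for y, p in zip(runtimes, pred))
    ss_tot = sum((y - mean) ** 2 for y in runtimes)
    if ss_tot == 0:
        raise ValueError("run-times are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

Rescaling the lengths before building the normal equations keeps the sixth-power sums from spanning many orders of magnitude (1,800⁶ is about 3.4 × 10¹⁹); for larger or noisier data a QR-based solver would be the more robust choice.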