Limits...
RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules.

Horesh Y, Doniger T, Michaeli S, Unger R - BMC Bioinformatics (2007)

Bottom Line: We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy.These datasets allowed for comparison of the algorithm with other methods.In these tests, RNAspa performed better than four other programs.

View Article: PubMed Central - HTML - PubMed

Affiliation: The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel. yair@biomodel.os.biu.ac.il

ABSTRACT

Background: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure.

Results: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in linear time in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. We show that this approach allows us to more deeply explore the suboptimal structure space.

Conclusion: The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.

Show MeSH

Related in: MedlinePlus

Shortest path vs. Sum-of-Pairs. The graph shows the MCC score over 120 (5!) permutations order of the five sequences of the glmS dataset that was used in Table 1. The MCC score is ranked from worst (left) to best (right). The lower (blue) line shows the value achieved by each one of the 120 permutations. The green line shows the increase in accuracy when the best of five different orders was reported. The upper (red) line shows the results where the path was ranked by using the Sum-of-Pairs approach i.e. summing the comparisons between all the pairs that comprise the path. These results clearly show that using the Sum-of-Pairs measure yields better predictions.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2147038&req=5

Figure 6: Shortest path vs. Sum-of-Pairs. The graph shows the MCC score over 120 (5!) permutations order of the five sequences of the glmS dataset that was used in Table 1. The MCC score is ranked from worst (left) to best (right). The lower (blue) line shows the value achieved by each one of the 120 permutations. The green line shows the increase in accuracy when the best of five different orders was reported. The upper (red) line shows the results where the path was ranked by using the Sum-of-Pairs approach i.e. summing the comparisons between all the pairs that comprise the path. These results clearly show that using the Sum-of-Pairs measure yields better predictions.

Mentions: We first tested the robustness of our algorithm to the arbitrary order of the sequences chosen. We randomly chose a set of five sequences from each family. We ran RNAspa on all 5! = 120 order permutations to measure the variation of the accuracy level as a function of the sequence order. We used 150 suboptimal structures for each sequence. Figure 5 shows that while some permutations show a significant deviation from the typical MCC score, most permutations exhibit similar results; thus, it is unlikely that a particular randomly selected order will yield the poorest results. Nevertheless, as described above, in order to reduce the probability of choosing an order that will yield a poor prediction, for each run we used several arbitrary fixed orders and calculated the shortest path for each of them. The winning solution was the solution with the lowest Sum-of-Pairs value between all the secondary structures comprising its shortest path. Figure 6 shows that selecting the winning permutation based on the Sum-of-Pairs of the predictions comprising the shortest path is an improvement over using the permutation with the shortest path. The fact that the algorithm is based on calculating paths in linear time, and only in the final stage is a quadratic time Sum-of-Pair score calculated, enables the algorithm to scale, in practice, almost linearly with the number of sequences.


RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules.

Horesh Y, Doniger T, Michaeli S, Unger R - BMC Bioinformatics (2007)

Shortest path vs. Sum-of-Pairs. The graph shows the MCC score over 120 (5!) permutations order of the five sequences of the glmS dataset that was used in Table 1. The MCC score is ranked from worst (left) to best (right). The lower (blue) line shows the value achieved by each one of the 120 permutations. The green line shows the increase in accuracy when the best of five different orders was reported. The upper (red) line shows the results where the path was ranked by using the Sum-of-Pairs approach i.e. summing the comparisons between all the pairs that comprise the path. These results clearly show that using the Sum-of-Pairs measure yields better predictions.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2147038&req=5

Figure 6: Shortest path vs. Sum-of-Pairs. The graph shows the MCC score over 120 (5!) permutations order of the five sequences of the glmS dataset that was used in Table 1. The MCC score is ranked from worst (left) to best (right). The lower (blue) line shows the value achieved by each one of the 120 permutations. The green line shows the increase in accuracy when the best of five different orders was reported. The upper (red) line shows the results where the path was ranked by using the Sum-of-Pairs approach i.e. summing the comparisons between all the pairs that comprise the path. These results clearly show that using the Sum-of-Pairs measure yields better predictions.
Mentions: We first tested the robustness of our algorithm to the arbitrary order of the sequences chosen. We randomly chose a set of five sequences from each family. We ran RNAspa on all 5! = 120 order permutations to measure the variation of the accuracy level as a function of the sequence order. We used 150 suboptimal structures for each sequence. Figure 5 shows that while some permutations show a significant deviation from the typical MCC score, most permutations exhibit similar results; thus, it is unlikely that a particular randomly selected order will yield the poorest results. Nevertheless, as described above, in order to reduce the probability of choosing an order that will yield a poor prediction, for each run we used several arbitrary fixed orders and calculated the shortest path for each of them. The winning solution was the solution with the lowest Sum-of-Pairs value between all the secondary structures comprising its shortest path. Figure 6 shows that selecting the winning permutation based on the Sum-of-Pairs of the predictions comprising the shortest path is an improvement over using the permutation with the shortest path. The fact that the algorithm is based on calculating paths in linear time, and only in the final stage is a quadratic time Sum-of-Pair score calculated, enables the algorithm to scale, in practice, almost linearly with the number of sequences.

Bottom Line: We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy.These datasets allowed for comparison of the algorithm with other methods.In these tests, RNAspa performed better than four other programs.

View Article: PubMed Central - HTML - PubMed

Affiliation: The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel. yair@biomodel.os.biu.ac.il

ABSTRACT

Background: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure.

Results: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in linear time in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. We show that this approach allows us to more deeply explore the suboptimal structure space.

Conclusion: The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.

Show MeSH
Related in: MedlinePlus