Limits...
RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules.

Horesh Y, Doniger T, Michaeli S, Unger R - BMC Bioinformatics (2007)

Bottom Line: We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy.These datasets allowed for comparison of the algorithm with other methods.In these tests, RNAspa performed better than four other programs.

View Article: PubMed Central - HTML - PubMed

Affiliation: The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel. yair@biomodel.os.biu.ac.il

ABSTRACT

Background: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure.

Results: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in linear time in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. We show that this approach allows us to more deeply explore the suboptimal structure space.

Conclusion: The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.

Show MeSH

Related in: MedlinePlus

Average ranking of the best/worst structure prediction. The average MFE rank of the predicted structure with the best and worst MCC scores. On average, the rank of the best structure (orange) is better (lower) than that of the worst structure (cyan). However, the standard deviation of these averages is very large.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2147038&req=5

Figure 1: Average ranking of the best/worst structure prediction. The average MFE rank of the predicted structure with the best and worst MCC scores. On average, the rank of the best structure (orange) is better (lower) than that of the worst structure (cyan). However, the standard deviation of these averages is very large.

Mentions: We present here a polynomial time algorithm and its implementation, which performs well for ncRNA structure prediction. The algorithm builds on the ability of the Vienna Package's RNAsubopt to include at least one structure close to the 'true' structure in the set of suboptimal solutions that it suggests. Thus, it is important to first evaluate the improvement that our method offers compared with directly using the results of RNAsubopt. First we analyzed, for our dataset, the accuracy of the predictions of RNAsubopt. RNAsubopt has two modes of operation: 'complete enumeration' ('-e range' option) that enumerates all structures within a predefined range of the MFE structure and 'Boltzmann sampling' ('-p n' option) that samples the space of suboptimal structures following a Boltzmann distribution of their MFE. Unless mentioned otherwise, RNAsubopt was run in the 'complete enumeration' mode in the experiments described below. A comparison between these two modes is shown in Table 2. The 'complete enumeration' mode performed better on shorter sequences and the 'Boltzmann sampling' mode performed better on longer sequences. This can be explained by the nature of these two modes. The two modes trade density for diversity. Within a given energy range in the complete enumeration mode, shorter sequences yield a diverse set of suboptimal structures, but as the length increases an increasing number of suboptimal structures with only minor differences are suggested and therefore sampling becomes more important. We ranked the 150 structures predicted by RNAsubopt for each sequence in two ways: by their MCC score (as compared to the Rfam standard), and by their MFE (as assigned by RNAsubopt). Then, we looked at the structure with the best/worst MCC score and found its position in the MFE ranked list. Averaged results for each dataset are shown in Figure 1. The results show that, on average, the structures with the best MCC scores tend to originate from predictions with better MFEs than those with worse MCC scores. However, these values have very large standard deviations, and thus the correlation is not reliable. Note further that among all the datasets the average MFE-based ranking for the structure with the best MCC score was equal to or greater than 30. Thus, the MFE alone cannot identify the 'correct' formations, and we resorted to a comparative approach.


RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules.

Horesh Y, Doniger T, Michaeli S, Unger R - BMC Bioinformatics (2007)

Average ranking of the best/worst structure prediction. The average MFE rank of the predicted structure with the best and worst MCC scores. On average, the rank of the best structure (orange) is better (lower) than that of the worst structure (cyan). However, the standard deviation of these averages is very large.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2147038&req=5

Figure 1: Average ranking of the best/worst structure prediction. The average MFE rank of the predicted structure with the best and worst MCC scores. On average, the rank of the best structure (orange) is better (lower) than that of the worst structure (cyan). However, the standard deviation of these averages is very large.
Mentions: We present here a polynomial time algorithm and its implementation, which performs well for ncRNA structure prediction. The algorithm builds on the ability of the Vienna Package's RNAsubopt to include at least one structure close to the 'true' structure in the set of suboptimal solutions that it suggests. Thus, it is important to first evaluate the improvement that our method offers compared with directly using the results of RNAsubopt. First we analyzed, for our dataset, the accuracy of the predictions of RNAsubopt. RNAsubopt has two modes of operation: 'complete enumeration' ('-e range' option) that enumerates all structures within a predefined range of the MFE structure and 'Boltzmann sampling' ('-p n' option) that samples the space of suboptimal structures following a Boltzmann distribution of their MFE. Unless mentioned otherwise, RNAsubopt was run in the 'complete enumeration' mode in the experiments described below. A comparison between these two modes is shown in Table 2. The 'complete enumeration' mode performed better on shorter sequences and the 'Boltzmann sampling' mode performed better on longer sequences. This can be explained by the nature of these two modes. The two modes trade density for diversity. Within a given energy range in the complete enumeration mode, shorter sequences yield a diverse set of suboptimal structures, but as the length increases an increasing number of suboptimal structures with only minor differences are suggested and therefore sampling becomes more important. We ranked the 150 structures predicted by RNAsubopt for each sequence in two ways: by their MCC score (as compared to the Rfam standard), and by their MFE (as assigned by RNAsubopt). Then, we looked at the structure with the best/worst MCC score and found its position in the MFE ranked list. Averaged results for each dataset are shown in Figure 1. The results show that, on average, the structures with the best MCC scores tend to originate from predictions with better MFEs than those with worse MCC scores. However, these values have very large standard deviations, and thus the correlation is not reliable. Note further that among all the datasets the average MFE-based ranking for the structure with the best MCC score was equal to or greater than 30. Thus, the MFE alone cannot identify the 'correct' formations, and we resorted to a comparative approach.

Bottom Line: We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy.These datasets allowed for comparison of the algorithm with other methods.In these tests, RNAspa performed better than four other programs.

View Article: PubMed Central - HTML - PubMed

Affiliation: The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel. yair@biomodel.os.biu.ac.il

ABSTRACT

Background: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure.

Results: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in linear time in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. We show that this approach allows us to more deeply explore the suboptimal structure space.

Conclusion: The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.

Show MeSH
Related in: MedlinePlus