Robust and accurate prediction of noncoding RNAs from aligned sequences.

Saito Y, Sato K, Sakakibara Y - BMC Bioinformatics (2010)

Bottom Line: The quality of input alignments can influence the performance of alignment-based prediction methods, so evaluating robustness against alignment errors is as necessary as developing accurate prediction methods. Profile BPLA kernel is more accurate than the existing prediction methods and keeps its performance in practical situations in which the quality of input alignments is not necessarily high.


Affiliation: Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan.

ABSTRACT

Background: Computational prediction of noncoding RNAs (ncRNAs) is an important task in the post-genomic era. One common approach is to utilize the profile information contained in alignment data rather than single sequences. With this strategy, however, the quality of input alignments can influence the performance of prediction methods. Therefore, evaluating robustness against alignment errors is as necessary as developing accurate prediction methods.

Results: We describe a new method, called Profile BPLA kernel, which predicts ncRNAs from alignment data in combination with support vector machines (SVMs). Profile BPLA kernel is an extension of the base-pairing profile local alignment (BPLA) kernel, which we previously developed for prediction from single sequences. By utilizing the profile information of alignment data, the proposed kernel achieves better accuracy than the original BPLA kernel. Using a high-quality structural alignment dataset, we show that Profile BPLA kernel outperforms the existing prediction methods that also utilize profile information. In addition to these standard benchmark tests, we extensively evaluate the robustness of Profile BPLA kernel against errors in input alignments. We consider two different types of error: first, that all sequences in an alignment are actually ncRNAs but are aligned ignoring their secondary structures; second, that an alignment contains unrelated sequences which are not ncRNAs but are still aligned. In both cases, the effects on the performance of Profile BPLA kernel are surprisingly small. For the latter case in particular, we demonstrate that Profile BPLA kernel is more robust than the existing prediction methods.

Conclusions: Profile BPLA kernel provides a promising way of identifying ncRNAs from alignment data. It is more accurate than the existing prediction methods and keeps its performance in practical situations in which the quality of input alignments is not necessarily high.
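
The second error type described in the Results (unrelated sequences mixed into an alignment) can be illustrated with a small sketch. This is only one hypothetical way to build such inputs; this page does not describe the authors' actual procedure. The function name `contaminate_alignment`, its parameters, and the toy sequences are all assumptions made for illustration.

```python
import random


def contaminate_alignment(ncrna_rows, unrelated_rows, fraction, rng):
    """Replace a fraction of the rows of an alignment with unrelated rows.

    Hypothetical helper: rows are gapped strings of equal length, and
    `fraction` is the share of rows to swap for unrelated sequences.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    width = len(ncrna_rows[0])
    if any(len(row) != width for row in ncrna_rows + unrelated_rows):
        raise ValueError("all rows must have the same aligned length")

    # Number of rows to replace, rounded to the nearest integer.
    n_replace = round(fraction * len(ncrna_rows))
    replaced_positions = set(rng.sample(range(len(ncrna_rows)), n_replace))
    substitutes = iter(rng.sample(unrelated_rows, n_replace))

    # Keep the original row unless its position was chosen for replacement.
    return [
        next(substitutes) if index in replaced_positions else row
        for index, row in enumerate(ncrna_rows)
    ]


# Toy data, invented for this example.
alignment = ["GGGAAACCC", "GGGAUACCC", "GGCAAAGCC", "GGGA-ACCC"]
unrelated = ["AUAUAUAUA", "CACACACAC", "UUUUGGGGA", "ACGUACGUA"]
mixed = contaminate_alignment(alignment, unrelated, 0.5, random.Random(0))
```

With `fraction=0.0` the alignment is returned unchanged, and with `fraction=1.0` every row is an unrelated sequence, which mirrors the two ends of the horizontal axis described for Figure 1.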


Figure 1: Accuracy on the sequence-based alignment dataset with different fractions of unrelated sequences. For each point, alignments containing the given fraction of unrelated sequences were discriminated from negative samples consisting only of unrelated sequences. Zero on the horizontal axis corresponds to the detection of alignments that consist only of actual ncRNAs, i.e., an ordinary discrimination problem without Type B errors. The error bars show the standard deviation of the ROC scores.

Mentions: The experimental results are shown in Figure 1. In this figure, zero on the horizontal axis is equivalent to an ordinary prediction problem in which the alignments to be discriminated from negative samples do not contain any unrelated sequences. In this situation, Profile BPLA kernel achieved the best accuracy on average, consistent with the results in Table 4. (The ROC scores, however, were not exactly the same as those in Table 4, since the two experiments used different kinds of negative samples in the test data: alignments consisting only of unrelated sequences for Figure 1, and dinucleotide-controlled samples for Table 4.) As the fraction of unrelated sequences increased, the ROC scores for RNAz rapidly fell to the baseline. In contrast, Profile BPLA kernel kept the discrimination at high levels until the alignments were overwhelmed by Type B errors. A similar tendency was seen for Profile stem kernel, although its ROC scores were lower than those of Profile BPLA kernel. The performance of Profile LA kernel was seriously damaged by Type B errors, since the method does not consider secondary structures of unrelated sequences. These results suggest that Profile BPLA kernel is the only method that can effectively detect ncRNAs in the presence of Type B errors.
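
For readers unfamiliar with the ROC score plotted in Figure 1, the following is a minimal sketch of how such a score and its error bar could be computed from classifier outputs. It takes the ROC score to be the area under the ROC curve, computed as the fraction of positive/negative pairs ranked correctly. The function names, the example scores, and the use of the sample standard deviation are assumptions; this page does not say which repeated evaluations the error bars summarize.

```python
from statistics import mean, stdev


def roc_score(pos_scores, neg_scores):
    """Area under the ROC curve from raw decision scores.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count as one half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def summarize(roc_scores):
    """Mean and sample standard deviation over repeated evaluations."""
    return mean(roc_scores), stdev(roc_scores)


# Invented scores for two hypothetical evaluation runs.
run_a = roc_score([0.9, 0.3], [0.5, 0.1])  # 3 of 4 pairs ranked correctly
run_b = roc_score([0.9, 0.8], [0.1, 0.2])  # all pairs ranked correctly
average, spread = summarize([run_a, run_b])
```

A score of 0.5 corresponds to random ranking, which is presumably the baseline that the RNAz curve approaches as the fraction of unrelated sequences grows.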

