Prediction of MicroRNA Precursors Using Parsimonious Feature Sets.

Stepanowsky P, Levy E, Kim J, Jiang X, Ohno-Machado L - Cancer Inform (2014)

Bottom Line: However, no study has systematically compared published feature sets. We found that classifiers containing as few as seven highly predictive features are able to predict novel precursor miRNAs as well as classifiers that use larger feature sets. In a real data set, our method correctly identified the holdout miRNAs relevant to renal cancer.

Affiliation: Bioinformatics Research Group, University of Applied Sciences, Upper Austria, Hagenberg, Austria.

ABSTRACT
MicroRNAs (miRNAs) are a class of short noncoding RNAs that regulate gene expression through base pairing with messenger RNAs. Due to the interest in studying miRNA dysregulation in disease and limits of validated miRNA references, identification of novel miRNAs is a critical task. The performance of different models to predict novel miRNAs varies with the features chosen as predictors. However, no study has systematically compared published feature sets. We constructed a comprehensive feature set using the minimum free energy of the secondary structure of precursor miRNAs, a set of nucleotide-structure triplets, and additional extracted sequence and structure characteristics. We then compared the predictive value of our comprehensive feature set to those from three previously published studies, using logistic regression and random forest classifiers. We found that classifiers containing as few as seven highly predictive features are able to predict novel precursor miRNAs as well as classifiers that use larger feature sets. In a real data set, our method correctly identified the holdout miRNAs relevant to renal cancer.
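
The abstract does not spell out how the nucleotide-structure triplets are computed. As a minimal sketch, assuming the triplet convention commonly used for pre-miRNA classification (the paired/unpaired state of three adjacent positions combined with the middle nucleotide, giving 4 x 8 = 32 features), the extraction could look like the following. The function name `triplet_features`, the frequency normalisation, and the example hairpin are illustrative assumptions, not the authors' code.

```python
from itertools import product


def triplet_features(sequence: str, structure: str) -> dict[str, float]:
    """Return normalised frequencies of the 32 nucleotide-structure triplets.

    `structure` is a dot-bracket string of the same length as `sequence`,
    e.g. as produced by an RNA folding tool. Assumption: both "(" and ")"
    count as "paired", so each position is either "(" or ".".
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")

    # Collapse the two bracket directions into a single "paired" symbol.
    paired = structure.replace(")", "(")

    # Enumerate all 32 keys: middle nucleotide + 3-position structure pattern.
    keys = [n + "".join(s) for n in "ACGU" for s in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)

    # Slide a window of width 3 over the hairpin; key on the middle base.
    for i in range(1, len(sequence) - 1):
        key = sequence[i].upper() + paired[i - 1:i + 2]
        if key in counts:  # silently skip ambiguous bases such as N
            counts[key] += 1

    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Toy hairpin (hypothetical, far shorter than a real pre-miRNA).
features = triplet_features("GCAUAGC", "((...))")
```

Each returned value is the fraction of windows that match one triplet, so the 32 features sum to 1 for any hairpin with at least one valid window. The features could then be combined with the minimum free energy and the other sequence and structure characteristics before fitting a classifier.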

Histogram of estimates (i.e., prediction probabilities) for the positive samples from the renal cancer study.

Mentions: Our feature set and that of Jiang et al. [16] had equal sensitivity (0.9) at a threshold of 0.5, while those of Xue et al. [15] and Zhao et al. [17] had sensitivities of 0.067 and 0.633, respectively, on the holdout true-positive miRNA set from the renal cancer study. In a histogram of prediction probabilities (Fig. 5), the estimates from our feature set were concentrated at the high end of the output range, indicating that when a classifier using our feature set outputs a high score, the candidate is very likely a true miRNA. Of interest are a few real miRNAs that scored below 0.5 with every feature set: none of the feature sets was able to predict hsa-miR-660, hsa-miR-15a, or hsa-miR-532. miRBase [19] stem-loop diagrams show that all three of these miRNAs have noncanonical two-loop structures, which cause all the feature sets to fail.
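
Because the holdout set contains only true miRNAs, sensitivity here is the fraction of samples whose predicted probability reaches the decision threshold. The sketch below illustrates that calculation under stated assumptions: the scores are made-up placeholders rather than values from the study, the helper name `sensitivity` is hypothetical, and a score exactly equal to the threshold is counted as a positive call.

```python
def sensitivity(scores: list[float], threshold: float = 0.5) -> float:
    """Fraction of known positives scored at or above `threshold`.

    Every entry in `scores` is assumed to be the predicted probability
    for a sample that is truly a miRNA, so hits / total is the
    true-positive rate.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    hits = sum(1 for s in scores if s >= threshold)
    return hits / len(scores)


# Hypothetical prediction probabilities for ten true miRNAs; nine clear 0.5.
example_scores = [0.97, 0.91, 0.88, 0.95, 0.76, 0.99, 0.83, 0.62, 0.58, 0.21]
result = sensitivity(example_scores)
```

With these placeholder scores the function returns 0.9. A sample scoring below 0.5, like the last entry, is a missed positive of the same kind as the three two-loop miRNAs described above.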

