Prediction of MicroRNA Precursors Using Parsimonious Feature Sets.

Stepanowsky P, Levy E, Kim J, Jiang X, Ohno-Machado L - Cancer Inform (2014)

Bottom Line: However, no study has systematically compared published feature sets. We found that classifiers containing as few as seven highly predictive features are able to predict novel precursor miRNAs as well as classifiers that use larger feature sets. In a real data set, our method correctly identified the holdout miRNAs relevant to renal cancer.


Affiliation: Bioinformatics Research Group, University of Applied Sciences, Upper Austria, Hagenberg, Austria.

ABSTRACT
MicroRNAs (miRNAs) are a class of short noncoding RNAs that regulate gene expression through base pairing with messenger RNAs. Due to the interest in studying miRNA dysregulation in disease and limits of validated miRNA references, identification of novel miRNAs is a critical task. The performance of different models to predict novel miRNAs varies with the features chosen as predictors. However, no study has systematically compared published feature sets. We constructed a comprehensive feature set using the minimum free energy of the secondary structure of precursor miRNAs, a set of nucleotide-structure triplets, and additional extracted sequence and structure characteristics. We then compared the predictive value of our comprehensive feature set to those from three previously published studies, using logistic regression and random forest classifiers. We found that classifiers containing as few as seven highly predictive features are able to predict novel precursor miRNAs as well as classifiers that use larger feature sets. In a real data set, our method correctly identified the holdout miRNAs relevant to renal cancer.



Figure 3 (f3-cin-suppl.1-2014-095): Feature contribution to outcome prediction according to the RELIEF score.

Mentions: Figure 3 shows RELIEF scores for selected features. Using RELIEF scores, we selected the top 30 of the initial 115 features. We chose 30 because the RELIEF curve had a kink at that point and because it made our feature set similar in size to those of the methods being compared. We used 10-fold cross-validation to estimate the performance of the trained model. Consistent cross-validation performance suggests that the model generalizes and is not overfitted. We applied logistic regression (LR) and random forest (RF) models to compare the performance of our feature set with that of previously published feature sets. For a fair and comprehensive comparison, we used the optimal parameters of each machine-learning method for each feature set (Table 2). Based on the slope change in the RELIEF curve, we then selected the top 7 features to examine the performance of a “minimal” classifier relative to more comprehensive feature sets.
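
The RELIEF-based ranking described above can be illustrated with a minimal sketch. The text does not say which RELIEF variant or implementation was used, so this assumes the basic two-class Relief procedure (nearest hit and nearest miss under a Manhattan distance on min-max scaled features) and, as a simplification, visits every instance once instead of sampling at random. The function names relief_scores and top_k and the toy data are hypothetical and are not taken from the paper.

```python
def relief_scores(X, y):
    """Basic two-class Relief weights (a sketch, not the paper's implementation).

    Assumes numeric features and at least two instances per class.
    """
    n, d = len(X), len(X[0])
    # Min-max scale each feature so per-feature differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]
    Z = [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in X]

    def dist(a, b):
        # Manhattan distance between two scaled instances.
        return sum(abs(p - q) for p, q in zip(a, b))

    w = [0.0] * d
    for i in range(n):
        same = [k for k in range(n) if k != i and y[k] == y[i]]
        other = [k for k in range(n) if y[k] != y[i]]
        hit = min(same, key=lambda k: dist(Z[i], Z[k]))    # nearest same-class instance
        miss = min(other, key=lambda k: dist(Z[i], Z[k]))  # nearest other-class instance
        for j in range(d):
            # Reward features that differ on the miss, penalize those that differ on the hit.
            w[j] += (abs(Z[i][j] - Z[miss][j]) - abs(Z[i][j] - Z[hit][j])) / n
    return w


def top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Invented toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]

scores = relief_scores(X, y)
print(top_k(scores, 1))  # index of the single most informative feature
```

On this toy set the class-separating feature receives the higher weight, so it is ranked first. In the setting described above, the analogous call would be top_k(scores, 30) or top_k(scores, 7) on the 115 candidate features, with the cut-off read from the kink in the sorted score curve.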

