Prediction of MicroRNA Precursors Using Parsimonious Feature Sets.

Stepanowsky P, Levy E, Kim J, Jiang X, Ohno-Machado L - Cancer Inform (2014)

Bottom Line: However, no study has systematically compared published feature sets. We found that classifiers containing as few as seven highly predictive features are able to predict novel precursor miRNAs as well as classifiers that use larger feature sets. In a real data set, our method correctly identified the holdout miRNAs relevant to renal cancer.


Affiliation: Bioinformatics Research Group, University of Applied Sciences, Upper Austria, Hagenberg, Austria.

ABSTRACT
MicroRNAs (miRNAs) are a class of short noncoding RNAs that regulate gene expression through base pairing with messenger RNAs. Due to the interest in studying miRNA dysregulation in disease and limits of validated miRNA references, identification of novel miRNAs is a critical task. The performance of different models to predict novel miRNAs varies with the features chosen as predictors. However, no study has systematically compared published feature sets. We constructed a comprehensive feature set using the minimum free energy of the secondary structure of precursor miRNAs, a set of nucleotide-structure triplets, and additional extracted sequence and structure characteristics. We then compared the predictive value of our comprehensive feature set to those from three previously published studies, using logistic regression and random forest classifiers. We found that classifiers containing as few as seven highly predictive features are able to predict novel precursor miRNAs as well as classifiers that use larger feature sets. In a real data set, our method correctly identified the holdout miRNAs relevant to renal cancer.


Figure 2: An overview of the workflow to extract different nucleotide-structure triplets. Notes: Only the nucleotides on the 5’ and 3’ stem arms are considered. The different triplets are counted and then normalized by the corresponding total number of triplets.

Mentions: It is important to encode the secondary structure together with the sequence information because, as illustrated in Figure 1, changing only one nucleotide in a pre-miRNA sequence can result in a different secondary structure. For this reason, the structure sequence in bracket notation is divided into overlapping triplets, considering each nucleotide in each triplet. For each base, there are 8 (2^3) possible triplet structures: “...”, “..(”, “.(.”, “.((”, “(..”, “(.(”, “((.”, and “(((”. With the left, middle, or right nucleotide of each triplet, there are 96 (4 bases × 8 triplet structures × 3 nucleotide positions) possible nucleotide-structure combinations, which we list as “A..._l”, “A..._m”, “A..._r”, …, “U(((_l”, “U(((_m”, “U(((_r”, where “l,” “m,” and “r” represent left, middle, and right, respectively. We consider only the 5’ and 3’ stem arms when extracting the triplets from the secondary structure of a pre-miRNA, as shown in Figure 2. The occurrences of each nucleotide-structure triplet are counted and then normalized by the number of extracted triplets per nucleotide position (left, middle, or right) to generate a 96-dimensional feature vector.
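The counting and normalization described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `triplet_features`, the `arms` parameter, and the choice to treat a closing bracket ")" as paired (written "(") are all hypothetical. How the 5’ and 3’ stem arms are delimited is shown only in Figure 2, so here the caller supplies the arm ranges, and the whole sequence is used when none are given.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"
# The 8 structure triplets: "...", "..(", ".(.", ".((", "(..", "(.(", "((.", "((("
STRUCTS = ["".join(p) for p in product(".(", repeat=3)]
# Offset of the nucleotide within the triplet: left, middle, right
POSITIONS = {"l": 0, "m": 1, "r": 2}
# 4 bases x 8 structures x 3 positions = 96 feature names, e.g. "A..._l"
FEATURES = [f"{b}{s}_{p}" for b in BASES for s in STRUCTS for p in POSITIONS]


def triplet_features(seq, struct, arms=None):
    """Return a dict of 96 normalized nucleotide-structure triplet frequencies.

    seq    -- RNA sequence (T is read as U)
    struct -- dot-bracket structure of the same length
    arms   -- assumed input: list of (start, end) half-open index ranges for
              the stem arms; defaults to the whole sequence
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have the same length")
    if set(struct) - set(".()"):
        raise ValueError("structure must use only '.', '(' and ')'")
    seq = seq.upper().replace("T", "U")
    # Assumption: only paired vs. unpaired matters, so ")" is written as "("
    paired = struct.replace(")", "(")
    if arms is None:
        arms = [(0, len(seq))]

    counts = Counter()
    totals = Counter()  # number of extracted triplets per position type
    for start, end in arms:
        for i in range(start, end - 2):  # overlapping windows of length 3
            tri = paired[i:i + 3]
            for pos, offset in POSITIONS.items():
                base = seq[i + offset]
                if base in BASES:  # skip ambiguous bases such as N
                    counts[f"{base}{tri}_{pos}"] += 1
                    totals[pos] += 1

    # Normalize each count by the total for its position type (l, m, or r)
    return {
        name: (counts[name] / totals[name[-1]]) if totals[name[-1]] else 0.0
        for name in FEATURES
    }


if __name__ == "__main__":
    feats = triplet_features("GGGAAACCC", "(((...)))")
    print(len(feats))
    print(feats["G(((_l"])
```

With this normalization, the 32 features of each position type sum to 1 whenever at least one triplet is extracted, so sequences of different lengths yield comparable vectors.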

