Prediction of novel microRNA genes in cancer-associated genomic regions--a combined computational and experimental approach.

Oulas A, Boutla A, Gkirtzou K, Reczko M, Kalantidis K, Poirazi P - Nucleic Acids Res. (2009)

Bottom Line: The trained classifier is used to identify novel miRNA gene candidates located within cancer-associated genomic regions and rank the resulting predictions using expression information from a full genome tiling array. Finally, four of the top scoring predictions are verified experimentally using northern blot analysis. Our work combines both analytical and experimental techniques to show that SSCprofiler is a highly accurate tool which can be used to identify novel miRNA gene candidates in the human genome.


Affiliation: Institute of Molecular Biology and Biotechnology-FORTH, Heraklion, University of Crete, Heraklion, Greece.

ABSTRACT
The majority of existing computational tools rely on sequence homology and/or structural similarity to identify novel microRNA (miRNA) genes. Recently supervised algorithms are utilized to address this problem, taking into account sequence, structure and comparative genomics information. In most of these studies miRNA gene predictions are rarely supported by experimental evidence and prediction accuracy remains uncertain. In this work we present a new computational tool (SSCprofiler) utilizing a probabilistic method based on Profile Hidden Markov Models to predict novel miRNA precursors. Via the simultaneous integration of biological features such as sequence, structure and conservation, SSCprofiler achieves a performance accuracy of 88.95% sensitivity and 84.16% specificity on a large set of human miRNA genes. The trained classifier is used to identify novel miRNA gene candidates located within cancer-associated genomic regions and rank the resulting predictions using expression information from a full genome tiling array. Finally, four of the top scoring predictions are verified experimentally using northern blot analysis. Our work combines both analytical and experimental techniques to show that SSCprofiler is a highly accurate tool which can be used to identify novel miRNA gene candidates in the human genome. SSCprofiler is freely available as a web service at http://www.imbb.forth.gr/SSCprofiler.html.


Figure 5: ROC curves for all possible combinations of sequence (Se), structure (St) and conservation (Co) features for the validation set, averaged over 100 repetitions. As evident from the figure, the area under the curve is maximized when all three features are combined. Note that conservation alone significantly outperforms sequence, structure and sequence + structure (SeSt).

Mentions: The above-mentioned filtering procedure resulted in 249 true miRNAs and 2330 negative sequences. Subsequently, HMMs were trained solely on the true miRNAs using a 5-fold (three-fifths for training, two-fifths for validation) boosting validation procedure (as described in the ‘Materials and Methods’ section). The procedure was repeated for different combinations of biological features, and the HMMs' average performance accuracy was reported for each case. ROC curves showing the average validation performance of HMMs that utilize all possible combinations of sequence, structure and conservation information are shown in Figure 5. There was a significant improvement in prediction accuracy for the validation set when certain features were combined, highlighting the importance of simultaneously incorporating additional biological information during the training procedure. The best results were obtained when all three features were used to train the HMMs, achieving on average 88.95% sensitivity and 84.16% specificity in the validation set for a score threshold of 3 (Figure 5 and Supplementary Figure S1). Once a good performance on the training/validation sets was achieved, all 249 true miRNA precursors were pooled together and used to train the final HMM, taking into account the same feature combination and filtering parameters. This final model was used to build the scanning interface of SSCprofiler.
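As a rough illustration of how sensitivity and specificity at a score threshold relate to a ROC curve like the one in Figure 5, the following minimal Python sketch may help. It is not the SSCprofiler code: the function names, the decision rule (a sequence is called a miRNA when its score is at or above the threshold) and the toy scores are assumptions made for the example.

```python
from typing import List, Tuple


def sensitivity_specificity(pos_scores: List[float],
                            neg_scores: List[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called miRNA.

    The ">=" decision rule is an assumption made for this sketch.
    """
    true_pos = sum(1 for s in pos_scores if s >= threshold)
    true_neg = sum(1 for s in neg_scores if s < threshold)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


def roc_points(pos_scores: List[float],
               neg_scores: List[float]) -> List[Tuple[float, float]]:
    """(false positive rate, sensitivity) pairs, sweeping the threshold downward."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # a threshold above every score calls nothing a miRNA
    for t in thresholds:
        sens, spec = sensitivity_specificity(pos_scores, neg_scores, t)
        points.append((1.0 - spec, sens))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores, invented for illustration only (not data from the paper).
true_mirna_scores = [5.0, 4.0, 3.5, 2.0]
negative_scores = [1.0, 2.5, 3.0, 0.5, -1.0]

sens, spec = sensitivity_specificity(true_mirna_scores, negative_scores, 3.0)
auc = area_under_curve(roc_points(true_mirna_scores, negative_scores))
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.2f}")
```

Each threshold yields one point on the curve; the reported 88.95% sensitivity and 84.16% specificity would correspond to the single point at a score threshold of 3, averaged over the validation repetitions.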

