Prediction of novel microRNA genes in cancer-associated genomic regions--a combined computational and experimental approach.

Oulas A, Boutla A, Gkirtzou K, Reczko M, Kalantidis K, Poirazi P - Nucleic Acids Res. (2009)

Bottom Line: The trained classifier is used to identify novel miRNA gene candidates located within cancer-associated genomic regions and rank the resulting predictions using expression information from a full genome tiling array. Finally, four of the top scoring predictions are verified experimentally using northern blot analysis. Our work combines both analytical and experimental techniques to show that SSCprofiler is a highly accurate tool which can be used to identify novel miRNA gene candidates in the human genome.


Affiliation: Institute of Molecular Biology and Biotechnology-FORTH, Heraklion, University of Crete, Heraklion, Greece.

ABSTRACT
The majority of existing computational tools rely on sequence homology and/or structural similarity to identify novel microRNA (miRNA) genes. Recently, supervised algorithms have been used to address this problem, taking into account sequence, structure and comparative genomics information. In most of these studies, miRNA gene predictions are rarely supported by experimental evidence and prediction accuracy remains uncertain. In this work we present a new computational tool (SSCprofiler) that uses a probabilistic method based on Profile Hidden Markov Models to predict novel miRNA precursors. By simultaneously integrating biological features such as sequence, structure and conservation, SSCprofiler achieves 88.95% sensitivity and 84.16% specificity on a large set of human miRNA genes. The trained classifier is used to identify novel miRNA gene candidates located within cancer-associated genomic regions and to rank the resulting predictions using expression information from a full-genome tiling array. Finally, four of the top-scoring predictions are verified experimentally using northern blot analysis. Our work combines both analytical and experimental techniques to show that SSCprofiler is a highly accurate tool which can be used to identify novel miRNA gene candidates in the human genome. SSCprofiler is freely available as a web service at http://www.imbb.forth.gr/SSCprofiler.html.


Figure 3: Flowchart of the scanning procedure.

The process of scanning genomic regions for miRNA precursor profiles involves six steps, illustrated in Figure 3.
Step 1: A sliding window of selected length is passed along the genomic sequence, shifting 1 nt at a time.
Step 2: For every window shift, sequence, structure and conservation information is retrieved according to the selected training features; i.e. structure prediction is performed and conservation is obtained from the multiz files.
Step 3: Each sequence within the sliding window is passed through the filters using the pre-defined filtering parameters (i.e. hairpin length, asymmetry).
Step 4: For each sequence, the features used during training (sequence, structure and/or conservation) are generated according to the 16-letter key described earlier. This allows the simultaneous consideration of information for every nucleotide position in the genomic sequence.
Step 5: The trained HMM is used to assign a likelihood score to each genomic sequence within the sliding window. The HMM score threshold can be selected by the user. It is usually defined as the score at which sensitivity and specificity from the training/validation process were optimal.
Step 6: Candidates that overlap by ≤50 nt are grouped, and the candidate with the highest score is used to represent the cluster. Thereafter, the candidates are assessed according to their expression in HeLa or HepG2 cells using tiling array data.
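The control flow of the scanning steps described above (slide a window, filter, score, threshold, then keep the best hit per cluster) can be illustrated with a minimal Python sketch. This is not SSCprofiler's code: the names `Candidate`, `scan`, `cluster`, `gc_fraction` and `no_ambiguous_bases` are hypothetical, the GC-fraction scorer is only a stand-in for the trained profile HMM, and structure prediction, conservation lookup and the 16-letter encoding are omitted. The grouping rule is also an assumption: hits are merged when their start positions lie within `max_gap` nt of the previous hit, which is one possible reading of the ≤50 nt criterion.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass
class Candidate:
    start: int    # 0-based start of the window in the genomic sequence
    end: int      # exclusive end of the window
    score: float  # stand-in for the HMM likelihood score


def sliding_windows(seq: str, width: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs, shifting 1 nt at a time (Step 1)."""
    for start in range(len(seq) - width + 1):
        yield start, seq[start:start + width]


def scan(seq: str, width: int,
         passes_filters: Callable[[str], bool],
         score_window: Callable[[str], float],
         threshold: float) -> List[Candidate]:
    """Filter each window, score it, and keep hits at or above the threshold."""
    hits = []
    for start, window in sliding_windows(seq, width):
        if not passes_filters(window):    # Step 3 (filters are caller-supplied)
            continue
        score = score_window(window)      # Step 5 (scorer is caller-supplied)
        if score >= threshold:
            hits.append(Candidate(start, start + width, score))
    return hits


def cluster(hits: List[Candidate], max_gap: int = 50) -> List[Candidate]:
    """Group nearby hits and return the top scorer of each group (Step 6).

    Assumption: a hit joins the current group when its start is within
    max_gap nt of the previous hit's start.
    """
    groups: List[List[Candidate]] = []
    for hit in sorted(hits, key=lambda c: c.start):
        if groups and hit.start - groups[-1][-1].start <= max_gap:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return [max(group, key=lambda c: c.score) for group in groups]


# Toy stand-ins, NOT the real filters or the trained HMM.
def no_ambiguous_bases(window: str) -> bool:
    return "N" not in window


def gc_fraction(window: str) -> float:
    return (window.count("G") + window.count("C")) / len(window)


toy_seq = "ATATGCGCGCATATATATAT"
toy_hits = scan(toy_seq, width=6, passes_filters=no_ambiguous_bases,
                score_window=gc_fraction, threshold=0.8)
toy_best = cluster(toy_hits, max_gap=50)
```

Passing the filter and scorer in as callables keeps the scanning loop independent of how structure, conservation or the HMM are actually computed, so a real scorer could be substituted without changing `scan` or `cluster`.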

