Genome-wide enhancer prediction from epigenetic signatures using genetic algorithm-optimized support vector machines.

Fernández M, Miranda-Saavedra D - Nucleic Acids Res. (2012)

Bottom Line: In an independent test, ChromaGenSVM recovered 88% of the experimentally supported enhancers in the pilot ENCODE region of interferon gamma-treated HeLa cells. Furthermore, ChromaGenSVM successfully combined the profiles of only five distinct methylation and acetylation marks from ChIP-seq libraries done in human CD4(+) T cells to predict ∼21,000 experimentally supported enhancers within 1.0 kb regions and with a precision of ∼90%, thereby improving previous predictions on the same dataset by 21%. The combined results indicate that ChromaGenSVM comfortably outperforms previously published methods and that enhancers are best predicted by specific combinations of histone methylation and acetylation marks.


Affiliation: Bioinformatics and Genomics Laboratory, WPI-Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita 565-0871, Osaka, Japan.

ABSTRACT
The chemical modification of histones at specific DNA regulatory elements is linked to the activation, inactivation and poising of genes. A number of tools exist to predict enhancers from chromatin modification maps, but their practical application is limited because they either (i) consider a smaller number of marks than those necessary to define the various enhancer classes or (ii) work with an excessive number of marks, which is experimentally unviable. We have developed a method for chromatin state detection using support vector machines in combination with genetic algorithm optimization, called ChromaGenSVM. ChromaGenSVM selects optimum combinations of specific histone epigenetic marks to predict enhancers. In an independent test, ChromaGenSVM recovered 88% of the experimentally supported enhancers in the pilot ENCODE region of interferon gamma-treated HeLa cells. Furthermore, ChromaGenSVM successfully combined the profiles of only five distinct methylation and acetylation marks from ChIP-seq libraries done in human CD4(+) T cells to predict ∼21,000 experimentally supported enhancers within 1.0 kb regions and with a precision of ∼90%, thereby improving previous predictions on the same dataset by 21%. The combined results indicate that ChromaGenSVM comfortably outperforms previously published methods and that enhancers are best predicted by specific combinations of histone methylation and acetylation marks.


Figure 8: Plots of total predictions versus supported predictions in human CD4+ T cells using the histone modification maps of H3K4Me1, H3K4Me3, H3R2Me2, H4K8Ac and H2BK5Ac. The dashed line represents an ideal predictor. (A) Experimental evidence of functional regions: square (p300), triangle (DHS) and cross (any experimental); (B) Computational evidence of functional regions: square (PReMod), triangle (PhastCons) and cross (any computational).


Mentions: Using this optimum combination of five epigenetic marks, we predicted enhancers genome-wide in human CD4+ T cells. Epigenetic signals were scanned using 1.0 kb windows with a resolution of 400 bp, retaining only the highest scoring element within every 1.0 kb region. Here we report 23 574 predicted enhancers in human CD4+ T cells (Supplementary Table S4). The quality of the predictions in CD4+ T cells was evaluated by counting the overlap with 72 646 DHS regions and 3989 p300 binding sites. The clustering of TFBS and the evolutionary conservation of the predicted regions were also analyzed. Figure 8 shows the plots of the total number of predictions versus supported predictions in a range from 5000 to 24 000 predicted enhancers. Substantial overlaps with DHS regions and p300 binding sites (in the ranges 88–91% and 28–34%, respectively) were observed for different subsets of predicted enhancers. Taken together, 88–91% of the predictions were supported by at least one of the experimental lines of evidence. Moreover, 10–12% of the predicted regions were found to be evolutionarily conserved according to the PhastCons scores (44), and TFBS clusters were detected in a similar fraction of these regions according to the PReMod database (45). About 19% of the predictions were supported by at least one of the computational lines of evidence.
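
The scanning and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the ChromaGenSVM implementation: the function names, the greedy suppression rule (one possible reading of "retaining only the highest scoring element within every 1.0 kb region"), the window scores and the single evidence interval are all assumptions made for illustration.

```python
from typing import List, Tuple

Window = Tuple[int, int, float]  # (start, end, score); half-open coordinates


def scan_windows(length: int, size: int = 1000, step: int = 400) -> List[Tuple[int, int]]:
    """Place windows of `size` bp every `step` bp along a region of `length` bp."""
    return [(s, s + size) for s in range(0, length - size + 1, step)]


def keep_local_best(windows: List[Window], min_gap: int = 1000) -> List[Window]:
    """Greedy suppression (assumed rule): visit windows from highest to lowest
    score and keep one only if its start lies at least `min_gap` bp from the
    start of every window already kept."""
    kept: List[Window] = []
    for w in sorted(windows, key=lambda w: -w[2]):
        if all(abs(w[0] - k[0]) >= min_gap for k in kept):
            kept.append(w)
    return sorted(kept)


def supported_fraction(preds: List[Window], evidence: List[Tuple[int, int]]) -> float:
    """Fraction of predictions overlapping at least one evidence interval."""
    if not preds:
        return 0.0
    hits = sum(
        any(p[0] < e_end and e_start < p[1] for e_start, e_end in evidence)
        for p in preds
    )
    return hits / len(preds)


# Hypothetical 3 kb region with made-up classifier scores, one per window.
coords = scan_windows(3000)
scores = [0.2, 0.9, 0.5, 0.1, 0.8, 0.3]
windows = [(s, e, sc) for (s, e), sc in zip(coords, scores)]

best = keep_local_best(windows)
# Hypothetical evidence interval standing in for a DHS region or p300 site.
frac = supported_fraction(best, [(1300, 1350)])
print(best, frac)
```

In this toy region, two of the six windows survive suppression and one of them overlaps the evidence interval, so the supported fraction is 0.5. Applied to the figures quoted in the text, 88–91% support among 23 574 predictions corresponds to roughly 20 700 to 21 500 supported enhancers, in line with the ∼21,000 stated in the abstract.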

