Predicting chromatin organization using histone marks.

Huang J, Marco E, Pinello L, Yuan GC - Genome Biol. (2015)

Bottom Line: To aid experimental effort and to understand the determinants of long-range chromatin interactions, we have developed a computational model integrating Hi-C and histone mark ChIP-seq data to predict two important features of chromatin organization: chromatin interaction hubs and topologically associated domain (TAD) boundaries. Cell-type specific histone mark information is required for prediction of chromatin interaction hubs but not for TAD boundaries. Our predictions provide a useful guide for the exploration of chromatin organization.


Affiliation: Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA. jhuang@jimmy.harvard.edu.

ABSTRACT
Genome-wide mapping of three-dimensional chromatin organization is an important yet technically challenging task. To aid experimental effort and to understand the determinants of long-range chromatin interactions, we have developed a computational model integrating Hi-C and histone mark ChIP-seq data to predict two important features of chromatin organization: chromatin interaction hubs and topologically associated domain (TAD) boundaries. Our model accurately and robustly predicts these features across datasets and cell types. Cell-type specific histone mark information is required for prediction of chromatin interaction hubs but not for TAD boundaries. Our predictions provide a useful guide for the exploration of chromatin organization.

Fig. 3: Prediction of Jin2013 hubs in IMR90 cells. (a) Schematic of the BART model. (b) Prediction accuracy using various features. The ROC curves correspond to the testing data. AUC scores are shown in parentheses. "Histone Marks" represents the combination of all histone marks and CTCF, while "DNA sequence" represents the combination of PhastCons conservation score, TSS proximity and GC content. (c) Variable selection in the BART model. The x-axis represents the usage frequency of each variable in the BART model. (d) Genome browser snapshot at a hub adjacent to the HOXB gene cluster.

Mentions: To characterize the epigenetic determinants of hubs, we examined the spatial patterns of CTCF and 9 histone marks adjacent to each chromatin anchor (Methods) (Fig. 2). The most distinct features were the elevated levels of H3K4me1 and H3K27ac, both well-known markers of enhancer elements, around the center of the hubs compared to other chromatin anchors. There were also significant, albeit weaker, differences among several other histone marks. To systematically investigate how well these hubs could be predicted from a combination of multiple histone marks, we built a Bayesian Additive Regression Trees (BART) model to classify chromatin anchors based on histone mark ChIP-seq data alone. BART is a Bayesian "sum-of-trees" model [22] that averages results from an ensemble of regression trees (Fig. 3a). Previous studies have shown that BART is effective in modeling various computational biology problems [23].
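The "sum-of-trees" idea and the AUC score reported for the ROC curves can be illustrated with a minimal Python sketch. This is not the authors' implementation and it is not a full BART: a real BART samples its trees from a posterior distribution by MCMC, whereas the two one-split trees below are hand-picked for illustration. The helper names (`stump`, `sum_of_trees`, `auc`), the thresholds, the leaf values and the four toy anchors are all assumptions, not data from the paper.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]
Tree = Callable[[Features], float]


def stump(mark: str, threshold: float, low: float, high: float) -> Tree:
    """Hypothetical one-split regression tree on a single histone mark."""
    def tree(x: Features) -> float:
        return high if x[mark] > threshold else low
    return tree


def sum_of_trees(trees: Sequence[Tree], x: Features) -> float:
    """Score an anchor by adding up the outputs of every tree."""
    return sum(t(x) for t in trees)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the fraction of (hub, non-hub) pairs in which the hub receives
    the higher score; tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy anchors with made-up signal levels (assumption, not real ChIP-seq data).
anchors: List[Features] = [
    {"H3K4me1": 2.0, "H3K27ac": 1.8},
    {"H3K4me1": 1.5, "H3K27ac": 0.4},
    {"H3K4me1": 0.3, "H3K27ac": 0.2},
    {"H3K4me1": 0.6, "H3K27ac": 1.2},
]
labels = [1, 1, 0, 0]  # 1 = hub, 0 = other chromatin anchor

# Hand-picked trees standing in for a fitted ensemble.
trees = [
    stump("H3K4me1", 1.0, -0.5, 0.5),
    stump("H3K27ac", 1.0, -0.3, 0.3),
]

scores = [sum_of_trees(trees, a) for a in anchors]
print("AUC on toy data:", auc(scores, labels))
```

Each tree contributes a small additive adjustment, so no single tree has to explain the response on its own. The pairwise AUC is quadratic in the number of anchors; it is used here only because it makes the definition explicit.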

