dCaP: detecting differential binding events in multiple conditions and proteins.

Chen KB, Hardison R, Zhang Y - BMC Genomics (2014)

Bottom Line: Using simulation, we demonstrate the superior power of dCaP compared to existing methods. We further show in the mouse dataset that dCaP captures genomic regions showing significant signal variations for TAL1 occupancy between two mouse erythroid cell lines. Here, we developed a novel approach that uses the cooperative property of proteins to detect differential binding in multivariate ChIP-seq samples with better power, aiming to complement existing approaches and to provide new insights into method development in this field.


ABSTRACT

Background: Current ChIP-seq studies often compare multiple epigenetic profiles across several cell types and tissues simultaneously to study constitutive and differential regulation. Simultaneous analysis of multiple epigenetic features in many samples can yield substantially greater power and specificity than analyzing individual features and/or samples separately. Yet there are currently few tools that can perform joint inference of constitutive and differential regulation in multi-feature, multi-condition contexts with statistical testing. Existing tools test regulatory variation either for one factor in multiple samples at a time, or for multiple factors in one or two samples. Many of them identify only binary rather than quantitative variation, which makes their results sensitive to threshold choices.

Results: We propose a novel and powerful method called dCaP for simultaneously detecting constitutive and differential regulation of multiple epigenetic factors in multiple samples. Using simulation, we demonstrate the superior power of dCaP compared to existing methods. We then apply dCaP to two datasets from human and mouse ENCODE projects to demonstrate its utility. We show in the human dataset that the cell-type specific regulatory loci detected by dCaP are significantly enriched near genes with cell-type specific functions and disease relevance. We further show in the mouse dataset that dCaP captures genomic regions showing significant signal variations for TAL1 occupancy between two mouse erythroid cell lines. The novel TAL1 occupancy loci detected only by dCaP are highly enriched with GATA1 occupancy and differential gene expression, while those detected only by other methods are not.

Conclusions: Here, we developed a novel approach that uses the cooperative property of proteins to detect differential binding in multivariate ChIP-seq samples with better power, aiming to complement existing approaches and to provide new insights into method development in this field.


Figure 3: Length distribution of cell-type specific enriched binding (+) and depleted binding (-) segments. The symbol size is proportional to the frequency of the corresponding segment length.

Mentions: By applying dCaP to the ENCODE data sets of three epigenetic factors (CTCF, Pol2, DNaseI) in five cell lines (GM12878, HUVEC, HeLa-S3, HepG2, K562), we detected 194,840 significant occupancy regions (6.9% of all regions tested) genome-wide, excluding chromosome Y, at a Bonferroni-adjusted significance level of 0.05. Among those significant regions, 33,205 (17% of 194,840) showed differential signals of the three factors among the five cell lines. Furthermore, 16,452 (49% of 33,205) of the differential regions were cell-type specific. After merging contiguous regions, we obtained 13,858 cell-type specific intervals. A majority of the cell-type specific intervals are 1 kb long, yet we also found a set of signal-enriched regions longer than 10 kb (Figure 3). These long cell-type specific intervals were mostly due to the long span of Pol2 signals, and they are consistent with cell-type specific gene expression (ENCODE RNA-Seq). Examples of these regions include the genes AHSG, AFP, and GPC3 in the HepG2 cell line; THBS1 in the HUVEC cell line; and BC068609, LOC648232, and a 30-kb region close to 7q36.2 (starting from position 152,710,000 in hg18) in the K562 cell line. We also observed clusters of cell-type specific depleted signals in local intervals, which were mostly due to large genomic deletions.
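The merging step and the length tally behind Figure 3 can be illustrated with a minimal sketch. This is not the dCaP implementation: the 1-kb region width, the `(chrom, start)` input format, and the function names `merge_contiguous` and `length_distribution` are assumptions made for illustration, and a real analysis would presumably merge only regions that share the same cell type and the same direction (enriched or depleted).

```python
from collections import Counter

BIN_SIZE = 1000  # assumption: fixed-width 1-kb regions


def merge_contiguous(regions, bin_size=BIN_SIZE):
    """Merge adjacent fixed-width regions into intervals.

    regions: iterable of (chrom, start) pairs, one per significant region.
    Returns a list of (chrom, start, end) half-open intervals.
    """
    merged = []
    # Deduplicate, then sort so regions on the same chromosome are in order.
    for chrom, start in sorted(set(regions)):
        if merged and merged[-1][0] == chrom and merged[-1][2] == start:
            # The region begins exactly where the previous interval ends.
            prev_chrom, prev_start, _ = merged[-1]
            merged[-1] = (prev_chrom, prev_start, start + bin_size)
        else:
            merged.append((chrom, start, start + bin_size))
    return merged


def length_distribution(intervals):
    """Count how many merged intervals have each length (in bp)."""
    return Counter(end - start for _, start, end in intervals)


# Toy input (hypothetical coordinates, not taken from the paper's results).
toy_regions = [
    ("chr7", 152710000), ("chr7", 152711000), ("chr7", 152712000),
    ("chr1", 5000), ("chr1", 9000),
]
toy_intervals = merge_contiguous(toy_regions)
toy_lengths = length_distribution(toy_intervals)
```

In this toy input the three adjacent chr7 regions collapse into one 3-kb interval, while the two isolated chr1 regions remain 1-kb intervals. The resulting counts per length are the kind of frequencies that the symbol sizes in Figure 3 encode.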

