MMDiff: quantitative testing for shape changes in ChIP-Seq data sets.

Schweikert G, Cseke B, Clouaire T, Bird A, Sanguinetti G - BMC Genomics (2013)

Bottom Line: Our empirical analysis shows that the method yields reproducible results across experiments and is able to detect functionally important changes in histone modifications. In both ENCODE data sets examined (H3K27ac and CTCF), MMDiff proves to be complementary to count-based methods. Our results demonstrate that higher order features of ChIP-Seq peaks carry relevant information that is often complementary to total counts, and are therefore important in assessing differential histone modifications and transcription factor binding.


Affiliation: School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK. G.Schweikert@ed.ac.uk.

ABSTRACT

Background: Cell-specific gene expression is controlled by epigenetic modifications and transcription factor binding. While genome-wide maps of these protein-DNA interactions have become widely available, quantitative comparison of the resulting ChIP-Seq data sets remains challenging. Current approaches to detecting differentially bound or modified regions are mainly borrowed from RNA-Seq data analysis; they therefore focus on the total count of fragments mapped to a region and ignore any information encoded in the shape of the peaks.

Results: Here, we present MMDiff, a robust, broadly applicable method for detecting differences between sequence count data sets. Because it quantifies shape changes in signal profiles, it overcomes challenges imposed by the highly structured nature of the data and the paucity of replicates. We first use a simulated data set to compare the performance of MMDiff with that of four alternative methods, and demonstrate that MMDiff excels when peak profiles change between samples. We next use MMDiff to re-analyse a recent data set of the histone modification H3K4me3 that elucidated how this prominent epigenomic mark is established. Our empirical analysis shows that the method yields reproducible results across experiments and is able to detect functionally important changes in histone modifications. To further explore the broader applicability of MMDiff, we apply it to two ENCODE data sets: one investigating the histone modification H3K27ac and one measuring the genome-wide binding of the transcription factor CTCF. In both cases, MMDiff proves to be complementary to count-based methods. In addition, we show that MMDiff can directly detect changes in homotypic binding events at neighbouring binding sites. MMDiff is readily available as a Bioconductor package.
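MMD (maximum mean discrepancy) is a kernel-based distance between probability distributions. Applied to the positions of fragments mapped within a region, it responds to a change in profile shape even when the total count is unchanged. The minimal pure-Python sketch below illustrates that idea with a generic biased MMD² estimator; the Gaussian kernel, the 50 bp bandwidth, the simulated profiles and the function names are illustrative assumptions, not the MMDiff Bioconductor implementation.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel on 1-D positions in bp. Assumption: kernel chosen for illustration."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=50.0):
    """Biased empirical estimate of squared MMD between two samples of fragment positions."""
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


rng = random.Random(0)
# Hypothetical 1 kb region. Sample A: one peak centred at 500 bp.
sample_a = [rng.gauss(500, 60) for _ in range(200)]
# Sample B: the same number of fragments, split between two flanking peaks.
sample_b = [rng.gauss(300 if i % 2 else 700, 60) for i in range(200)]
# A second draw from the single-peak profile of sample A (a mock replicate).
sample_a_rep = [rng.gauss(500, 60) for _ in range(200)]

print("total counts equal:", len(sample_a) == len(sample_b))
print("MMD^2, A vs B (shape change):", round(mmd_squared(sample_a, sample_b), 4))
print("MMD^2, A vs replicate:       ", round(mmd_squared(sample_a, sample_a_rep), 4))
```

A test on total counts alone sees 200 fragments in both samples and no change, whereas the MMD estimate clearly separates the bimodal profile from a replicate of the single peak.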

Conclusions: Our results demonstrate that higher order features of ChIP-Seq peaks carry relevant information that is often complementary to total counts, and are therefore important in assessing differential histone modifications and transcription factor binding. We have developed a new computational method, MMDiff, that is capable of exploiting these features and therefore closes an existing gap in the analysis of ChIP-Seq data sets.



Figure 3: Differential calling and reproducibility in H3K4me3 ChIP-Seq data sets. A-C MMD-based distances as a function of mean total counts in experiment AB.1; each dot represents one examined promoter. A MMD values computed between Cfp1-/- and WT. B MMD values computed between Resc and WT, overlaid in black; these provide a measure of biological and experimental variability. C Both plots overlaid, with promoters that are significantly different in Cfp1-/- versus WT/Resc (FDR < 0.05) shown in red. D-E MA plot representations of the same data: smoothed scatter plots of log2 fold changes versus mean normalised counts. Red dots mark promoters detected as differentially modified (DMPs) at a 5% false discovery rate, D according to MMDiff and E according to DESeq. F Reproducibility of differential calling across experiments AB.1 and AB.2. DESeq and MMDiff are compared both for differentially called promoters (left) and for MACS consensus peaks.
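Panels D-E place each promoter at its mean normalised count (A) against its log2 fold change (M). The sketch below computes those coordinates for two samples, assuming DESeq-style median-of-ratios size factors and a pseudocount of 0.5; the caption does not state which normalisation or pseudocount was used, and the counts and function names are hypothetical, not data from the paper.

```python
import math
from statistics import median


def size_factors(counts_a, counts_b):
    """Median-of-ratios size factors (DESeq-style) for two samples.

    Regions with a zero count in either sample are skipped, because their
    geometric mean is zero. Assumes at least one region has non-zero counts in both.
    """
    ratios_a, ratios_b = [], []
    for ca, cb in zip(counts_a, counts_b):
        if ca > 0 and cb > 0:
            geo_mean = math.sqrt(ca * cb)
            ratios_a.append(ca / geo_mean)
            ratios_b.append(cb / geo_mean)
    return median(ratios_a), median(ratios_b)


def ma_coordinates(counts_a, counts_b, pseudocount=0.5):
    """Return (A, M) per region: mean normalised count and log2 fold change of B over A."""
    sf_a, sf_b = size_factors(counts_a, counts_b)
    points = []
    for ca, cb in zip(counts_a, counts_b):
        na, nb = ca / sf_a, cb / sf_b
        a_value = (na + nb) / 2.0
        # Pseudocount (an assumption) keeps the log finite for zero counts.
        m_value = math.log2((nb + pseudocount) / (na + pseudocount))
        points.append((a_value, m_value))
    return points


# Hypothetical promoter counts; the "ko" library is sequenced twice as deeply.
wt = [100, 200, 50, 400, 80]
ko = [200, 400, 100, 800, 20]
for a_value, m_value in ma_coordinates(wt, ko):
    print(f"A = {a_value:.1f}, M = {m_value:+.2f}")
```

With these made-up counts, the first four promoters scale exactly with sequencing depth and land at M ≈ 0 after normalisation, while the fifth drops roughly eightfold and lands near M ≈ -3.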

Mentions: We used MMDiff to find peaks and promoter regions that are significantly different in the Cfp1-/- cell line versus WT and Resc. To illustrate how MMDiff works, Figure 3 shows MMD values versus mean total counts for the 27,807 promoter regions. Figure 3A shows MMD values between Cfp1-/- and WT; for comparison, MMD distances between Resc and WT are overlaid in Figure 3B. As expected from equation 2, the MMD value between replicates strongly depends on the coverage of the peak, with highly enriched peaks showing smaller MMD values. In contrast, a large number of high-coverage promoters are assigned a large MMD value in the Cfp1-/- vs WT comparison. This clearly separates a group of differentially modified promoters (DMPs) whose enrichment profiles differ more between Cfp1-/- and WT/Resc than can be explained by experimental and biological variation. In Figure 3C, the 2,022 promoters with an FDR < 0.05 are marked in red.
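The coverage dependence between replicates can be reproduced in a small simulation, assuming a generic biased MMD² estimator with a Gaussian kernel and a hypothetical 50 bp bandwidth rather than the exact form of equation 2. Two replicates are drawn from one hypothetical peak profile at increasing fragment depth n; under no change, this estimator's expected value is (2/n)(1 - E[k(x, x')]), so deeper, more highly enriched peaks give smaller replicate MMD values.

```python
import math
import random


def mmd_squared(xs, ys, bandwidth=50.0):
    """Biased (V-statistic) squared MMD with a Gaussian kernel; the bandwidth is an assumption."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

    kxx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def replicate_mmd(n_fragments, trials=20, seed=1):
    """Mean MMD^2 between two replicates of one hypothetical peak (centre 500 bp, sd 60 bp)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        rep1 = [rng.gauss(500, 60) for _ in range(n_fragments)]
        rep2 = [rng.gauss(500, 60) for _ in range(n_fragments)]
        total += mmd_squared(rep1, rep2)
    return total / trials


# Deeper coverage -> replicate distances shrink towards zero.
for depth in (10, 30, 100):
    print(f"{depth:>4} fragments per replicate: mean MMD^2 = {replicate_mmd(depth):.4f}")
```

In this sketch a single fixed MMD cut-off would preferentially flag low-coverage promoters; judging each distance against replicate (Resc versus WT) distances at similar coverage, as Figure 3B shows visually, avoids that bias.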

