OccuPeak: ChIP-Seq peak calling based on internal background modelling.

de Boer BA, van Duijvenboden K, van den Boogaard M, Christoffels VM, Barnett P, Ruijter JM - PLoS ONE (2014)

Bottom Line: However, the GC-content of reads in Input-seq datasets deviates significantly from that in ChIP-seq datasets. Moreover, we observed that a commonly used peak calling program performed equally well when the use of a simulated uniform background set was compared to an Input-seq dataset. Moreover, peaks called by OccuPeak were significantly enriched with cardiac disease-associated SNPs.


Affiliation: Department of Anatomy, Embryology & Physiology, Academic Medical Centre, Amsterdam, The Netherlands.

ABSTRACT

ChIP-seq has become a major tool for the genome-wide identification of transcription factor binding or histone modification sites. Most peak-calling algorithms require input control datasets to model the occurrence of background reads to account for local sequencing and GC bias. However, the GC-content of reads in Input-seq datasets deviates significantly from that in ChIP-seq datasets. Moreover, we observed that a commonly used peak-calling program performed equally well with a simulated uniform background set as with an Input-seq dataset. This contradicts the assumption that input control datasets are necessary to faithfully reflect the background read distribution. Because the GC-content of the abundant single reads in ChIP-seq datasets is similar to that of randomly sampled regions, we designed a peak-calling algorithm with a background model based on overlapping single reads. The application, OccuPeak, uses the abundant low-frequency tags present in each ChIP-seq dataset to model the background, thereby avoiding the need for additional datasets. Analysis of the performance of OccuPeak showed robust model parameters. Its measure of peak significance, the excess ratio, depends only on the tag density of a peak and the global noise levels. Compared to the commonly used peak-calling applications MACS and CisGenome, OccuPeak had the highest sensitivity in an enhancer identification benchmark test, and performed similarly in overlap tests of transcription factor occupation with DNase I hypersensitive sites and H3K27ac sites. Moreover, peaks called by OccuPeak were significantly enriched with cardiac disease-associated SNPs. OccuPeak runs as a standalone application and does not require extensive tweaking of parameters, making its use straightforward and user-friendly.

Availability: http://occupeak.hfrc.nl.


Figure 4 (pone-0099844-g004): Effect of window size and tag density on the pattern and number of called peaks. Peaks were called with OccuPeak in the TBX3 ChIP-seq dataset using different window sizes and tag densities. A. UCSC genome browser snapshot capturing the effects on peak calling in a region containing 2 validated cardiac enhancers. B. Mean number of peaks called per Mb of genome. Note the (almost perfect) parallelism of the profiles for different tag density (100% and 12.5%) and window size (chromosome and 0.1 Mb). C. Effect of window size on the gain or loss of peaks. When the peaks called with a chromosome-wide window are used as a reference (green), smaller windows lead to loss of peaks (blue) but hardly ever to gain of peaks (yellow).


Mentions: Most peak-calling programs use sliding windows to determine the abundance of local background tags to be used as a local peak-calling threshold [12]. Moreover, the performance of ChIP-seq peak-calling methods has been reported to depend on the total number of reads, i.e. read density, in the dataset [27]. To investigate whether those issues affect the performance of the OccuPeak algorithm, the effect of the size of the sampling window and of the tag density on the number of peaks and the pattern of peaks was determined. To this end, systematic sub-sampling was used to generate ChIP-seq datasets containing 12.5, 25, 50 and 75% of the total number of tags. For each subset, OccuPeak was applied with window sizes ranging from 0.1 Mb to complete chromosomes. The required number of windows to completely cover each chromosome was distributed uniformly with minimal overlap. The resulting peak sets were visualized and compared (Figure 4A; File S1).
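The sub-sampling and window placement described above can be illustrated with a minimal Python sketch. It is not taken from OccuPeak itself: the excerpt does not say how the systematic sub-sampling or the window distribution was implemented, so the function names `systematic_subsample` and `tile_windows`, the keep-rule for tags, and the even spacing of window starts are all assumptions chosen to match the description (a fixed fraction of tags retained at regular intervals; the smallest number of equal-sized windows that covers a chromosome, spread uniformly so that overlap is minimal).

```python
import math
from fractions import Fraction


def systematic_subsample(tags, fraction):
    """Keep a fixed fraction of tags at regular intervals (assumed scheme).

    Tag i is kept when the running quota floor((i + 1) * f) advances,
    so f = 0.125 keeps every 8th tag and f = 0.75 keeps 3 out of every 4.
    """
    f = Fraction(fraction)
    return [
        tag
        for i, tag in enumerate(tags)
        if math.floor((i + 1) * f) > math.floor(i * f)
    ]


def tile_windows(chrom_length, window_size):
    """Cover [0, chrom_length) with equal windows, spread uniformly.

    Uses the minimum number of windows needed for full coverage and
    spaces their start positions evenly, which keeps overlap minimal.
    Returns a list of half-open (start, end) intervals.
    """
    if window_size >= chrom_length:
        # A window at least as large as the chromosome is chromosome-wide.
        return [(0, chrom_length)]
    n = math.ceil(chrom_length / window_size)
    span = chrom_length - window_size  # start position of the last window
    starts = [(k * span) // (n - 1) for k in range(n)]
    return [(s, s + window_size) for s in starts]


if __name__ == "__main__":
    tags = list(range(16))  # hypothetical tag positions
    for frac in (0.125, 0.25, 0.5, 0.75):
        print(frac, len(systematic_subsample(tags, frac)))
    print(tile_windows(1000, 300))
```

With a chromosome length that is an exact multiple of the window size, the windows in this sketch abut without overlap; otherwise the surplus is shared evenly between neighbouring windows rather than concentrated at the chromosome end.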

