OccuPeak: ChIP-Seq peak calling based on internal background modelling.

de Boer BA, van Duijvenboden K, van den Boogaard M, Christoffels VM, Barnett P, Ruijter JM - PLoS ONE (2014)

Bottom Line: However, the GC-content of reads in Input-seq datasets deviates significantly from that in ChIP-seq datasets. Moreover, we observed that a commonly used peak-calling program performed equally well when the use of a simulated uniform background set was compared to an Input-seq dataset. Moreover, peaks called by OccuPeak were significantly enriched with cardiac disease-associated SNPs.


Affiliation: Department of Anatomy, Embryology & Physiology, Academic Medical Centre, Amsterdam, The Netherlands.

ABSTRACT

ChIP-seq has become a major tool for the genome-wide identification of transcription factor binding or histone modification sites. Most peak-calling algorithms require input control datasets to model the occurrence of background reads to account for local sequencing and GC bias. However, the GC-content of reads in Input-seq datasets deviates significantly from that in ChIP-seq datasets. Moreover, we observed that a commonly used peak-calling program performed equally well when the use of a simulated uniform background set was compared to an Input-seq dataset. This contradicts the assumption that input control datasets are necessary to faithfully reflect the background read distribution. Because the GC-content of the abundant single reads in ChIP-seq datasets is similar to that of randomly sampled regions, we designed a peak-calling algorithm with a background model based on overlapping single reads. The application, OccuPeak, uses the abundant low-frequency tags present in each ChIP-seq dataset to model the background, thereby avoiding the need for additional datasets. Analysis of the performance of OccuPeak showed robust model parameters. Its measure of peak significance, the excess ratio, depends only on the tag density of a peak and the global noise levels. Compared to the commonly used peak-calling applications MACS and CisGenome, OccuPeak had the highest sensitivity in an enhancer identification benchmark test, and performed similarly in overlap tests of transcription factor occupation with DNase I hypersensitive sites and H3K27ac sites. Moreover, peaks called by OccuPeak were significantly enriched with cardiac disease-associated SNPs. OccuPeak runs as a standalone application and does not require extensive tweaking of parameters, making its use straightforward and user-friendly.

Availability: http://occupeak.hfrc.nl.


Figure 5: Consistency of different peak-calling methods. OccuPeak, MACS and CisGenome were used to call peaks for each of the two replicate p300 ChIP-seq experiments generated by the ENCODE consortium (GSE29184). (A) Peaks are considered common (green) if they were identified in both replicates and singleton if they were only found in the current replicate (yellow and blue), as depicted in the UCSC genome browser example (B).

Mentions: The availability of replicate p300 ChIP-seq experiments [26] provided the opportunity to determine the consistency of peak-calling algorithms between biological replicates. Peaks were considered common (Figure 5; green bars) if they were identified in both replicate datasets and singleton if they were only identified in one replicate set (Figure 5; blue and yellow bars for replicate 1 and 2, respectively). OccuPeak found 52% of peaks common to both datasets (Figure 5; bar 1). We also determined the consistency in peak calling for the MACS and CisGenome algorithms (Figure 5; bars 2 and 3). CisGenome showed 50% of peaks being called consistently between sets, whereas MACS reached 54%. However, peak-calling power, reflected in the number of peaks called at the default threshold, differs per method: the number of common peaks identified by OccuPeak exceeds the total number of peaks called by the other peak callers. Although the different peak-calling methods do not differ in consistency of peak calling, an analysis based on overlap between datasets will benefit from a large number of observed peaks because it avoids the loss of information when datasets differ substantially in read density or background noise.
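The common/singleton classification described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes peaks are given as (chromosome, start, end) tuples with half-open coordinates, that any overlap of at least one base counts as "identified in both replicates", and that the percentage is taken per replicate. The function names and the example coordinates are hypothetical, and the paper does not state its exact overlap criterion.

```python
from collections import defaultdict


def build_index(peaks):
    """Group (chrom, start, end) peaks by chromosome for lookup."""
    index = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    return index


def overlaps_any(peak, index):
    """True if `peak` shares at least one base with a peak in `index`.

    Assumption: half-open intervals and a one-base overlap criterion.
    This linear scan is fine for a sketch but slow for large peak sets.
    """
    chrom, start, end = peak
    return any(s < end and start < e for s, e in index.get(chrom, []))


def split_common_singleton(peaks, other_peaks):
    """Split `peaks` into those also found in `other_peaks` and the rest."""
    other_index = build_index(other_peaks)
    common = [p for p in peaks if overlaps_any(p, other_index)]
    singleton = [p for p in peaks if not overlaps_any(p, other_index)]
    return common, singleton


def fraction_common(peaks, other_peaks):
    """Fraction of `peaks` that are common to both replicates."""
    if not peaks:
        return 0.0
    common, _ = split_common_singleton(peaks, other_peaks)
    return len(common) / len(peaks)


# Made-up toy peaks, not data from the paper.
replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
replicate_2 = [("chr1", 150, 250), ("chr2", 400, 500)]

common_1, singleton_1 = split_common_singleton(replicate_1, replicate_2)
print(f"replicate 1: {len(common_1)} common, {len(singleton_1)} singleton")
print(f"replicate 2: {fraction_common(replicate_2, replicate_1):.0%} common")
```

Because the fraction is computed relative to each replicate's own peak count, two replicates can report different percentages for the same set of shared regions, which is one reason the absolute number of common peaks is worth reporting alongside the percentage.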

