Normalization of ChIP-seq data with control.

Liang K, Keleş S - BMC Bioinformatics (2012)

Bottom Line: ChIP-seq has become an important tool for identifying genome-wide protein-DNA interactions, including transcription factor binding and histone modifications. Proper normalization between the ChIP and control samples is an essential aspect of ChIP-seq data analysis. Our results indicate that the proper normalization between the ChIP and control samples is an important step in ChIP-seq analysis in terms of power and error rate control.


Affiliation: Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. k22liang@uwaterloo.ca

ABSTRACT

Background: ChIP-seq has become an important tool for identifying genome-wide protein-DNA interactions, including transcription factor binding and histone modifications. In ChIP-seq experiments, ChIP samples are usually coupled with their matching control samples. Proper normalization between the ChIP and control samples is an essential aspect of ChIP-seq data analysis.

Results: We have developed a novel method for estimating the normalization factor between the ChIP and the control samples. Our method, named NCIS (Normalization of ChIP-seq), can accommodate both low and high sequencing depth datasets. We compare the statistical properties of NCIS against existing methods in a diverse set of simulation settings, where NCIS achieves the best estimation precision. In addition, we illustrate the impact of the normalization factor on FDR control and show that NCIS leads to more power among methods that control FDR at nominal levels.

Conclusion: Our results indicate that the proper normalization between the ChIP and control samples is an important step in ChIP-seq analysis in terms of power and error rate control. Our proposed method shows excellent statistical properties and is useful in the full range of ChIP-seq applications, especially with deeply sequenced data.


Figure 5: ChIP/control ratio as a function of total count for human NFκB data. Marginal ChIP/control ratio for the NFκB data plotted against total read count, with a bin width of 100 bp; both axes are on the natural log scale. Sizes of the plotting symbols are proportional to log10 of the number of reads. The horizontal dashed line indicates the NCIS estimate of the normalization factor. The vertical dashed line marks the NCIS total count threshold (tw∗).
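As a rough illustration of how the points in a plot of this kind could be computed, the Python sketch below groups bins by their total (ChIP + control) read count and returns one point per total. It assumes that "marginal" means the ratio of summed ChIP reads to summed control reads among bins sharing the same total; the function name and inputs are hypothetical, and this is not the NCIS implementation.

```python
from collections import defaultdict
import math


def marginal_ratio_by_total(chip, control):
    """Return (log total, log ChIP/control ratio, number of reads) per total count.

    `chip` and `control` are per-bin read counts (hypothetical inputs, e.g.
    counts in 100 bp bins). Totals for which the summed ChIP or summed control
    count is zero are skipped, because the log ratio is undefined there.
    """
    chip_sum = defaultdict(int)
    ctrl_sum = defaultdict(int)
    for c, k in zip(chip, control):
        total = c + k
        if total > 0:
            chip_sum[total] += c
            ctrl_sum[total] += k

    points = []
    for total in sorted(chip_sum):
        if chip_sum[total] > 0 and ctrl_sum[total] > 0:
            n_reads = chip_sum[total] + ctrl_sum[total]
            log_ratio = math.log(chip_sum[total] / ctrl_sum[total])
            points.append((math.log(total), log_ratio, n_reads))
    return points
```

Each returned tuple gives the x coordinate, the y coordinate, and the read count that could drive the symbol size (for example through `math.log10`).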

Mentions: Figure 5 displays the marginal ChIP/control ratio against total read counts. The NFκB data are noisier than the yeast data and violate the assumptions of the signal-noise model: some bins contain more control reads than expected, as illustrated in the bottom right corner of the plot. This phenomenon can arise from various artifacts in ChIP-seq experiments, for example, PCR over-amplification in the control sample. Indeed, we traced most outliers to a 5 kbp region on chromosome 8. The read count per nucleotide is displayed in [Additional file 1: Figure S5]; this plot indicates that these are artifacts that are over-amplified in the control sample. The CCAT estimator is susceptible to such artifacts and can be biased downward when estimating the normalization factor. In contrast, NCIS and CisGenome use only bins with low total counts and are robust to such artifacts. SPP is also robust to these artifacts to some degree because it filters out bins with large ChIP and control read counts. In this dataset, CisGenome’s estimate of the normalization factor is larger than the sequencing depth ratio, which is an unreasonable outcome for the normalization factor.
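The robustness argument can be illustrated with a toy calculation. The Python sketch below compares a ChIP/control ratio computed over all bins with one restricted to bins whose total count is at most a fixed cutoff. The function name, the cutoff of 10, and the toy counts are assumptions chosen for illustration; NCIS selects its threshold from the data, and this sketch does not reproduce NCIS, CisGenome, CCAT, or SPP.

```python
def ratio_estimate(chip, control, max_total=None):
    """Summed ChIP reads over summed control reads.

    If `max_total` is given, only bins with ChIP + control <= max_total are
    used; otherwise every bin contributes.
    """
    chip_sum = 0
    ctrl_sum = 0
    for c, k in zip(chip, control):
        if max_total is None or c + k <= max_total:
            chip_sum += c
            ctrl_sum += k
    if ctrl_sum == 0:
        raise ValueError("no control reads in the selected bins")
    return chip_sum / ctrl_sum


# Toy data: 100 background bins with equal ChIP and control counts, plus one
# bin mimicking an over-amplified control artifact (0 ChIP reads, 400 control).
toy_chip = [2] * 100 + [0]
toy_control = [2] * 100 + [400]

low_count_estimate = ratio_estimate(toy_chip, toy_control, max_total=10)
all_bins_estimate = ratio_estimate(toy_chip, toy_control)
```

In this toy setting the single artifact bin pulls the all-bins ratio well below the background ratio of 1, while the low-count estimate is unaffected because the artifact bin exceeds the cutoff.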

