Normalization of ChIP-seq data with control.

Liang K, Keleş S - BMC Bioinformatics (2012)

Bottom Line: ChIP-seq has become an important tool for identifying genome-wide protein-DNA interactions, including transcription factor binding and histone modifications. Proper normalization between the ChIP and control samples is an essential aspect of ChIP-seq data analysis. Our results indicate that proper normalization between the ChIP and control samples is an important step in ChIP-seq analysis in terms of power and error rate control.


Affiliation: Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. k22liang@uwaterloo.ca

ABSTRACT

Background: ChIP-seq has become an important tool for identifying genome-wide protein-DNA interactions, including transcription factor binding and histone modifications. In ChIP-seq experiments, ChIP samples are usually coupled with their matching control samples. Proper normalization between the ChIP and control samples is an essential aspect of ChIP-seq data analysis.
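A small simulation makes the need for normalization concrete. A common baseline scales the samples by the ratio of their total read counts (library sizes). Because enriched regions add reads only to the ChIP sample, that ratio overstates the ChIP/control ratio in background regions. The sketch below is a hypothetical setup (Poisson counts per bin, made-up rates, and a made-up 5% of enriched bins); it is not one of the paper's simulation settings, and the "background-only" ratio uses the true enrichment labels, which are unknown in real data.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_bins(n_bins, r_true, enriched_frac, signal_mean=50.0, seed=1):
    """Hypothetical simulation: each bin has a background rate lam; the ChIP
    background mean is r_true * lam, and a fraction of bins receive extra
    ChIP-only signal reads. Returns (chip, control, is_enriched) tuples."""
    rng = random.Random(seed)
    bins = []
    for _ in range(n_bins):
        lam = rng.uniform(1.0, 5.0)
        x = poisson(rng, r_true * lam)
        y = poisson(rng, lam)
        enriched = rng.random() < enriched_frac
        if enriched:
            x += poisson(rng, signal_mean)  # signal appears only in the ChIP sample
        bins.append((x, y, enriched))
    return bins


bins = simulate_bins(n_bins=5000, r_true=1.0, enriched_frac=0.05)

# Library-size ratio: total ChIP reads over total control reads.
naive = sum(x for x, _, _ in bins) / sum(y for _, y, _ in bins)

# Oracle ratio over background bins only (possible here because labels are known).
background = (sum(x for x, _, e in bins if not e)
              / sum(y for _, y, e in bins if not e))

print(f"library-size ratio: {naive:.2f}, background-only ratio: {background:.2f}")
```

In this setup the library-size ratio lands well above the true background ratio of 1, while the background-only ratio stays close to it.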

Results: We have developed a novel method for estimating the normalization factor between the ChIP and the control samples. Our method, named NCIS (Normalization of ChIP-seq), can accommodate datasets with both low and high sequencing depth. We compare the statistical properties of NCIS against existing methods in a set of diverse simulation settings, where NCIS achieves the best estimation precision. In addition, we illustrate the impact of the normalization factor on FDR control and show that NCIS yields more power than the other methods that control FDR at nominal levels.
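To see how the normalization factor feeds into FDR control, consider a simple per-bin test. This is our assumption for illustration, not the testing procedure used in the paper: if ChIP and control counts in a non-enriched bin are independent Poisson with ChIP mean r times the control mean, then conditional on the bin total t = x + y, the ChIP count is Binomial(t, r/(1+r)), where r is the ChIP/control normalization factor. P-values are adjusted with the Benjamini-Hochberg procedure. The toy counts are made up.

```python
import math


def binom_upper_tail(x, t, p):
    """P(X >= x) for X ~ Binomial(t, p), summed exactly (fine for small t)."""
    return sum(math.comb(t, k) * p**k * (1 - p) ** (t - k) for k in range(x, t + 1))


def benjamini_hochberg(pvals, alpha):
    """Step-up BH procedure: return the set of indices declared significant."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank  # keep the largest rank that passes its threshold
    return set(order[:k_max])


def call_enriched_bins(chip, ctrl, r, alpha=0.05):
    """Hypothetical per-bin test: given the bin total t = x + y, a bin with no
    enrichment has x ~ Binomial(t, r / (1 + r)), r being the normalization factor."""
    p0 = r / (1.0 + r)
    pvals = [binom_upper_tail(x, x + y, p0) for x, y in zip(chip, ctrl)]
    return benjamini_hochberg(pvals, alpha)


# Toy data (made up): 90 background bins where ChIP is twice the control
# (true r = 2), followed by 10 enriched bins.
chip = [30] * 90 + [60] * 10
ctrl = [15] * 90 + [10] * 10

for r in (1.0, 2.0, 4.0):
    print(f"r = {r}: {len(call_enriched_bins(chip, ctrl, r))} bins called")
```

With these toy numbers, underestimating the factor (r = 1) flags all 100 bins, including 90 false positives; the true value (r = 2) flags exactly the 10 enriched bins; and overestimating it (r = 4) flags none, so power is lost.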

Conclusion: Our results indicate that proper normalization between the ChIP and control samples is an important step in ChIP-seq analysis in terms of power and error rate control. Our proposed method shows excellent statistical properties and is useful across the full range of ChIP-seq applications, especially with deeply sequenced data.


Figure 1: ChIP/control ratio as a function of total count for C. elegans data. (a) Marginal ChIP/control ratio against total count, both on a log10 scale, from a C. elegans ChIP-seq dataset of the transcription factor PHA-4 [18]. The sizes of the plotting circles are proportional to log10 of the number of reads. The vertical dashed line marks the total count selected by NCIS to estimate the normalization constant; the horizontal dashed line marks the NCIS estimate of the normalization factor. (b) Normalization constant as a function of bin width. The vertical dashed line marks the bin width selected by NCIS to estimate the normalization constant; the horizontal dashed line marks the NCIS estimate of the normalization factor.
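The quantity in panel (a) can be sketched by binning reads into windows of width w, computing each bin's total count t, and pooling bins that share the same t. This is a minimal sketch under our own assumption that the marginal ratio at t means the pooled ChIP count divided by the pooled control count over bins whose total is exactly t; the function names are hypothetical. Rerunning the binning with different values of w gives a bin-width dependence of the kind shown in panel (b).

```python
from collections import defaultdict


def bin_counts(read_starts, chrom_length, w):
    """Count read start positions (0-based, assumed < chrom_length) in
    non-overlapping windows of width w."""
    counts = [0] * (-(-chrom_length // w))  # ceil(chrom_length / w) windows
    for pos in read_starts:
        counts[pos // w] += 1
    return counts


def marginal_ratio_by_total(chip, ctrl):
    """Map each total count t = x + y to (ChIP sum / control sum, number of bins)
    over the bins whose total equals t; totals with no control reads are skipped."""
    sums = defaultdict(lambda: [0, 0, 0])  # t -> [ChIP sum, control sum, bins]
    for x, y in zip(chip, ctrl):
        entry = sums[x + y]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {t: (sx / sy, n) for t, (sx, sy, n) in sorted(sums.items()) if sy > 0}


# Made-up read positions on a 1,500 bp sequence, binned at w = 500 bp.
chip = bin_counts([10, 20, 30, 700, 710, 1300], chrom_length=1500, w=500)
ctrl = bin_counts([15, 720, 1250], chrom_length=1500, w=500)
print(chip, ctrl)                           # [3, 2, 1] [1, 1, 1]
print(marginal_ratio_by_total(chip, ctrl))  # {2: (1.0, 1), 3: (2.0, 1), 4: (3.0, 1)}
```

On real data, plotting log10 of t against log10 of the ratio, with point sizes driven by the bin counts, would give a display in the spirit of panel (a).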

Mentions: We now motivate our method through a real ChIP-seq study. Figure 1a shows the marginal ChIP/control ratio against the total count t, with bin width w = 500 bp, for a C. elegans ChIP-seq dataset of the transcription factor PHA-4 [18]. On the left half of the figure, where t is small, the ratio estimates fall around a horizontal line, and their variability increases as t decreases. This suggests that reads from bins with small total counts come mostly from background regions, where the marginal ChIP/control ratios are similar. On the right half of Figure 1a, the marginal ratios rise sharply, indicating a substantial influx of enrichment-signal reads into the ChIP sample.
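The observation above suggests estimating the normalization factor from low-total-count bins only. The sketch below pools bins in order of increasing t and stops once the cumulative ChIP/control ratio starts to climb. The stopping rule and its `min_bins` and `tol` parameters are hypothetical stand-ins chosen for illustration; this is not the NCIS rule for selecting t or w, and the toy counts are made up.

```python
from collections import defaultdict


def background_ratio(chip, ctrl, min_bins=100, tol=0.01):
    """Hypothetical rule, NOT the NCIS threshold selection: pool bins in order
    of increasing total count t and stop once the cumulative ChIP/control ratio
    rises by more than the relative tolerance `tol` (only after at least
    `min_bins` bins have been pooled). Returns (ratio, last total included)."""
    by_t = defaultdict(lambda: [0, 0, 0])  # t -> [ChIP sum, control sum, bins]
    for x, y in zip(chip, ctrl):
        entry = by_t[x + y]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    sx = sy = n = 0
    ratio, chosen_t = None, None
    for t in sorted(by_t):
        ex, ey, en = by_t[t]
        if sy + ey == 0:
            # No control reads pooled yet, so the ratio is undefined; keep pooling.
            sx, sy, n = sx + ex, sy + ey, n + en
            continue
        candidate = (sx + ex) / (sy + ey)
        if ratio is not None and n >= min_bins and candidate > ratio * (1 + tol):
            break  # the ratio starts to climb: signal reads are entering the pool
        sx, sy, n = sx + ex, sy + ey, n + en
        ratio, chosen_t = candidate, t
    return ratio, chosen_t


# Toy data: background bins with ChIP = 2 x control, then a few enriched bins.
chip = [2] * 100 + [4] * 100 + [28] * 10
ctrl = [1] * 100 + [2] * 100 + [2] * 10

print(background_ratio(chip, ctrl))  # (2.0, 6): stops before the enriched bins
print(sum(chip) / sum(ctrl))         # 2.75: library-size ratio, inflated by signal
```

Pooling rather than using each marginal ratio alone trades a little bias for much lower variance, which matters because the marginal ratios are noisiest exactly where t is small.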