Normalization of ChIP-seq data with control.

Liang K, Keleş S - BMC Bioinformatics (2012)

Bottom Line: ChIP-seq has become an important tool for identifying genome-wide protein-DNA interactions, including transcription factor binding and histone modifications. Proper normalization between the ChIP and control samples is an essential aspect of ChIP-seq data analysis. Our results indicate that the proper normalization between the ChIP and control samples is an important step in ChIP-seq analysis in terms of power and error rate control.


Affiliation: Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. k22liang@uwaterloo.ca

ABSTRACT

Background: ChIP-seq has become an important tool for identifying genome-wide protein-DNA interactions, including transcription factor binding and histone modifications. In ChIP-seq experiments, ChIP samples are usually coupled with their matching control samples. Proper normalization between the ChIP and control samples is an essential aspect of ChIP-seq data analysis.

Results: We have developed a novel method for estimating the normalization factor between the ChIP and the control samples. Our method, named NCIS (Normalization of ChIP-seq), can accommodate both low and high sequencing depth datasets. We compare the statistical properties of NCIS against existing methods in a set of diverse simulation settings, where NCIS enjoys the best estimation precision. In addition, we illustrate the impact of the normalization factor on FDR control and show that NCIS leads to more power among methods that control FDR at nominal levels.

Conclusion: Our results indicate that the proper normalization between the ChIP and control samples is an important step in ChIP-seq analysis in terms of power and error rate control. Our proposed method shows excellent statistical properties and is useful in the full range of ChIP-seq applications, especially with deeply sequenced data.

© Copyright Policy - open-access

Figure 3: FDR control and power. FDR control with the sample-swapping method. (a) Comparison of FDR levels with different normalization factor estimators. (b) Power comparison under FDR control at the 0.05 level with different normalization factor estimators.

Mentions: As a comparison, we also performed peak calling when the normalization factor is set to its true value of 1 and refer to this method as the Oracle. FDR is controlled at the target level of 0.05, and Figure 3a displays the means of the realized FDR for various methods. The FDR values of the Oracle are close to the nominal value of 0.05 (the median of the differences is 0.002 while the median of the standard errors is 0.0012). For display purposes, the Oracle FDR values are plotted at the expected value of 0.05, and other methods are adjusted accordingly. CisGenome and CCAT fail to control FDR at various sequencing depths, especially when the sequencing depth is high. CisGenome’s FDR values can be drastically larger than the nominal level at high sequencing depths because its normalization estimate becomes unreliable and highly variable. CCAT’s failure to control FDR is due to the negative bias resulting from the artifacts. Among all the methods, the FDR values of NCIS are the closest to the Oracle. Figure 3b shows the power (number of true positives) of all methods against different subsampling divisors/sequencing depths. Among all methods that can control FDR at the nominal level (NCIS, SPP and PeakSeq), NCIS is the most powerful method and is indistinguishable from the Oracle. On average, NCIS is about 6% more powerful than the second-best method (SPP) across different sequencing depths.
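
The two quantities compared above, the realized FDR and the power (number of true positives), can be illustrated with a minimal sketch. It assumes a simulation in which the truly bound sites are known and each peak caller returns a set of called sites; the function name, the site indices, and the convention of reporting an FDR of 0 when nothing is called are hypothetical choices for illustration and are not taken from the article or from any of the tools named.

```python
from statistics import mean


def realized_fdr_and_power(called, true_sites):
    """Return (realized FDR, number of true positives) for one simulated run.

    `called` and `true_sites` are collections of site identifiers
    (hypothetical: e.g. genomic bin indices).
    """
    called = set(called)
    true_sites = set(true_sites)
    true_positives = len(called & true_sites)
    false_positives = len(called) - true_positives
    # Assumed convention: with no calls, the realized FDR is reported as 0.
    fdr = false_positives / len(called) if called else 0.0
    return fdr, true_positives


def mean_realized_fdr(runs):
    """Average the realized FDR over (called, true_sites) pairs."""
    return mean(realized_fdr_and_power(c, t)[0] for c, t in runs)


# Hypothetical example: five called bins, six truly bound bins.
called_bins = [3, 7, 12, 18, 25]
true_bins = [3, 7, 12, 18, 30, 41]
fdr, power = realized_fdr_and_power(called_bins, true_bins)
print(fdr, power)
```

Here four of the five called bins are truly bound, so one call in five is false and the power, counted as in the text, is four true positives. Averaging the first quantity over repeated simulations gives the kind of mean realized FDR that is compared against the nominal 0.05 level.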

