Normalization of ChIP-seq data with control.

Liang K, Keleş S - BMC Bioinformatics (2012)

Bottom Line: ChIP-seq has become an important tool for identifying genome-wide protein-DNA interactions, including transcription factor binding and histone modifications. Proper normalization between the ChIP and control samples is an essential aspect of ChIP-seq data analysis. Our results indicate that the proper normalization between the ChIP and control samples is an important step in ChIP-seq analysis in terms of power and error rate control.


Affiliation: Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. k22liang@uwaterloo.ca

ABSTRACT

Background: ChIP-seq has become an important tool for identifying genome-wide protein-DNA interactions, including transcription factor binding and histone modifications. In ChIP-seq experiments, ChIP samples are usually coupled with their matching control samples. Proper normalization between the ChIP and control samples is an essential aspect of ChIP-seq data analysis.

Results: We have developed a novel method for estimating the normalization factor between the ChIP and the control samples. Our method, named NCIS (Normalization of ChIP-seq), can accommodate both low and high sequencing depth datasets. We compare the statistical properties of NCIS against existing methods in a diverse set of simulation settings, where NCIS achieves the best estimation precision. In addition, we illustrate the impact of the normalization factor on FDR control and show that NCIS yields the most power among the methods that control FDR at nominal levels.

Conclusion: Our results indicate that the proper normalization between the ChIP and control samples is an important step in ChIP-seq analysis in terms of power and error rate control. Our proposed method shows excellent statistical properties and is useful in the full range of ChIP-seq applications, especially with deeply sequenced data.

Figure 2: Statistical properties of normalization factor estimators. Mean and MSE (log10) for estimating the normalization factor in simulation setting 1 (left), setting 2 (middle) and setting 3 (right) with c = 1. The true value of the normalization factor is 1.

Mentions: In this simulation study, we compare our estimator (NCIS) with the estimators proposed in CisGenome, SPP, CCAT, and PeakSeq. The exclusion proportion parameter Pf in PeakSeq was set to 0 to simplify its computation. The left panel of Figure 2 displays the log10 of the mean squared error (MSE) for setting 1 (transcription factor binding) with c = 1. We chose MSE as our comparison metric because it accounts for both the bias and the variance of each estimator relative to the true normalization factor. Overall, our NCIS estimator has the smallest MSE among all the methods. The PeakSeq estimator has the worst estimation precision, followed by SPP. The CisGenome estimator has the second-best MSE when sequencing depth is low; however, its performance deteriorates when sequencing depth is high. The performance of all the estimators except PeakSeq and CisGenome improves as sequencing depth increases. The remaining results for setting 1 (c = 0.2 and 0.5) are similar and are provided in [Additional file 1: Figures S6 and S7].
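
The choice of MSE as a metric rests on the identity MSE = bias^2 + variance. The following is a minimal Python sketch of that decomposition, not the simulation code used in the paper. The function name `mse_summary` and the two toy estimators (`noisy_unbiased`, `tight_biased`) are hypothetical and chosen only for illustration. The true normalization factor is set to 1, as in Figure 2.

```python
import math
import random
import statistics


def mse_summary(estimates, true_value):
    """Return (bias, variance, mse) of a list of estimates of true_value.

    Uses the population variance, so mse == bias**2 + variance
    holds up to floating-point error.
    """
    estimates = list(estimates)
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_value
    variance = statistics.pvariance(estimates, mu=mean_est)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse


# Hypothetical toy estimators (NOT NCIS, CisGenome, SPP, CCAT, or PeakSeq):
# one is unbiased but noisy, the other is precise but biased upward.
rng = random.Random(0)          # fixed seed for reproducibility
TRUE_FACTOR = 1.0               # true normalization factor, as in Figure 2
N_REPLICATES = 1000             # number of simulated datasets (assumed)

toy_estimates = {
    "noisy_unbiased": [rng.gauss(1.00, 0.05) for _ in range(N_REPLICATES)],
    "tight_biased": [rng.gauss(1.20, 0.01) for _ in range(N_REPLICATES)],
}

for name, values in toy_estimates.items():
    bias, variance, mse = mse_summary(values, TRUE_FACTOR)
    # Report MSE on the log10 scale, the scale used in Figure 2.
    print(f"{name}: bias={bias:.4f} variance={variance:.6f} "
          f"log10(MSE)={math.log10(mse):.3f}")
```

In this toy setup the low-variance estimator can still have the larger MSE, because its squared bias dominates. That is the reason a metric combining both terms is useful when ranking normalization factor estimators.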

