NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data.

Kim NK, Jayatillake RV, Spouge JL - BMC Genomics (2013)

Bottom Line: The model therefore estimates total strength of binding (even if some binding locations do not map uniquely into a reference genome, effectively censoring them); it also assigns an error to an estimated binding location. The model also provides a goodness-of-fit test, to screen out spurious peaks and to infer multiple binding events in a region. NEXT-peak is based on rigorous statistics, so its model also provides a principled foundation for a more elaborate statistical analysis of ChIP-seq data.


Affiliation: Mathematics and Statistics Department, Old Dominion University, Norfolk, VA 23529, USA. nxkim@odu.edu

ABSTRACT

Background: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) can locate transcription factor binding sites on a genomic scale. Although many models and programs are available to call peaks, none has dominated its competition in comparison studies.

Results: We propose a rigorous statistical model, the normal-exponential two-peak (NEXT-peak) model, which parallels the physical processes generating the empirical data, and which can naturally incorporate mappability information. The model therefore estimates total strength of binding (even if some binding locations do not map uniquely into a reference genome, effectively censoring them); it also assigns an error to an estimated binding location. The comparison study with existing programs on real ChIP-seq datasets (STAT1, NRSF, and ZNF143) demonstrates that the NEXT-peak model performs well both in calling peaks and locating them. The model also provides a goodness-of-fit test, to screen out spurious peaks and to infer multiple binding events in a region.

Conclusions: The NEXT-peak program calls peaks on any test dataset about as accurately as any other program, but provides unusual accuracy in the estimated location of the peaks it calls. NEXT-peak is based on rigorous statistics, so its model also provides a principled foundation for a more elaborate statistical analysis of ChIP-seq data.



Figure 1: Profile of the normal-exponential two-peak (NEXT-peak) density. (a) An example of a NEXT-peak density profile without fitting to a particular dataset. The blue curve is the tag profile on the left (positive) strand; the red curve is the tag profile on the right (negative) strand. Parameter values are β = 60 and σ = 40 (see Methods). The two density curves mirror each other around the center location. (b) Tag profile of STAT1 ChIP-seq data. The motif search found thousands of motif sites. The cumulative tag counts were rescaled and displayed as densities. (c) NRSF. (d) ZNF143. Table 2 reports the estimated values of β and σ for each dataset.
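
The shape in panel (a) can be illustrated with a short sketch. It assumes, as the model's name suggests but as this summary does not confirm, that a tag's offset from the center location is the sum of a normal variable with standard deviation σ and an exponential variable with mean β. The paper's Methods give the actual definition, which may differ. The function names are hypothetical, and the sketch does not assert which of the two curves belongs to which strand.

```python
import math


def skewed_density(x, beta=60.0, sigma=40.0):
    """Density at offset x of Normal(0, sigma^2) + Exponential(mean beta).

    ASSUMPTION: this exponentially modified Gaussian is only a stand-in
    for the NEXT-peak density, whose exact form is defined in the paper's
    Methods and is not reproduced in this summary.
    """
    lam = 1.0 / beta  # rate of the exponential component
    z = (lam * sigma * sigma - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(lam * (0.5 * lam * sigma * sigma - x)) * math.erfc(z)


def mirrored_density(x, beta=60.0, sigma=40.0):
    """Mirror image of skewed_density around the center location (x = 0)."""
    return skewed_density(-x, beta, sigma)


def integrate(f, lo, hi, step=1.0):
    """Trapezoidal rule on [lo, hi]; adequate for these smooth curves."""
    n = int(round((hi - lo) / step))
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * step)
    return total * step


if __name__ == "__main__":
    # Tabulate both curves at a few offsets (in base pairs) from the center.
    for offset in (-200, -100, 0, 100, 200):
        print(offset, skewed_density(offset), mirrored_density(offset))
```

Under this assumption, β controls the length of the long tail and σ controls the spread around the center, so the two mirrored curves have their bulk on opposite sides of the center location.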

Mentions: Searches with position-specific scoring matrices from JASPAR [16] yielded candidates for actual STAT1, NRSF, or ZNF143 sites within each region with a binding event. The searches used a p-value cut-off of 5×10⁻⁶ for all datasets. See Methods for details on the p-value computation for finding motif sites. Figure 1a shows a density of the normal-exponential two-peak (NEXT-peak) model (see Methods). Figure 1b-d displays the tag number, normalized to a probability density, for each location around the position of the candidate sites. The observed tag density is superimposed on the estimated density (derived from the model's estimates of the expected tag numbers; see Methods). Maximum likelihood estimation on the NEXT-peak model produced the underlying parameter estimates. Table 2 reports estimated parameter values for each dataset.
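
The normalization of tag numbers to a probability density, as described for Figure 1b-d, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the bin width parameter, and the example counts are hypothetical, and the sketch assumes one cumulative count per equally spaced offset around the candidate sites.

```python
def counts_to_density(counts, bin_width=1.0):
    """Rescale cumulative tag counts so that they integrate to 1.

    counts    -- tag counts, one per offset bin (hypothetical input)
    bin_width -- width of each bin in base pairs (assumed constant)
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no tags to normalize")
    return [c / (total * bin_width) for c in counts]


if __name__ == "__main__":
    # Made-up counts for illustration only; not taken from the paper.
    example_counts = [2, 6, 8, 4]
    print(counts_to_density(example_counts))
```

After this rescaling, an observed profile can be drawn on the same axes as a fitted density, which is what the superimposed curves in Figure 1b-d show.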

