NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data.

Kim NK, Jayatillake RV, Spouge JL - BMC Genomics (2013)

Bottom Line: The model therefore estimates total strength of binding (even if some binding locations do not map uniquely into a reference genome, effectively censoring them); it also assigns an error to an estimated binding location. The model also provides a goodness-of-fit test, to screen out spurious peaks and to infer multiple binding events in a region. NEXT-peak is based on rigorous statistics, so its model also provides a principled foundation for a more elaborate statistical analysis of ChIP-seq data.

Affiliation: Mathematics and Statistics Department, Old Dominion University, Norfolk, VA 23529, USA. nxkim@odu.edu

ABSTRACT

Background: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) can locate transcription factor binding sites on a genomic scale. Although many models and programs are available to call peaks, none has dominated its competition in comparison studies.

Results: We propose a rigorous statistical model, the normal-exponential two-peak (NEXT-peak) model, which parallels the physical processes generating the empirical data, and which can naturally incorporate mappability information. The model therefore estimates total strength of binding (even if some binding locations do not map uniquely into a reference genome, effectively censoring them); it also assigns an error to an estimated binding location. The comparison study with existing programs on real ChIP-seq datasets (STAT1, NRSF, and ZNF143) demonstrates that the NEXT-peak model performs well both in calling peaks and locating them. The model also provides a goodness-of-fit test, to screen out spurious peaks and to infer multiple binding events in a region.

Conclusions: The NEXT-peak program calls peaks on any test dataset about as accurately as any other, but provides unusual accuracy in the estimated location of the peaks it calls. NEXT-peak is based on rigorous statistics, so its model also provides a principled foundation for a more elaborate statistical analysis of ChIP-seq data.
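The Results paragraph describes a two-peak model that "parallels the physical processes generating the empirical data." The sketch below is one plausible reading of the model's name, not code or equations taken from NEXT-peak. It assumes that each right-strand tag lies at the binding site plus a normal error plus an exponential offset (an exponentially modified Gaussian), and that left-strand tags are the mirror image about the binding site. Together these give two peaks flanking the site. The function names `emg_pdf` and `two_peak_pdf` and the parameters `sigma` and `lam` are hypothetical.

```python
import math


def emg_pdf(x, mu, sigma, lam):
    """Density at x of mu + N(0, sigma^2) + Exp(rate=lam).

    This is the exponentially modified Gaussian (a normal convolved with an exponential).
    """
    z = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)) * math.erfc(z)


def two_peak_pdf(x, strand, mu, sigma, lam):
    """Hypothetical strand-specific tag density around a binding site at mu.

    Assumption: right-strand tags fall to the right of the site (normal + exponential),
    and left-strand tags are the mirror image about mu.
    """
    if strand == "right":
        return emg_pdf(x, mu, sigma, lam)
    if strand == "left":
        # Reflect about mu: a left-strand tag at x mirrors a right-strand tag at 2*mu - x.
        return emg_pdf(2.0 * mu - x, mu, sigma, lam)
    raise ValueError("strand must be 'left' or 'right'")


if __name__ == "__main__":
    mu, sigma, lam = 0.0, 20.0, 0.02  # hypothetical parameters, in base pairs
    grid = range(-1000, 1001)  # 1-bp grid wide enough to hold essentially all the mass
    for strand in ("left", "right"):
        mass = sum(two_peak_pdf(x, strand, mu, sigma, lam) for x in grid)
        mean = sum(x * two_peak_pdf(x, strand, mu, sigma, lam) for x in grid)
        print(f"{strand}: total mass {mass:.4f}, mean position {mean:.1f}")
```

With these illustrative parameters, each strand's density sums to approximately 1. The two densities are centred near -50 and +50, so each peak sits one mean exponential offset (`1/lam`) away from the binding site, and the site lies midway between them.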

Figure 2: A plot of regions with a large number of unmappable locations from STAT1 ChIP-seq. Tag counts in the left strand are shown as blue bars; tag counts in the right strand, as red bars. The unmappable locations are marked by grey blocks. The circles represent motif sites; the triangles, estimated sites. (a) 49% of locations are unmappable. (b) 41% unmappable. (c) 38% unmappable.

Mentions: Figure 2 shows three regions of the STAT1 data with a large number of unmappable locations, marked by grey blocks: 49% of locations are unmappable in Figure 2a, 41% in Figure 2b, and 38% in Figure 2c. The circles indicate motif sites; the triangles, sites estimated by the NEXT-peak model. The estimated sites approximate the motif sites reasonably well. In panels (a), (b), and (c), the estimated tag counts due to the binding event are 636.8, 264.3, and 699.3, respectively, whereas the total observed tag counts in the corresponding regions are 396, 209, and 492. Although tags at unmappable locations cannot be observed, the NEXT-peak model compensates by increasing the corresponding estimated tag counts. This compensation lets NEXT-peak sharpen its estimates of binding strength.
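A simplified censoring argument illustrates why an estimated count can exceed the observed count. For illustration only, assume the following. Tags from one binding event of total strength S land at position i with known probability p_i. Counts are Poisson. Tags at unmappable positions are never observed. Maximizing the Poisson likelihood over mappable positions only then gives S = (sum of observed counts) / (sum of p_i over mappable positions). The estimate therefore exceeds the observed count whenever part of the peak's mass falls on unmappable positions. This sketch ignores background tags and the joint estimation of the binding location, so it shows the principle rather than NEXT-peak's fitting procedure. The function name `censored_strength` and the toy numbers are hypothetical.

```python
def censored_strength(counts, probs, mappable):
    """Poisson MLE of total binding strength S when unmappable positions are unobserved.

    Assumed model: count_i ~ Poisson(S * probs[i]) at mappable positions i; nothing is
    observed at unmappable positions. Setting the derivative in S of
    sum_i [count_i * log(S * probs[i]) - S * probs[i]] to zero gives
    S = sum(count_i) / sum(probs[i]), with both sums over mappable positions only.
    """
    observed = sum(c for c, ok in zip(counts, mappable) if ok)
    mass = sum(p for p, ok in zip(probs, mappable) if ok)
    if mass <= 0.0:
        raise ValueError("peak shape puts no probability on mappable positions")
    return observed / mass


if __name__ == "__main__":
    # Hypothetical 5-position peak whose central (highest-probability) position is unmappable.
    probs = [0.1, 0.2, 0.4, 0.2, 0.1]
    mappable = [True, True, False, True, True]
    counts = [12, 19, 0, 21, 8]  # 60 tags observed at mappable positions
    print(round(censored_strength(counts, probs, mappable), 6))
```

The toy example observes 60 tags. Because 40% of the assumed peak mass sits on an unmappable position, it estimates a strength of 100. The Figure 2 totals show a similar pattern: the observed region counts are roughly 62%, 79%, and 70% of the estimated binding-event counts (396/636.8, 209/264.3, and 492/699.3). The observed totals may also include background tags, so these ratios should not be read as the model's fitted mappable fractions.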