Explicit DNase sequence bias modeling enables high-resolution transcription factor footprint detection.
Bottom Line: DNase-seq footprints were absent under a fraction of ChIP-seq peaks, which we show to be indicative of weaker binding, indirect TF-DNA interactions or possible ChIP artifacts. The modeling approach was also able to detect variation in the consensus motifs that TFs bind to. Finally, cell type-specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting the idea that footprints can identify changes in TF binding that are not detectable using other strategies.
Affiliation: Computational Biology and Bioinformatics Program, Duke University, Durham, NC 27708, USA; Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA.
Mentions: At least four different scenarios arise when searching for bona fide DNase footprints at candidate TF-binding sites, which we define as DNA sequences that match the sequence preferences of a specific TF (i.e. a sequence motif match). We provide examples of these scenarios using NRSF ChIP-seq and DNase-seq data from the GM12878 lymphoblastoid cell line (37,38). First, true positives are sequence motif matches that overlap both a DNase footprint and a ChIP-seq peak for a TF associated with the sequence motif. These are highly likely to represent direct binding sites (Figure 1A). Second, true negatives are sequence motif matches without a DNase-seq footprint that do not fall within a ChIP-seq peak (Figure 1B). Third, ChIP may not have the resolution to distinguish which of two nearby sequence motif matches is actually bound, but this may be resolved by the presence of a footprint at one of them (Figure 1C). Fourth, sequence motif matches that overlap ChIP-seq peaks but do not exhibit a DNase-seq footprint (Figure 1D) may represent weak or indirect binding of TFs, long-range chromatin looping (39) or simply artifacts due to false-positive ChIP-seq peak calls (40,41). Together, these scenarios illustrate the challenges of identifying footprints and the motivation behind our modeling approach.
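The four scenarios can be read as a decision rule on two overlap tests per motif match (footprint present? ChIP-seq peak present?). The sketch below is only an illustration of that rule, not the authors' method: the function names, the half-open `(start, end)` interval representation on a single chromosome, and the labels are all assumptions made here. A footprint with no ChIP-seq peak is not one of the four described scenarios, so it is reported under a hypothetical `"other"` label.

```python
from typing import List, Tuple

# Assumption: half-open (start, end) coordinates on one chromosome.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_motif_matches(
    motifs: List[Interval],
    footprints: List[Interval],
    chip_peaks: List[Interval],
) -> List[str]:
    """Label each motif match with the figure panel it resembles (hypothetical helper)."""
    # Does each motif match overlap any called footprint?
    has_fp = [any(overlaps(m, f) for f in footprints) for m in motifs]
    # Index of the first ChIP-seq peak overlapping each motif match, or None.
    peak_of = [
        next((i for i, p in enumerate(chip_peaks) if overlaps(m, p)), None)
        for m in motifs
    ]
    # Peaks containing at least one footprinted motif match.
    resolved_peaks = {p for p, fp in zip(peak_of, has_fp) if p is not None and fp}

    labels = []
    for fp, peak in zip(has_fp, peak_of):
        if fp and peak is not None:
            labels.append("1A")     # footprint + ChIP peak: likely direct binding
        elif not fp and peak is None:
            labels.append("1B")     # neither footprint nor ChIP peak
        elif not fp and peak in resolved_peaks:
            labels.append("1C")     # unbound match; a neighbor in the same peak has the footprint
        elif not fp:
            labels.append("1D")     # ChIP peak without any footprint
        else:
            labels.append("other")  # footprint without ChIP peak (not among the four scenarios)
    return labels


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    motifs = [(100, 120), (500, 520), (1000, 1020), (1050, 1070), (2000, 2020)]
    footprints = [(98, 122), (1000, 1020)]
    chip_peaks = [(50, 250), (900, 1200), (1900, 2100)]
    for m, label in zip(motifs, classify_motif_matches(motifs, footprints, chip_peaks)):
        print(m, label)
```

Note that in the Figure 1C situation the footprinted motif match itself is labeled `1A`; the `1C` label marks the neighboring match in the same peak that the footprint rules out.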