Explicit DNase sequence bias modeling enables high-resolution transcription factor footprint detection.
Bottom Line: DNase-seq footprints were absent under a fraction of ChIP-seq peaks, which we show to be indicative of weaker binding, indirect TF-DNA interactions or possible ChIP artifacts. The modeling approach was also able to detect variation in the consensus motifs that TFs bind to. Finally, cell type specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting that footprints can identify changes in TF binding that are not detectable using other strategies.
Affiliation: Computational Biology and Bioinformatics Program, Duke University, Durham, NC 27708, USA; Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA.
Mentions: The observations described thus far motivated us to develop TF-specific footprint models that take DNase cleavage bias into account, in order to assess the predictive accuracy of DNase-seq footprinting alone. Specifically, this method does not include additional chromatin accessibility or genomic features that may indicate the general presence of a regulatory region (Figure 4). Following the footprint modeling strategy used in earlier approaches such as CENTIPEDE (16), our models sought to reflect the relative propensity of DNase-I cleavage at each position around sequence motif matches using factor-specific multinomial distributions. In addition, to accurately quantify the extent of DNase sequence bias, we incorporated a separate nonuniform background model that accounts for variability in signal profiles in the absence of functional footprints at candidate binding sites. This intrinsic sequence bias background model is estimated from the DNase-seq profile that would result from the selected DNA sequences alone, i.e. the relative cleavage propensity of each 6-mer surrounding the sequence motif matches (Figure 3A).
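The comparison between a factor-specific multinomial footprint profile and a 6-mer sequence-bias background can be illustrated with a minimal Python sketch. The excerpt does not state how the two models are combined, so scoring a site with a log-likelihood ratio is an assumption made here for illustration, as is assigning each position the bias of the 6-mer that starts there. The function names (`background_profile`, `multinomial_loglik`, `footprint_log_ratio`), the `kmer_bias` table, and all numeric values are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def background_profile(seq: str, kmer_bias: Dict[str, float],
                       k: int = 6, default: float = 1.0) -> List[float]:
    """Cleavage probabilities expected from sequence alone.

    Each position is weighted by the relative cleavage propensity of the
    k-mer starting there (assumed convention), then normalized to sum to 1.
    """
    weights = [kmer_bias.get(seq[i:i + k], default)
               for i in range(len(seq) - k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def multinomial_loglik(counts: Sequence[int], probs: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Multinomial log-likelihood without the combinatorial term.

    The omitted term depends only on the counts, so it cancels when two
    models are compared on the same site.
    """
    return sum(c * math.log(p + eps) for c, p in zip(counts, probs))


def footprint_log_ratio(counts: Sequence[int],
                        footprint_probs: Sequence[float],
                        bg_probs: Sequence[float]) -> float:
    """Positive values favor the footprint profile over sequence bias."""
    return (multinomial_loglik(counts, footprint_probs)
            - multinomial_loglik(counts, bg_probs))


# Toy example: an 11 bp window gives six 6-mer positions.
seq = "ACGTACGTACG"
kmer_bias = {"ACGTAC": 2.0}                      # hypothetical bias table
bg = background_profile(seq, kmer_bias)
footprint = [0.4, 0.05, 0.05, 0.05, 0.05, 0.4]   # hypothetical TF profile
cuts_bound = [8, 1, 1, 1, 1, 8]                  # cuts flank a protected core
cuts_unbound = [4, 2, 2, 2, 4, 2]                # cuts follow sequence bias
score_bound = footprint_log_ratio(cuts_bound, footprint, bg)
score_unbound = footprint_log_ratio(cuts_unbound, footprint, bg)
```

In this toy setup the site whose cut counts track the footprint profile receives a positive score, while the site whose counts track the sequence-bias background receives a negative one. In practice the footprint profile and the 6-mer bias table would be estimated from data rather than set by hand.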