On Accounting for Sequence-Specific Bias in Genome-Wide Chromatin Accessibility Experiments: Recent Advances and Contradictions.

Madrigal P - Front Bioeng Biotechnol (2015)

Affiliation: Wellcome Trust Sanger Institute , Cambridge , UK ; Department of Surgery, University of Cambridge , Cambridge , UK.

AUTOMATICALLY GENERATED EXCERPT

Alternative protocols assaying the chromatin landscape, such as those based on digestion by DNase I (DNase-seq), micrococcal nuclease (MNase-seq), and Tn5 transposase insertion (ATAC-seq), enable the identification of the DNA-binding footprints of many TFs in a single experiment (Tsompana and Buck, )... Despite the initial promise of detecting the majority of TFs in one assay, DNA sequence-specific biases, together with TF-dependent binding kinetics, have recently been pinpointed as major confounding factors in DNase-seq experiments (Koohy et al., ; He et al., ; Raj and McVicker, ; Rusk, ; Sung et al., )... These influencing factors were not considered by any of the previous computational approaches for the analysis of next-generation sequencing chromatin accessibility data (Madrigal and Krajewski, ), neither by strategies based on a TF-generic DNase signature nor by those based on a TF-specific DNase signature (Luo and Hartemink, )... Remarkably, the authors found that sequence bias is specific to the DNase-seq protocol... They also found that the signature of a footprint could be formed by a mixture of DNase digestion profiles identified by unsupervised k-means clustering, in agreement with observations from an earlier study (Tewari et al., )... Consequently, protocol- and enzyme-specific naked (deproteinized) DNA control datasets, together with high sequencing depth (Hesselberth et al., ), are now recommended for DNase-seq experiments aiming to detect footprints (Meyer and Liu, )... A third approach, an improved version of HINT [HMM-based identification of TF footprints (Gusmao et al., )] named HINT-BC/HINT-BCN (Bias Correction based on hypersensitivity sites/Bias Correction based on Naked DNase-seq), includes k-mer-based bias correction of DNase-seq data as in He et al., leading to substantial changes in the average DNase I cleavage patterns surrounding the TFs... 
These changes improve the accuracy of the footprinting method (personal communication with the author)... This finding clearly contradicts those of He et al. and Sung et al.... In msCentipede, the footprint signature (or cleavage profile) within a factor-bound motif instance was, therefore, found to be informative, increasing the sensitivity and specificity of TF binding site prediction... So far, therefore, a TF footprint might be either detectable (in which case it may be better detected either with or without correction for influencing factors) or undetectable... In many studies, both problems are conflated and addressed using the same “gold standard” datasets, such as ChIP-seq, which lack nucleotide-level resolution... These issues also complicate data integration with TF ChIP-seq, as ChIP-seq peaks without a footprint in DNase-seq/ATAC-seq, considered weak/indirect binding or false positives (ChIP artifacts), might instead be explained by a class of TFs with rapid binding kinetics.
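
The k-mer-based correction mentioned above (He et al.; HINT-BC/HINT-BCN) can be illustrated with a simplified sketch. This is not the published implementation of either method; it assumes a single strand, an odd k with the k-mer centred on the cut position, a bias score defined as a k-mer's share of cuts divided by its share of genomic positions (estimated, for example, from naked DNA), and a correction that subtracts bias-expected cuts from observed cuts within a window. The function names and toy data are hypothetical.

```python
from collections import Counter


def kmer_at(seq, pos, k):
    """Return the k-mer centred on position `pos` (k odd), or None near the ends."""
    start = pos - k // 2
    if start < 0 or start + k > len(seq):
        return None
    return seq[start:start + k]


def estimate_bias(seq, cuts, k):
    """Cleavage propensity per k-mer: share of cuts divided by share of positions.

    `cuts` maps 0-based positions to cut counts (e.g. from a naked DNA control).
    A value > 1 means the k-mer is cut more often than its frequency predicts.
    """
    cut_counts = Counter()
    for pos, n in cuts.items():
        km = kmer_at(seq, pos, k)
        if km is not None:
            cut_counts[km] += n
    background = Counter(kmer_at(seq, i, k) for i in range(len(seq)))
    background.pop(None, None)  # drop positions too close to the sequence ends
    total_cuts = sum(cut_counts.values())
    total_bg = sum(background.values())
    if total_cuts == 0:
        return {km: 1.0 for km in background}
    return {
        km: (cut_counts[km] / total_cuts) / (background[km] / total_bg)
        for km in background
    }


def correct_window(seq, cuts, bias, start, end, k):
    """Observed minus bias-expected cuts at each position in [start, end).

    The window's total cut count is redistributed in proportion to the bias
    of the k-mer centred on each position; positions without a k-mer get 0.
    """
    positions = range(start, end)
    observed = [cuts.get(i, 0) for i in positions]
    weights = [bias.get(kmer_at(seq, i, k), 0.0) for i in positions]
    total, wsum = sum(observed), sum(weights)
    if wsum == 0:
        return [float(o) for o in observed]
    expected = [total * w / wsum for w in weights]
    return [o - e for o, e in zip(observed, expected)]


if __name__ == "__main__":
    seq = "ACGTTAGCACGTTAGCACGT"
    cuts = {2: 5, 6: 1, 10: 4, 14: 1, 18: 3}  # toy, made-up cut counts
    bias = estimate_bias(seq, cuts, k=3)
    print(correct_window(seq, cuts, bias, 4, 16, k=3))
```

Subtracting the expected profile keeps the window's total signal unchanged in aggregate (corrected values sum to zero), so only the shape attributable to k-mer preference is removed; a real method would also handle both strands and smoothing.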

Figure 1: Tn5 transposase shows sequence cleavage bias. Data represent read-start sites of reads aligned to the forward and reverse strands of chromosome 22 in four ATAC-seq replicates (50,000 cells per replicate) reported in Buenrostro et al. (2013). The 50 bp paired-end (PE) reads were pre-processed with Trimmomatic v0.32 (Bolger et al., 2014) under default parameters and then aligned to hg19 using BWA v0.7.4-r385 (Li and Durbin, 2010). Sequence logos were generated using WebLogo (Crooks et al., 2004). Y-axis: 0.0–0.3 bits.

Mentions: It is unknown whether ATAC-seq-derived footprints are factor-dependent or affected by Tn5 cleavage preferences (Tsompana and Buck, 2014). As expected, bioinformatic analysis of chromosome 22 in the published human datasets for 50,000 cells (Buenrostro et al., 2013) reveals sequence biases in ATAC-seq experiments (Figure 1), similar to those found by Koohy et al. (2013) in DNase-seq. Because ATAC-seq might replace DNase-seq in the foreseeable future owing to its cost and time efficiency, and because it simultaneously allows the identification of nucleosome positions (Buenrostro et al., 2013), new computational models are needed to evaluate intrinsic confounding factors in ATAC-seq.
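
Figure 1 summarizes base composition around Tn5 read-start sites as sequence logos measured in bits. As a rough sketch of that kind of analysis (not the pipeline used for the figure, which relied on Trimmomatic, BWA and WebLogo), the code below assumes read 5' ends have already been extracted from the alignments, takes a strand-aware window around each 5' end, and computes per-position information content as 2 bits minus the Shannon entropy of the base distribution. The function names and toy data are hypothetical, and the calculation omits any small-sample correction a logo tool may apply.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def window_at_read_start(genome, five_prime, strand, flank):
    """Sequence of 2*flank+1 bases centred on a read's 5' end, in read orientation.

    `five_prime` is the 0-based coordinate of the read's first sequenced base:
    the leftmost aligned base for '+' reads, the rightmost for '-' reads.
    """
    start, end = five_prime - flank, five_prime + flank + 1
    if start < 0 or end > len(genome):
        return None
    window = genome[start:end]
    return window if strand == "+" else revcomp(window)


def information_content(seqs):
    """Per-position information content in bits (maximum 2 for DNA)."""
    ic = []
    for column in zip(*seqs):
        counts = Counter(b for b in column if b in "ACGT")
        n = sum(counts.values())
        if n == 0:
            ic.append(0.0)
            continue
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


if __name__ == "__main__":
    genome = "TTAGCCGATAGCTTACGGATCCA"  # toy sequence
    reads = [(5, "+"), (11, "+"), (16, "-"), (9, "-")]  # hypothetical 5' ends
    windows = [window_at_read_start(genome, p, s, 3) for p, s in reads]
    windows = [w for w in windows if w is not None]
    print([round(x, 2) for x in information_content(windows)])
```

Reverse-strand windows are reverse-complemented so that every window is read 5' to 3' from the cut, which lets forward- and reverse-strand profiles be compared on the same axis as in Figure 1.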