Design and analysis of ChIP-seq experiments for DNA-binding proteins.

Kharchenko PV, Tolstorukov MY, Park PJ - Nat. Biotechnol. (2008)

Bottom Line: To fill this gap, we propose an analysis pipeline specifically designed to detect protein-binding positions with high accuracy. Using previously reported data sets for three transcription factors, we illustrate methods for improving tag alignment and correcting for background signals. We also analyze the relationship between the depth of sequencing and characteristics of the detected binding positions, and provide a method for estimating the sequencing depth necessary for a desired coverage of protein binding sites.


Affiliation: Center for Biomedical Informatics, Harvard Medical School, 10 Shattuck St., Boston, Massachusetts 02115, USA.

ABSTRACT
Recent progress in massively parallel sequencing platforms has enabled genome-wide characterization of DNA-associated proteins using the combination of chromatin immunoprecipitation and sequencing (ChIP-seq). Although a variety of methods exist for analysis of the established alternative ChIP microarray (ChIP-chip), few approaches have been described for processing ChIP-seq data. To fill this gap, we propose an analysis pipeline specifically designed to detect protein-binding positions with high accuracy. Using previously reported data sets for three transcription factors, we illustrate methods for improving tag alignment and correcting for background signals. We compare the sensitivity and spatial precision of three peak detection algorithms with published methods, demonstrating gains in spatial precision when an asymmetric distribution of tags on positive and negative strands is considered. We also analyze the relationship between the depth of sequencing and characteristics of the detected binding positions, and provide a method for estimating the sequencing depth necessary for a desired coverage of protein binding sites.


Figure 3: Examples of anomalies in background tag distributions. a. Singular positions with extremely high tag count. b. Larger, non-uniform regions of increased background tag density. c. Background tag density patterns resembling true protein binding positions. Each plot shows density of tags from ChIP and input samples. The tag histograms give combined tag counts.

Mentions: Examining the input tag density, we find three major types of background anomalies. The first type results in singular peaks of tag density at a single chromosome position, many orders of magnitude higher than the surrounding density (Figure 3a). Such peaks commonly occur at the same position on both chromosome strands. The second type of anomaly results in non-uniform, wide (>1000 bp) clusters of increased tag density appearing on one or both strands (Figure 3b). The third type exhibits small clusters of strand-specific tag density resembling the pattern expected from a stable protein binding position, although it typically shows smaller separation between strand peaks (Figure 3c). A similar set of anomalies can be observed in the input sequencing of other organisms (data not shown).
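The first anomaly type (a single position with a very high tag count on both strands) lends itself to a simple screen of the input sample. The following is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names, the 1000 bp window, the 100-fold threshold, and the minimum count of 10 are all hypothetical choices made for illustration, and tags are assumed to be given as integer 5' positions on one chromosome, one list per strand.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def surrounding_density(sorted_tags, pos, window):
    """Mean tags per bp within +/- window of pos, excluding tags at pos itself."""
    lo = bisect_left(sorted_tags, pos - window)
    hi = bisect_right(sorted_tags, pos + window)
    at_pos = bisect_right(sorted_tags, pos) - bisect_left(sorted_tags, pos)
    return (hi - lo - at_pos) / (2 * window)


def flag_singular_positions(plus_tags, minus_tags,
                            window=1000, fold=100.0, min_count=10):
    """Return positions with an isolated tag pile-up on BOTH strands.

    A position is flagged when, on each strand, its tag count is at least
    min_count and exceeds fold times the surrounding per-bp density.
    All thresholds are illustrative assumptions, not published values.
    """
    strands = [sorted(plus_tags), sorted(minus_tags)]
    counts = [Counter(tags) for tags in strands]
    # Floor on the density so that an empty neighbourhood does not give zero.
    density_floor = 1.0 / (2 * window)

    flagged = []
    # Only positions carrying tags on both strands can qualify.
    for pos in sorted(set(counts[0]) & set(counts[1])):
        is_singular = True
        for tags, count in zip(strands, counts):
            n = count[pos]
            density = max(surrounding_density(tags, pos, window), density_floor)
            if n < min_count or n <= fold * density:
                is_singular = False
                break
        if is_singular:
            flagged.append(pos)
    return flagged


if __name__ == "__main__":
    # Sparse background plus one pile-up at position 500 on both strands.
    plus = [500] * 50 + list(range(0, 1000, 100))
    minus = [500] * 40 + list(range(50, 1000, 100))
    print(flag_singular_positions(plus, minus))
```

Requiring the pile-up on both strands follows the observation above that such peaks commonly occur at the same position on both strands. The wider clusters of the second type and the strand-shifted pattern of the third type would need a different test and are not covered by this sketch.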

