Inherent signals in sequencing-based Chromatin-ImmunoPrecipitation control libraries.

Vega VB, Cheung E, Palanisamy N, Sung WK - PLoS ONE (2009)

Bottom Line: The growth of sequencing-based Chromatin Immuno-Precipitation studies calls for a more in-depth understanding of the nature of the technology and of the resultant data to reduce false positives and false negatives. We found that copy number has a major influence on both ChIP-enriched and control libraries. Next, significantly tag-rich 5 kbp regions were identified and associated with various genomic landmarks.


Affiliation: Computational and Mathematical Biology Group, Genome Institute of Singapore, Singapore, Singapore.

ABSTRACT

Background: The growth of sequencing-based Chromatin Immuno-Precipitation studies calls for a more in-depth understanding of the nature of the technology and of the resultant data to reduce false positives and false negatives. Control libraries are typically constructed to complement such studies in order to mitigate the effect of systematic biases that might be present in the data. In this study, we explored multiple control libraries to obtain a better understanding of what they truly represent.

Methodology: First, we analyzed the genome-wide profiles of various sequencing-based libraries at a low resolution of 1 Mbp, and compared them with each other as well as against aCGH data. We found that copy number has a major influence on both ChIP-enriched and control libraries. Following that, we inspected the repeat regions to assess the extent of mapping bias. Next, significantly tag-rich 5 kbp regions were identified and associated with various genomic landmarks. For instance, we discovered that gene boundaries were surprisingly enriched with sequenced tags. Further, profiles from different cell types were noticeably distinct even though the cell types were related.

Conclusions: We found that control libraries bear traces of systematic biases. The biases can be attributed to genomic copy number, inherent sequencing bias, plausible mapping ambiguity, and cell-type-specific chromatin structure. Our results suggest that careful analysis of control libraries can reveal promising biological insights.


Figure 1. Whole cell extract sequencing (WCEseq) libraries are biased by genomic copy number. The genome-wide copy number of MCF-7 (obtained from array CGH) at 1 Mbp resolution is contrasted with estimates made from (a) a WCEseq library and (b) an ER ChIP-enriched library, sorted in chromosomal order. The high correlation (Pearson's r = 0.875) between the WCEseq estimate and the actual aCGH readout indicates that the coarse-scale profile of the WCEseq library is dominantly shaped by copy number variations. Copy number variations also strongly affect the ChIP-enriched library (Pearson's r = 0.673).

Mentions: To investigate how much genomic copy number influences the control library, in-house array CGH data (unpublished data, N.P.) from MCF-7 cells were used as the benchmark for copy number variations in MCF-7. A whole cell extract library was also generated from MCF-7, followed by direct ultra-high-throughput sequencing on the Solexa Genome Analyzer platform. Using Equation (1) and a 1 Mbp sliding window (see Materials and Methods), we estimated the genome-wide copy number of MCF-7 from the whole cell extract (WCEseq) library. As a comparison, we also took a ChIP-enriched library (ER ChIP-PET [7]) and similarly estimated the genome-wide copy number using a signal-filtering approach and Equation (2). The copy number estimated from the WCEseq library matched the array CGH readout very well (Pearson's r = 0.875, Fig. 1a). Interestingly, the estimate from the ER ChIP-enriched library also agreed reasonably well with the aCGH data (Pearson's r = 0.673, Fig. 1b).
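
The comparison described above can be illustrated with a minimal sketch. It is not the authors' Equation (1) or Equation (2), which are not reproduced here. As a stand-in, it assumes non-overlapping 1 Mbp bins rather than a sliding window, and it scales each bin's tag count by the mean count over all observed bins. The function names, the `(chromosome, position)` tag representation, and the dictionary layout of the aCGH values are all hypothetical.

```python
from collections import Counter
from math import sqrt

BIN_SIZE = 1_000_000  # 1 Mbp resolution, as in the text


def bin_tag_counts(tags, bin_size=BIN_SIZE):
    """Count mapped tags per (chromosome, bin index).

    `tags` is an iterable of (chromosome, position) pairs.
    Assumption: fixed, non-overlapping bins instead of a sliding window.
    """
    counts = Counter()
    for chrom, pos in tags:
        counts[(chrom, pos // bin_size)] += 1
    return counts


def relative_copy_number(counts):
    """Scale each bin count by the mean count over observed bins.

    Assumption: a simple stand-in normalization, not the paper's Equation (1).
    """
    mean = sum(counts.values()) / len(counts)
    return {key: count / mean for key, count in counts.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare_to_acgh(tags, acgh):
    """Correlate the sequencing-based estimate with aCGH values.

    `acgh` maps (chromosome, bin index) to a copy number readout.
    Only bins present in both data sets are compared.
    """
    estimate = relative_copy_number(bin_tag_counts(tags))
    shared = sorted(set(estimate) & set(acgh))
    return pearson_r([estimate[k] for k in shared], [acgh[k] for k in shared])
```

Calling `compare_to_acgh(tags, acgh)` on a WCEseq tag list and on a ChIP-enriched tag list would give two coefficients that play the role of the values reported for Fig. 1a and Fig. 1b, although this sketch omits the signal-filtering step that the text applies to the ChIP-enriched library.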

