Multiplexing of ChIP-Seq Samples in an Optimized Experimental Condition Has Minimal Impact on Peak Detection.

Kacmarczyk TJ, Bourque C, Zhang X, Jiang Y, Houvras Y, Alonso A, Betel D - PLoS ONE (2015)

Bottom Line: In most cases the number of multiplexed samples is determined by financial considerations or experimental convenience, with limited understanding of the effects on the experimental results. We found that, for the histone mark H3K4me3, one can multiplex up to 8 samples (7 IP + 1 input) at ~21 million single-end reads each and still detect over 90% of all peaks found when using a full lane for a single sample (~181 million reads). Furthermore, no variation is introduced by indexing or lane batch effects and, importantly, there is no significant reduction in the number of genes with neighboring H3K4me3 peaks.


Affiliation: Department of Medicine, Division of Hematology/Oncology, Epigenomics Core Facility, Weill Cornell Medical College, New York, New York, United States of America.

ABSTRACT
Multiplexing samples in sequencing experiments is a common approach to maximize information yield while minimizing cost. In most cases the number of multiplexed samples is determined by financial considerations or experimental convenience, with limited understanding of the effects on the experimental results. Here we set out to examine the impact of multiplexing ChIP-seq experiments on the ability to identify a specific epigenetic modification. We performed peak detection analyses to determine the effects of multiplexing, including false discovery rates; the size, position, and statistical significance of detected peaks; and changes in gene annotation. We found that, for the histone mark H3K4me3, one can multiplex up to 8 samples (7 IP + 1 input) at ~21 million single-end reads each and still detect over 90% of all peaks found when using a full lane for a single sample (~181 million reads). Furthermore, no variation is introduced by indexing or lane batch effects and, importantly, there is no significant reduction in the number of genes with neighboring H3K4me3 peaks. We conclude that, for a well-characterized antibody and, therefore, a model IP condition, multiplexing 8 samples per lane is sufficient to capture most of the biological signal.
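As a rough sanity check on the depths quoted here, the sketch below splits one lane's reads evenly across the multiplexed samples. The even split and the function name `reads_per_sample` are assumptions made for illustration; real lanes lose some reads to demultiplexing and filtering, which may be why the reported per-sample depths (~43M, ~31M, ~21M) differ slightly from the idealized values.

```python
# Idealized per-sample depth if one lane's reads were split evenly.
# Assumption: the ~181M-read full lane is representative of every lane.
LANE_READS_M = 181.0  # millions of reads in the 1-plex (full-lane) sample


def reads_per_sample(lane_reads_m: float, plex: int) -> float:
    """Return millions of reads per sample for an even `plex`-way split."""
    if plex < 1:
        raise ValueError("plex must be >= 1")
    return lane_reads_m / plex


if __name__ == "__main__":
    for plex in (1, 4, 6, 8):
        depth = reads_per_sample(LANE_READS_M, plex)
        print(f"{plex}-plex: ~{depth:.1f}M reads per sample")
```

Under this idealization the 8-plex depth stays above the 20M mapped-read level that the excerpt cites from the ENCODE and modENCODE guidelines, although the real margin depends on how many reads actually map.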


Fig 3 (pone.0129350.g003). Peak detection and false discovery rate. As expected, the number of peaks and their width decrease as coverage is reduced. A) The mean number of peaks identified for each sample by multiplex level. B) The mean peak width of identified peaks for each sample by multiplex level. C) The false discovery rate (FDR) for each sample, computed by contrasting the input to the IP samples: ~181 million (M) reads (blue), ~43M reads (green), ~43M reads ChIP with 2x input (red), ~31M reads (orange), ~21M reads (salmon). The FDR for some experiments related to Input-4 is higher, close to or above 0.45%, suggesting that these rates are an artifact of the library. Overall, for experiments either 1-plex or multiplexed, with equal fractions of IP and input or with double input, the FDR is ≤ 0.43%.
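The caption says only that the FDR was computed "by contrasting the input to the IP samples" and does not give the formula. One common way to do this, shown here purely as an assumption and not as the authors' confirmed method, is a swap-based empirical estimate: call peaks with the input treated as the signal and the IP as background, then divide that count by the number of peaks called in the normal direction. The function name and the counts below are hypothetical.

```python
def empirical_fdr(n_swapped_peaks: int, n_ip_peaks: int) -> float:
    """Swap-based empirical FDR estimate (an assumed technique).

    n_swapped_peaks: peaks called with input as signal and IP as background.
    n_ip_peaks:      peaks called with IP as signal and input as background.
    """
    if n_ip_peaks <= 0:
        raise ValueError("n_ip_peaks must be positive")
    if n_swapped_peaks < 0:
        raise ValueError("n_swapped_peaks must be non-negative")
    return n_swapped_peaks / n_ip_peaks


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    fdr = empirical_fdr(n_swapped_peaks=50, n_ip_peaks=20_000)
    print(f"Estimated FDR: {fdr:.2%}")
```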

Mentions: Sequencing depth for peak detection was within the range for sufficient ChIP signal strength (>20M mapped reads) defined in the guidelines of the ENCODE and modENCODE consortia [3]. Peaks were identified using the default parameters of ChIPseeqer [4]. Using the peaks detected from the full IP (the reference set; 1-plex; ~181M reads) and input lanes, we compared the peaks detected from the multiplexed ChIPs (4-plex, ~43M reads; 6-plex, ~31M reads; and 8-plex, ~21M reads). To determine the effect of reduced sequence coverage on peak discovery, we counted the total number of peaks identified at each multiplexing level by the ChIPseeqer peak detection algorithm. For our initial approach, we averaged the number of peaks across the samples at each multiplexing level. As expected, the reduced coverage that results from increasing the multiplexing factor yields fewer detected peaks. Compared to 1-plex, there were 1.9% fewer peaks at ~43M reads and 7.3% fewer peaks at ~21M reads (Fig 3A). The average peak width was also reduced: 23.3% shorter at ~43M reads and 38.6% shorter at ~21M reads (Fig 3B). Since peak-calling algorithms can use very different models, we repeated this analysis using MACS2 (an updated version of MACS; https://github.com/taoliu/MACS) [5]. When run with parameter values estimated to be analogous to those of ChIPseeqer, MACS2 tended to call a greater number of peaks with shorter widths, with the same trend of reduction in number and width as multiplexing increases (S1 and S2 Figs).
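The comparison against the 1-plex reference set can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the peak representation (chromosome, start, end), the "any overlap counts as recovered" criterion, the function names, and the toy coordinates are all assumptions, since the excerpt does not state how peaks were matched.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical peak format: (chromosome, start, end), half-open coordinates.
Peak = Tuple[str, int, int]


def fraction_recovered(reference: Sequence[Peak], multiplexed: Sequence[Peak]) -> float:
    """Fraction of reference peaks overlapped by at least one multiplexed peak."""
    if not reference:
        return 0.0
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in multiplexed:
        by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in reference:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < e and s < end for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(reference)


def mean_width(peaks: Sequence[Peak]) -> float:
    """Mean peak width in base pairs."""
    return sum(end - start for _, start, end in peaks) / len(peaks)


def percent_change(reference_value: float, value: float) -> float:
    """Signed percent change of `value` relative to `reference_value`."""
    return 100.0 * (value - reference_value) / reference_value


if __name__ == "__main__":
    # Toy coordinates invented for illustration only.
    ref = [("chr1", 100, 600), ("chr1", 1000, 1500), ("chr2", 50, 400)]
    mux = [("chr1", 150, 450), ("chr2", 100, 300)]
    print(f"recovered: {fraction_recovered(ref, mux):.1%}")
    print(f"peak count change: {percent_change(len(ref), len(mux)):.1f}%")
    print(f"mean width change: {percent_change(mean_width(ref), mean_width(mux)):.1f}%")
```

The linear scan per reference peak is fine for a toy example; for genome-scale peak sets a sorted or interval-tree lookup, or a dedicated interval tool, would be the more practical choice.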

