Integrated genome and transcriptome sequencing of the same cell.

Dey SS, Kester L, Spanjaard B, Bienko M, van Oudenaarden A - Nat. Biotechnol. (2015)

Bottom Line: We describe a quasilinear amplification strategy to quantify genomic DNA and mRNA from the same cell without physically separating the nucleic acids before amplification. We show that the efficiency of our integrated approach is similar to existing methods for single-cell sequencing of either genomic DNA or mRNA. Further, we find that genes with high cell-to-cell variability in transcript numbers generally have lower genomic copy numbers, and vice versa, suggesting that copy number variations may drive variability in gene expression among individual cells.


Affiliation: [1] Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Utrecht, the Netherlands. [2] University Medical Center Utrecht, Cancer Genomics Netherlands, Utrecht, the Netherlands.

ABSTRACT
Single-cell genomics and single-cell transcriptomics have emerged as powerful tools to study the biology of single cells at a genome-wide scale. However, a major challenge is to sequence both genomic DNA and mRNA from the same cell, which would allow direct comparison of genomic variation and transcriptome heterogeneity. We describe a quasilinear amplification strategy to quantify genomic DNA and mRNA from the same cell without physically separating the nucleic acids before amplification. We show that the efficiency of our integrated approach is similar to existing methods for single-cell sequencing of either genomic DNA or mRNA. Further, we find that genes with high cell-to-cell variability in transcript numbers generally have lower genomic copy numbers, and vice versa, suggesting that copy number variations may drive variability in gene expression among individual cells. Applications of our integrated sequencing approach could range from gaining insights into cancer evolution and heterogeneity to understanding the transcriptional consequences of copy number variations in healthy and diseased tissues.



Figure 3: Applying DR-Seq to the SK-BR-3 cell line to understand how copy number variations affect gene expression in single cells. (a) Top panel shows raw gDNA data (dots) and the copy numbers (red line) identified using the CBS algorithm (ref. 26) for Chr 8 in bulk sequencing data. The middle panel shows raw data (dots) and median read counts (red line) identified using CBS for one single cell (SC13). Visual comparison of the top and middle panels shows that most breakpoints are reliably detected in single cells and that the patterns of level changes in bulk and single-cell gDNA sequencing are well correlated. The median read depths for each segment in single cells and the bulk copy numbers are used to estimate copy number variations in single cells (Supplementary Note). For each median level identified from the single-cell gDNA data (middle panel), the mean expression of genes within that level was calculated (black lines in lower panel). The lower panel shows that the mean expression of genes within each segment correlates well with the median gDNA levels. (b) Genome-wide quantification of the mean expression of genes within different copy number regions shows a monotonic increase in average expression with increasing copy number for 3 single cells (also see Supplementary Fig. 25). (c) For a large range of mean expression levels (5-400 RPM), genes exhibiting the highest and lowest noise (quantified as the coefficient of variation, or CV) were identified. The x-axis shows the percentage of the most noisy and least noisy genes that were considered in the analysis. The data show that the noisiest genes are associated with low copy number regions and vice versa (also see Supplementary Fig. 27). Error bars represent the standard error of the mean, estimated by bootstrapping the data.
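The noise measure and error bars described in panel (c) can be illustrated with a minimal sketch. It assumes the CV is the standard deviation divided by the mean of a gene's expression across cells, and that the bootstrap resamples values with replacement. The function names, the use of the population standard deviation, the number of resamples, and the example values are all illustrative assumptions, not details taken from the paper.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Return CV = standard deviation / mean, or None if the mean is zero.

    ASSUMPTION: the population standard deviation is used here; the source
    does not say which estimator was applied.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        return None
    return statistics.pstdev(values) / mean


def bootstrap_sem(values, n_resamples=1000, seed=0):
    """Estimate the standard error of the mean by bootstrapping.

    Each resample draws len(values) items with replacement; the spread of
    the resampled means is the bootstrap standard error.
    ASSUMPTION: n_resamples and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    n = len(values)
    means = [
        statistics.fmean(rng.choices(values, k=n))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(means)


# Hypothetical expression of one gene across eight cells (RPM).
expression = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(expression)

# Hypothetical copy numbers of the genes in one noise group.
copy_numbers = [2, 3, 2, 4, 2, 3, 5, 2]
sem = bootstrap_sem(copy_numbers)
```

Ranking genes by `cv` within a window of similar mean expression, then comparing the mean copy number of the top and bottom percentiles, would reproduce the general shape of the analysis in panel (c) under these assumptions.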

Mentions: After correcting for GC bias, the circular binary segmentation (CBS) algorithm was used to detect breakpoints (ref. 26). Figure 3a shows that the raw data and breakpoints detected for Chr 8 in one cell correlated well with the copy number changes detected in bulk sequencing. Similarly, breakpoint detection over the entire genome for all the single cells correlated well with the bulk sequencing results (Supplementary Fig. 21). The median read counts for each of the segments were used to estimate copy numbers in single cells (Supplementary Figs. 18a and 21, Supplementary Table 2 and Supplementary Note). We also developed a model to estimate confidence intervals for the copy numbers called by our algorithm (Supplementary Fig. 22 and Supplementary Note). Further, the mean copy numbers over all the single cells correlated well with the bulk sequencing copy numbers over the entire genome (Supplementary Figs. 17 and 18b,c). Finally, we also detected significant cell-to-cell variability in copy numbers over certain regions of the genome (Supplementary Figs. 17 and 23). We performed DNA fluorescence in situ hybridization (FISH) over 4 genomic loci that span a large spectrum of copy numbers and found that the mean copy numbers detected by DR-Seq and DNA FISH were in good agreement (Supplementary Fig. 24; ref. 27). Notably, we also found that the distributions of copy numbers for these 4 loci in single cells amplified by DR-Seq were not statistically different from the distributions obtained by DNA FISH (p > 0.01; Supplementary Table 4). These results showed that DR-Seq has the sensitivity to capture heterogeneity in copy numbers across single cells.
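The step from segment medians to copy numbers can be sketched as follows. The actual procedure is described in the paper's Supplementary Note, which is not reproduced here, so this is only one plausible reading: breakpoints are taken as given (CBS itself is not implemented), and a single scale factor, fitted by least squares through the origin against the bulk copy numbers, converts single-cell median read counts to integer copy numbers. The function names, the scaling rule, and the example data are hypothetical.

```python
import statistics


def segment_medians(bin_counts, breakpoints):
    """Return the median read count of each segment.

    `breakpoints` are bin indices at which a new segment starts; they are
    assumed to come from a separate segmentation step such as CBS.
    """
    edges = [0] + sorted(breakpoints) + [len(bin_counts)]
    return [
        statistics.median(bin_counts[start:end])
        for start, end in zip(edges, edges[1:])
    ]


def estimate_copy_numbers(medians, bulk_copy_numbers):
    """Scale segment medians to integer copy numbers.

    ASSUMPTION: one global scale factor, fitted by least squares through
    the origin against the bulk copy numbers, is enough. The paper's own
    model (Supplementary Note) may differ.
    """
    denominator = sum(m * m for m in medians)
    if denominator == 0:
        raise ValueError("all segment medians are zero")
    numerator = sum(m * c for m, c in zip(medians, bulk_copy_numbers))
    scale = numerator / denominator
    return [round(m * scale) for m in medians]


# Hypothetical GC-corrected read counts per genomic bin for one cell,
# with segments starting at bins 3 and 7.
bin_counts = [10, 12, 11, 20, 22, 21, 19, 31, 29, 30]
medians = segment_medians(bin_counts, breakpoints=[3, 7])

# Hypothetical bulk copy numbers for the same three segments.
single_cell_copy_numbers = estimate_copy_numbers(medians, [2, 4, 6])
```

Using the median rather than the mean within each segment keeps the estimate robust to the occasional bin with an outlying read count, which matters at single-cell coverage.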

