In depth comparison of an individual's DNA and its lymphoblastoid cell line using whole genome sequencing.

Nickles D, Madireddy L, Yang S, Khankhanian P, Lincoln S, Hauser SL, Oksenberg JR, Baranzini SE - BMC Genomics (2012)

Bottom Line: Specifically, we sequenced the full genome (40X) of an individual using DNA purified from fresh whole blood as well as DNA from his LCL. We determined with high confidence that 99.2% of the genomes were identical, with no reproducible changes in structural variation (chromosomal rearrangements and copy number variations) or insertion/deletion polymorphisms (indels). Our results suggest that, at this level of resolution, the LCL is genetically indistinguishable from its genomic counterpart and therefore its use in clinical research is not likely to introduce a significant bias.


Affiliation: Department of Neurology, University of California San Francisco, San Francisco, CA 94143-0435, USA.

ABSTRACT

Background: A detailed analysis of whole genomes can now be achieved with next generation sequencing. Epstein-Barr virus (EBV) transformation is a widely used strategy in clinical research to obtain an unlimited source of a subject's DNA. Although the mechanism of transformation and immortalization by EBV is relatively well known at the transcriptional and proteomic level, the genetic consequences of EBV transformation are less well understood. A detailed analysis of the genetic alterations introduced by EBV transformation is highly relevant, as it will clarify the usefulness and limitations of this approach.

Results: We used whole genome sequencing to assess the genomic signature of a low-passage lymphoblastoid cell line (LCL). Specifically, we sequenced the full genome (40X) of an individual using DNA purified from fresh whole blood as well as DNA from his LCL. A total of 217.33 Gb of sequence was generated from the cell line and 238.95 Gb from the normal genomic DNA. We determined with high confidence that 99.2% of the genomes were identical, with no reproducible changes in structural variation (chromosomal rearrangements and copy number variations) or insertion/deletion polymorphisms (indels).

Conclusions: Our results suggest that, at this level of resolution, the LCL is genetically indistinguishable from its genomic counterpart and therefore its use in clinical research is not likely to introduce a significant bias.


Figure 4: Consequences of SomaticScore filtering in GT versus CT analysis. A: Number of variants identified in the GT (black bars) and CT (grey bars) analyses that pass a given SomaticScore filter. More variants meeting stringent filtering criteria are identified in the CT analysis. The ratio of the number of variants in the CT analysis to that in the GT analysis is provided inside the graph. B: Some characteristics of the variants identified in the GT and CT analyses. Even though a higher number of variants is found in the GT analysis, a lower number passes the SomaticScore filter of 0.5 than in the CT analysis. Among these are only a few SNPs (which are the most reliably called variants), all of which have records in dbSNP. In contrast, all variants identified in the CT analysis represent SNPs, and all but one have not been reported before.

Mentions: To better control the false positive rate, we used the -SomaticOutput option within calldiff, which computes a SomaticScore for every variant and thereby permits adjusting sensitivity and specificity (sensitivity = 1 - SomaticScore). The SomaticOutput analysis requires specification of one sample as “normal” and another as “tumor” and generates an output containing all loci that are non-reference in the “tumor” sample. Since the transformed cell line can be regarded as a tumor sample derived from the normal genomic sample, definitions were set accordingly (“Cell line -> Tumor (CT)” analysis). The reciprocal definitions were also analyzed as a control (“Genomic -> Tumor (GT)” analysis). Since most of the variants found only in the genomic DNA sample are expected to be the result of sequencing or calling errors, the GT analysis provides a reasonable estimate of the experimental noise. For both comparisons, the number of variant calls was assessed using different SomaticScore cut-off values. As shown in Figure 4A, increasing the SomaticScore cut-off increased the ratio of CT to GT variants, thus potentially enriching for true positive findings. Even though the total number of variant calls unique to the genomic DNA (retrieved by the GT analysis) is larger (Figure 4B), a larger number of variants was detected in the CT analysis at all tested cut-off values (Figure 4A). To minimize false positives, we chose a stringent cut-off of 0.5 [at this level, the number of differences in the CT analysis (417) is about 1.5 times the number found in the GT analysis (269)]. Assessing the regional distribution of these mutations revealed that variants unique to the cell line were randomly distributed throughout the genome; in contrast, a high proportion of variants unique to the genomic DNA appeared to lie within or near telomeric or centromeric regions (Additional file 8).
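The cut-off comparison described above can be illustrated with a small sketch. It is not the calldiff implementation: the function names, the toy score lists, and the rule that a variant passes when its score is greater than or equal to the cut-off are all assumptions made for illustration. Only the counts 417 and 269 at a cut-off of 0.5 come from the text.

```python
def count_passing(scores, cutoff):
    """Count variants whose SomaticScore meets the cut-off.

    Assumption: a variant "passes" when score >= cutoff; the source
    does not state whether the comparison is strict.
    """
    return sum(1 for s in scores if s >= cutoff)


def ct_to_gt_ratio(ct_scores, gt_scores, cutoffs):
    """For each cut-off, return (cutoff, n_CT, n_GT, n_CT / n_GT)."""
    rows = []
    for c in cutoffs:
        n_ct = count_passing(ct_scores, c)
        n_gt = count_passing(gt_scores, c)
        # Guard against division by zero when no GT variant passes.
        ratio = n_ct / n_gt if n_gt else float("inf")
        rows.append((c, n_ct, n_gt, ratio))
    return rows


# Hypothetical scores, invented purely to exercise the functions.
ct_scores = [0.1, 0.3, 0.55, 0.6, 0.8, 0.9]
gt_scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.7]

for cutoff, n_ct, n_gt, ratio in ct_to_gt_ratio(ct_scores, gt_scores, [0.0, 0.5]):
    print(f"cutoff={cutoff:.1f}  CT={n_ct}  GT={n_gt}  CT/GT={ratio:.2f}")

# Counts reported in the text at a SomaticScore cut-off of 0.5.
reported_ratio = 417 / 269
print(f"reported CT/GT at 0.5: {reported_ratio:.2f}")
```

In the toy data, as in the reported analysis, GT has more calls overall but CT retains more of them once the cut-off is raised, so the CT/GT ratio grows with stringency.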
Interestingly, 52% of variants in the CT analysis represented SNPs, compared to only 6% of those identified in the GT analysis (Figure 4B). Since SNPs are more reliably called than other classes of variants, they are less likely to constitute noise. This could explain the low fraction of (confidently called) SNPs in the GT analysis, which is expected to mainly represent technical noise. We next compared the proportion of SNPs that were novel (not present in the dbSNP database [43]) in both analyses. Strikingly, whereas all SNPs that were present only in the genomic DNA have been reported before, none of the SNPs unique to the cell line were annotated in dbSNP, with the exception of one variant (Figure 4B). Although the low number of identified variants between LCL and genomic DNA is within technical noise, their novelty suggests that, if real, most of these differences would be random mutational events driven by the accelerated proliferation of transformed cells. When assessing whether these SNPs altered coding sequences, we found that while 40% of them overlapped with genes, none had an impact on mRNA sequences. Specifically, except for one SNP, all variants are located in either introns or untranslated regions, so their consequences are not straightforward to predict (Additional file 9). None of the 15 SNPs unique to the genomic DNA fell within a gene. To estimate the error rate of this technology, we randomly selected 60 SNPs unique to the cell line and re-analyzed them by Sanger sequencing. We could not confirm any of the identified variants (data not shown), suggesting that the real number of differences between the two genomes is even smaller than that implied by the SomaticOutput analysis.
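To give a rough sense of what zero confirmations out of 60 implies, the sketch below computes a one-sided upper confidence bound on the fraction of cell-line-unique SNP calls that could still be real. This calculation is not part of the original analysis. It assumes the 60 SNPs are a simple random sample of the calls and that Sanger sequencing would detect any true variant, which may not hold for variants present in only a small fraction of cells. The function name and the 95% confidence level are illustrative choices.

```python
def upper_bound_zero_successes(n_tested, alpha=0.05):
    """Exact one-sided upper bound on a proportion when 0 of n_tested succeed.

    With zero successes, the largest proportion p that remains compatible
    with the data at level alpha satisfies (1 - p) ** n_tested = alpha.
    """
    return 1.0 - alpha ** (1.0 / n_tested)


n_tested = 60  # SNPs re-analyzed by Sanger sequencing (from the text)

bound = upper_bound_zero_successes(n_tested)
rule_of_three = 3.0 / n_tested  # common quick approximation to the same bound

print(f"95% upper bound on true-positive fraction: {bound:.3f}")
print(f"rule-of-three approximation: {rule_of_three:.3f}")
```

Under these assumptions, at most about 5% of the SNP calls unique to the cell line would be expected to be genuine, which is consistent with the statement that the real number of differences is smaller than the SomaticOutput analysis implies.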

