In depth comparison of an individual's DNA and its lymphoblastoid cell line using whole genome sequencing.

Nickles D, Madireddy L, Yang S, Khankhanian P, Lincoln S, Hauser SL, Oksenberg JR, Baranzini SE - BMC Genomics (2012)

Bottom Line: Specifically, we sequenced the full genome (40X) of an individual using DNA purified from fresh whole blood as well as DNA from his LCL. We determined with high confidence that 99.2% of the genomes were identical, with no reproducible changes in structural variation (chromosomal rearrangements and copy number variations) or insertion/deletion polymorphisms (indels). Our results suggest that, at this level of resolution, the LCL is genetically indistinguishable from its genomic counterpart and therefore its use in clinical research is not likely to introduce a significant bias.

Affiliation: Department of Neurology, University of California San Francisco, San Francisco, CA 94143-0435, USA.

ABSTRACT

Background: A detailed analysis of whole genomes can now be achieved with next-generation sequencing. Epstein-Barr virus (EBV) transformation is a widely used strategy in clinical research to obtain an unlimited source of a subject's DNA. Although the mechanism of transformation and immortalization by EBV is relatively well known at the transcriptional and proteomic level, the genetic consequences of EBV transformation are less well understood. A detailed analysis of the genetic alterations introduced by EBV transformation is highly relevant, as it will clarify the usefulness and limitations of this approach.

Results: We used whole-genome sequencing to assess the genomic signature of a low-passage lymphoblastoid cell line (LCL). Specifically, we sequenced the full genome (40X) of an individual using DNA purified from fresh whole blood as well as DNA from his LCL. A total of 217.33 Gb of sequence was generated from the cell line and 238.95 Gb from the normal genomic DNA. We determined with high confidence that 99.2% of the genomes were identical, with no reproducible changes in structural variation (chromosomal rearrangements and copy number variations) or insertion/deletion polymorphisms (indels).

Conclusions: Our results suggest that, at this level of resolution, the LCL is genetically indistinguishable from its genomic counterpart and therefore its use in clinical research is not likely to introduce a significant bias.

Figure 3: Most variant calls within genes are shared between genomic DNA and cell line. A: The Integrative Genomics Viewer (IGV) was used to assess variants in genes that were affected by non-synonymous mutations in both genomes, but where the number or the position of the variants differed. The following scenarios were encountered: (i) one of the calls in the two genomes was wrongly reported due to ambiguity in the alignments (left panel); (ii) coverage in one of the two genomes was very low at the called position, so no variant call could be made (second panel from left); (iii) a variant was called homozygous in one of the genomes and heterozygous in the other, even though both genomes were homozygous (second panel from right, “false positives”); or (iv) the two genomes genuinely differed (right panel, “true positive”). All reads were displayed in IGV. Each horizontal strip represents one read. Bases in agreement with the reference genome are displayed in grey; mismatched bases are colored. Insertions are depicted by a purple square, deletions by a thick line, and gaps by a thin line. At the top of each panel, the relative coverage of each base is indicated by the height of the grey bar. The variant position is framed by two vertical lines. The genomic DNA is shown in the upper part of the panel, the cell line underneath it, and the reference sequence at the bottom. B: For each class of non-synonymous variants (nonstop, nonsense, missense, frameshift), most mutations of this class in both genomic (blue) and cell line DNA (red) are shared between the two genomes (purple). A high percentage of the mutations called as unique to either of the two genomes is not supported by actual reads.

Mentions: We finally turned to an in-depth analysis of single nucleotide (SNP) and insertion/deletion (indel) polymorphisms. Using an automated whole-genome comparison algorithm (calldiff from cgatools) we found that 99.2% of the variant calls were identical between the two genomes (3,782,487 shared variants). Only 0.4% (15,364) and 0.3% (11,435) of variants were unique to the genomic and the cell line-derived DNA, respectively (Additional file 7, panel B). Of note, this level of discrepancy is within the range of the error rate between technical replicates using Complete Genomics (CGI) technology (SY, unpublished observations). Although the number of expected differences between the two genomes from the same individual was low a priori, we continued searching for potentially functional differences, namely non-synonymous variants such as “missense” (amino acid changing mutation), “nonsense” (creating a stop codon where there was none before), “nonstop” (removing an existing stop codon) or “frameshift” (changing the reading frame of a gene). For each class of variant we identified the genes that were affected in each sample and then determined the overlap between the two genomes. While we found that 92% of the affected genes overlapped (5,995 genes total), these genes were not always affected by exactly the same mutations in both genomes. To test the reliability of these called differences, we inspected the sequence reads of 307 selected genes (exhibiting in total 647 non-synonymous variants) using the Integrative Genomics Viewer (IGV) [42]. We observed that in most instances local coverage in one of the two genomes was very low, so no variant call could be made with high confidence at that position. In other cases, one of the calls in the two genomes was wrongly reported due to ambiguity in the alignments (i.e., the reads could have been aligned differently, so that the position of a variant differed between the genomes, even though the resulting non-reference sequence was the same). Another common discrepancy arose when a variant was determined to be homozygous in one of the genomes and heterozygous in the other, even though both genomes were homozygous (Figure 3A). Only 11% of the called differences between the two genomes were supported by visual inspection of actual reads (10 variants; Figure 3B), implying that a considerable fraction of the reported differences between genomic and cell line DNA represents false positives.
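
The alignment-ambiguity case described above can be illustrated with a small sketch. This is not how calldiff from cgatools works internally; it is a toy example under stated assumptions. The `Variant` type, the helper functions, the reference string, and the calls are all hypothetical and not taken from the study. The sketch partitions two call sets into shared and sample-specific calls, then checks whether a pair of sample-specific calls yields the same non-reference sequence, which would mark the apparent difference as an alignment artifact rather than a real one.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Simplified variant call (hypothetical type): 0-based position, ref allele, alt allele."""
    pos: int
    ref: str
    alt: str


def compare_calls(genomic: set, cell_line: set) -> dict:
    """Partition two call sets into shared and sample-specific calls."""
    return {
        "shared": genomic & cell_line,
        "genomic_only": genomic - cell_line,
        "cell_line_only": cell_line - genomic,
    }


def apply_variant(reference: str, v: Variant) -> str:
    """Return the sequence obtained by applying a single variant to the reference."""
    if reference[v.pos:v.pos + len(v.ref)] != v.ref:
        raise ValueError(f"reference allele mismatch at position {v.pos}")
    return reference[:v.pos] + v.alt + reference[v.pos + len(v.ref):]


# Toy reference containing a short CA repeat (invented for illustration).
REFERENCE = "GGTCACACATTG"

# Both samples carry the same SNP; the same 2-bp deletion inside the repeat
# is reported at two different positions in the two samples.
genomic_calls = {Variant(1, "G", "A"), Variant(3, "CAC", "C")}
cell_line_calls = {Variant(1, "G", "A"), Variant(5, "CAC", "C")}

result = compare_calls(genomic_calls, cell_line_calls)

# Pairs of sample-specific calls that produce an identical non-reference sequence.
equivalent = {
    (g, c)
    for g in result["genomic_only"]
    for c in result["cell_line_only"]
    if apply_variant(REFERENCE, g) == apply_variant(REFERENCE, c)
}

print(len(result["shared"]), "shared call(s)")
print(len(equivalent), "discordant pair(s) explained by alignment ambiguity")
```

In this toy case a purely positional comparison reports one call unique to each sample, while comparing the resulting sequences shows that the two calls describe the same deletion. Low coverage and zygosity miscalls, the other discrepancy types mentioned above, would need read-level information that this sketch does not model.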

