Read count-based method for high-throughput allelic genotyping of transposable elements and structural variants.

Kuhn A, Ong YM, Quake SR, Burkholder WF - BMC Genomics (2015)

Bottom Line: Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.


Affiliation: Microfluidics Systems Biology Lab, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Proteos Building, Room #03-04, 61 Biopolis Drive, Singapore, 138673, Singapore. alexandre.m.kuhn@gmail.com.

ABSTRACT

Background: Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. Current genome-wide approaches to genotyping insertion-site polymorphisms, based on targeted or whole-genome sequencing, remain very expensive and can lack accuracy, so new large-scale genotyping methods are needed.

Results: We describe a high-throughput method for genotyping transposable element insertions and other types of structural variants that can be assayed by breakpoint PCR. The method relies on next-generation sequencing of multiplex, site-specific PCR amplification products and on read count-based genotype calling. We show that this method is flexible, efficient (it does not require rounds of optimization), cost-effective and highly accurate.
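The idea of a read count-based genotype call can be illustrated with a minimal sketch. The following choices are assumptions made for illustration, not the authors' pipeline. Read counts are log10-transformed with a pseudocount of 1. Samples at one locus are split into a low and a high read count cluster by a plain 1-D k-means with k = 2, used here as a stand-in for the authors' clustering. A high read count is taken to mean that the amplicon is present. The function names and the hypothetical pairing of an insertion-allele assay with an empty-site assay are also assumptions, and the read counts are made up.

```python
import math
from typing import List, Tuple


def log_counts(counts: List[int]) -> List[float]:
    # Assumed transform: log10 with a pseudocount of 1 so zero counts are defined.
    return [math.log10(c + 1) for c in counts]


def two_means_1d(values: List[float], max_iter: int = 100) -> Tuple[List[int], float, float]:
    """Split 1-D values into a low (0) and a high (1) cluster with k-means (k = 2).

    Stand-in for the authors' clustering step; returns labels and the two centroids.
    """
    lo, hi = min(values), max(values)  # start the centroids at the extremes
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        low_vals = [v for v, lab in zip(values, labels) if lab == 0]
        high_vals = [v for v, lab in zip(values, labels) if lab == 1]
        if low_vals:
            lo = sum(low_vals) / len(low_vals)
        if high_vals:
            hi = sum(high_vals) / len(high_vals)
    return labels, lo, hi


def amplicon_present(counts: List[int]) -> List[bool]:
    # A sample counts as "present" for this assay if it falls in the high read count cluster.
    labels, _, _ = two_means_1d(log_counts(counts))
    return [lab == 1 for lab in labels]


def call_genotype(insertion_present: bool, empty_present: bool) -> str:
    # Hypothetical mapping for two allele-specific breakpoint PCR assays per locus.
    if insertion_present and empty_present:
        return "het"
    if insertion_present:
        return "hom_insertion"
    if empty_present:
        return "hom_empty"
    return "no_call"


# Made-up read counts for six samples at one locus (not data from the paper).
insertion_counts = [0, 2, 850, 1200, 3, 970]
empty_counts = [1100, 900, 1000, 0, 1050, 1]
genotypes = [
    call_genotype(i, e)
    for i, e in zip(amplicon_present(insertion_counts), amplicon_present(empty_counts))
]
print(genotypes)  # ['hom_empty', 'hom_empty', 'het', 'hom_insertion', 'hom_empty', 'hom_insertion']
```

Forcing exactly two clusters has a weakness. At a locus where every sample shares the same state, it still produces a split. Cluster-level checks on the means and spreads are therefore useful for spotting loci whose calls should not be trusted.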

Conclusions: This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.



Fig. 4: Read count cluster statistics and genotype quality scores for the 60-locus libraries.
a: Mean versus standard deviation of the clusters obtained with the E libraries. Black and blue circles indicate low and high read count clusters, respectively. Despite locus-to-locus variation, most clusters had similar means and standard deviations. We manually set thresholds (gray lines) at 3 for the mean and 0.5 for the standard deviation. These thresholds excluded locus 23 (low read count cluster mean greater than 3), loci 50 and 57 (high read count cluster mean less than 3) and locus 30 (standard deviation greater than 0.5).
b: Same as a for the G libraries. We dropped locus 34 (high read count cluster mean less than 3) and loci 3, 43 and 44 (standard deviation greater than 0.5).
c: Histogram of genotype quality scores obtained for the E libraries. Scores below 7 (threshold indicated by a gray vertical line) are shown as crosses in Fig. 3a, c.
d: Same as c for the G libraries.
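The locus filter described in the caption can be sketched as a simple rule over per-locus cluster statistics. The thresholds 3 (mean) and 0.5 (standard deviation) come from the caption. The scale of the cluster values is not stated here; the sketch assumes they are whatever (possibly transformed) read counts the clusters were built on. The caption also does not say which cluster the standard deviation threshold applies to, so the sketch assumes it applies to either one. `ClusterStats`, `locus_problems` and `filter_loci` are hypothetical names, and the example statistics are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Thresholds from the Fig. 4 caption.
MEAN_THRESHOLD = 3.0
SD_THRESHOLD = 0.5


@dataclass
class ClusterStats:
    mean: float
    sd: float


def locus_problems(low: ClusterStats, high: ClusterStats) -> List[str]:
    """Return the reasons (if any) a locus fails the cluster-based QC."""
    problems = []
    if low.mean > MEAN_THRESHOLD:
        problems.append("low read count cluster mean > 3")
    if high.mean < MEAN_THRESHOLD:
        problems.append("high read count cluster mean < 3")
    # Assumption: the spread threshold applies to either cluster.
    if low.sd > SD_THRESHOLD or high.sd > SD_THRESHOLD:
        problems.append("cluster standard deviation > 0.5")
    return problems


def filter_loci(stats: Dict[str, Tuple[ClusterStats, ClusterStats]]) -> Dict[str, List[str]]:
    """Map each failing locus to its problems; passing loci are omitted."""
    dropped = {}
    for locus, (low, high) in stats.items():
        problems = locus_problems(low, high)
        if problems:
            dropped[locus] = problems
    return dropped


# Illustrative, made-up cluster statistics (not values from the paper).
example = {
    "A": (ClusterStats(0.4, 0.2), ClusterStats(3.1, 0.3)),
    "B": (ClusterStats(3.2, 0.3), ClusterStats(3.5, 0.2)),
    "C": (ClusterStats(0.5, 0.2), ClusterStats(2.4, 0.3)),
    "D": (ClusterStats(0.5, 0.7), ClusterStats(3.2, 0.2)),
}
print(sorted(filter_loci(example)))  # ['B', 'C', 'D']
```

The function returns the reasons for each failure rather than a bare pass/fail flag. That mirrors how the caption attributes each dropped locus to a specific failed criterion.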

Mentions: We used the position and spread of the low and high read count clusters to automatically flag loci with potential genotyping problems (Fig. 4a-b). An unusual cluster mean (an excessively low mean for a high read count cluster or an excessively high mean for a low read count cluster) signaled loci that did not amplify convincingly (e.g. locus 50 in Fig. 3a) or that failed clustering (e.g. locus 23 in Fig. 3a). We dropped 6 and 7 loci from the E and G libraries, respectively. For the E libraries, 4 loci showed poor clustering (Fig. 4a) and 1 locus had reads that did not map uniquely to the targeted site. For the G libraries, 4 loci showed poor clustering (Fig. 4b) and 2 loci had reads that did not map uniquely. In addition, one primer pair was found a posteriori not to work properly and was excluded from both libraries. Beyond characterizing each locus by the statistics of its clusters, we also derived genotype quality scores representing the confidence of each call given the observed read count and the underlying clusters (Fig. 4c-d). Loci where many samples had low quality scores overlapped with the loci dropped for poor cluster characteristics (Fig. 3a, c).
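The passage states only that the quality scores reflect the confidence of each call given the observed read count and the underlying clusters. It does not give a formula. One plausible sketch models each cluster as a normal distribution with its own mean and standard deviation and computes the posterior probability that a read count value belongs to the rejected cluster. That error probability is then Phred-scaled as Q = -10 log10(P(error)). The normal model, the equal prior, the Phred scaling and the cap of 99 are all assumptions, not the authors' definition. Only the flagging threshold of 7 comes from the Fig. 4 caption, and the cluster parameters below are made up.

```python
import math
from typing import Tuple

QUALITY_THRESHOLD = 7.0  # scores below this are flagged (Fig. 4c-d caption)


def log_normal_pdf(x: float, mu: float, sd: float) -> float:
    # Log density of a normal distribution; log space avoids underflow far from a cluster.
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def genotype_quality(
    x: float,
    low: Tuple[float, float],
    high: Tuple[float, float],
    prior_high: float = 0.5,  # assumed prior; must lie strictly between 0 and 1
    max_q: float = 99.0,  # assumed cap so near-certain calls stay finite
) -> Tuple[str, float]:
    """Assign x to the low or high cluster and return a Phred-scaled quality.

    low and high are (mean, sd) of the two clusters; Q = -10 * log10(P(call is wrong)).
    """
    log_low = math.log(1.0 - prior_high) + log_normal_pdf(x, *low)
    log_high = math.log(prior_high) + log_normal_pdf(x, *high)
    call = "high" if log_high > log_low else "low"
    top, other = max(log_low, log_high), min(log_low, log_high)
    # Log of the total evidence via a numerically stable log-sum-exp.
    log_total = top + math.log1p(math.exp(other - top))
    log_error = other - log_total  # log posterior of the rejected cluster
    q = -10.0 * log_error / math.log(10.0)
    return call, min(q, max_q)


low_cluster, high_cluster = (0.4, 0.2), (3.0, 0.3)  # made-up cluster parameters
for x in (0.4, 1.45, 3.0):
    call, q = genotype_quality(x, low_cluster, high_cluster)
    flag = "LOW QUALITY" if q < QUALITY_THRESHOLD else "ok"
    print(f"x={x}: {call} cluster, Q={q:.1f} ({flag})")
```

Values near either cluster centre receive the capped score. A value between the two clusters (x = 1.45 here) has nearly equal posteriors, so its score falls below 7 and it would be flagged. Under this model, loci with broad or overlapping clusters would produce many low-quality calls, which is consistent with the overlap reported above.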
