Read count-based method for high-throughput allelic genotyping of transposable elements and structural variants.

Kuhn A, Ong YM, Quake SR, Burkholder WF - BMC Genomics (2015)

Bottom Line: Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.


Affiliation: Microfluidics Systems Biology Lab, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Proteos Building, Room #03-04, 61 Biopolis Drive, Singapore, 138673, Singapore. alexandre.m.kuhn@gmail.com.

ABSTRACT

Background: Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. Current genome-wide approaches for genotyping insertion-site polymorphisms based on targeted or whole-genome sequencing remain very expensive and can lack accuracy; hence, new large-scale genotyping methods are needed.

Results: We describe a high-throughput method for genotyping transposable element insertions and other types of structural variants that can be assayed by breakpoint PCR. The method relies on next-generation sequencing of multiplex, site-specific PCR amplification products and read count-based genotype calls. We show that this method is flexible, efficient (it does not require rounds of optimization), cost-effective and highly accurate.

Conclusions: This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.


Fig. 3: Read counts, automatic genotype calls and validation results for the 60-loci libraries. a: Specific read counts for E reactions for 12 samples at each of 60 L1 loci. Blue and black circles represent, respectively, the present and absent calls made based on the clustering of read counts. Crosses indicate genotypes with a quality score less than 7. Triangles (locus 28) indicate genotypes that would be called “present” (blue) because of high read count but that were called “absent” because the L1 sequence was detected in the reads (in the case of very short L1 insertions). b: Specific read counts obtained for E reactions for loci that passed quality control. Green and red circles indicate, respectively, concordant and discordant calls for 25 loci that were validated individually using single-locus PCR reactions and gel electrophoresis. All calls were concordant. Locus 41 was excluded from the analysis because the primers did not work. c: Same as a but for the G libraries. d: Same as b for 23 loci that passed quality control and that were individually validated. All calls were concordant.

Mentions: The specific read counts obtained for E reactions at each locus clearly clustered into two groups (Fig. 3a). The high and low read count clusters comprised, respectively, samples in which the targeted L1 insertion was absent on at least one allele (high), or present on both alleles (low). For most loci, the separation was more than 2 log10 units. G reactions also yielded well-separated clusters (Fig. 3c). Here, the high and low read count clusters comprised, respectively, samples in which the L1-bearing allele was present at least once (high) or was absent (low). The exact position of both clusters varied from locus to locus, owing to systematic differences in PCR amplification efficiency. We implemented a locus-specific, unsupervised clustering method to obtain automatic genotype calls (blue and black symbols in Fig. 3a, c).
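The excerpt above does not say which clustering algorithm the authors used, so the following is only a minimal sketch of the idea, not their implementation. It assumes a plain one-dimensional 2-means on log10 read counts, run separately for each locus, as a stand-in. The function names, the pseudocount and the `min_gap` threshold are all hypothetical. The rule for combining E and G calls follows from the cluster definitions given above. Treating a low/low result as "no call" is an added assumption.

```python
import math


def two_cluster_calls(read_counts, pseudocount=1.0, min_gap=1.0, max_iter=100):
    """Label one locus's per-sample read counts as "high" or "low".

    Hypothetical stand-in for a locus-specific unsupervised clustering:
    1-D 2-means on log10(count + pseudocount). If the two cluster centres
    are closer than `min_gap` log10 units (an assumed threshold), the locus
    is treated as not separable and every sample gets None.
    """
    logs = [math.log10(c + pseudocount) for c in read_counts]
    low_c, high_c = min(logs), max(logs)  # initial centres: the extremes
    if high_c - low_c < min_gap:
        return [None] * len(logs)

    labels = []
    for _ in range(max_iter):
        # Assignment step: each sample joins the nearer centre.
        labels = ["high" if abs(x - high_c) < abs(x - low_c) else "low"
                  for x in logs]
        lows = [x for x, lab in zip(logs, labels) if lab == "low"]
        highs = [x for x, lab in zip(logs, labels) if lab == "high"]
        # Update step: centres move to the mean of their members.
        new_low, new_high = sum(lows) / len(lows), sum(highs) / len(highs)
        if (new_low, new_high) == (low_c, high_c):
            break
        low_c, high_c = new_low, new_high

    if high_c - low_c < min_gap:
        return [None] * len(logs)
    return labels


def allelic_genotype(e_label, g_label):
    """Combine E and G cluster labels into a count of L1-bearing alleles.

    E high -> insertion absent on at least one allele; E low -> on both.
    G high -> L1-bearing allele present at least once; G low -> absent.
    Returns 0, 1 or 2, or None when no consistent call can be made
    (missing label, or low/low, which is an assumption of this sketch).
    """
    table = {
        ("high", "low"): 0,   # insertion absent on both alleles
        ("high", "high"): 1,  # heterozygous
        ("low", "high"): 2,   # insertion present on both alleles
    }
    return table.get((e_label, g_label))


if __name__ == "__main__":
    # Made-up read counts for six samples at one locus (illustration only).
    e_counts = [5200, 4800, 12, 6100, 8, 3900]
    g_counts = [15, 3300, 2900, 9, 4100, 2700]
    e_labels = two_cluster_calls(e_counts)
    g_labels = two_cluster_calls(g_counts)
    print([allelic_genotype(e, g) for e, g in zip(e_labels, g_labels)])
    # prints [0, 1, 2, 0, 2, 1]
```

Clustering each locus on its own, rather than applying one global read-count cutoff, reflects the observation above that cluster positions shift from locus to locus with PCR amplification efficiency.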

