Robust SNP genotyping by multiplex PCR and arrayed primer extension.

Podder M, Ruan J, Tripp BW, Chu ZE, Tebbutt SJ - BMC Med Genomics (2008)

Bottom Line: We obtained good results for 41 of the SNPs, with 99.8% genotype concordance with HapMap data, at an automated call rate of 94.9% (not including the 9 failed SNPs). We have shown our methods to be effective for robust multiplex SNP genotyping using APEX, with 100% call rate and >99.9% accuracy. We believe that such methodology may be useful in future point-of-care clinical diagnostic applications where accuracy and call rate are both paramount.


Affiliation: The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research, St. Paul's Hospital, University of British Columbia, Vancouver, BC, V6Z 1Y6, Canada. mpodder@mrl.ubc.ca

ABSTRACT

Background: Arrayed primer extension (APEX) is a microarray-based rapid minisequencing methodology that may have utility in 'personalized medicine' applications that involve genetic diagnostics of single nucleotide polymorphisms (SNPs). However, to date there have been few reports that objectively evaluate the assay completion rate, call rate and accuracy of APEX. We have further developed robust assay design, chemistry and analysis methodologies, and have sought to determine how effective APEX is in comparison to leading 'gold-standard' genotyping platforms. Our methods have been tested against industry-leading technologies in two blinded experiments based on Coriell DNA samples and SNP genotype data from the International HapMap Project.

Results: In the first experiment, we genotyped 50 SNPs across the entire 270 HapMap Coriell DNA sample set. For each Coriell sample, DNA template was amplified in a total of 7 multiplex PCRs prior to genotyping. We obtained good results for 41 of the SNPs, with 99.8% genotype concordance with HapMap data, at an automated call rate of 94.9% (not including the 9 failed SNPs). In the second experiment, involving modifications to the initial DNA amplification so that a single 50-plex PCR could be achieved, genotyping of the same 50 SNPs across each of 49 randomly chosen Coriell DNA samples allowed extremely robust 50-plex genotyping from as little as 5 ng of DNA, with 100% assay completion rate, 100% call rate and >99.9% accuracy.

Conclusion: We have shown our methods to be effective for robust multiplex SNP genotyping using APEX, with 100% call rate and >99.9% accuracy. We believe that such methodology may be useful in future point-of-care clinical diagnostic applications where accuracy and call rate are both paramount.




Figure 3: Simple scatter plots for SNP rs12466929 (A/G) from 50-plex data set (this SNP is representative of the entire set of 50 HapMap SNPs). For each plot the x-axis represents signal values for X allele (A for this SNP) and the y-axis represents signal values for Y allele (G for this SNP). All values are in log scale. Magenta, green, blue and black coloured symbols denote the classes YY (GG), YX (AG), XX (AA) and NN (negative control samples), respectively. Plot (1) combines the two ASO-APEX Left probes (one for each allele); plot (2) combines the two ASO-APEX Right probes (one for each allele); plot (3) is for the APEX Left probe; plot (4) is for the APEX Right probe. All the classifiers except APEX Left (plot 3) give well separated genotype clusters for this SNP. Dynamic variable selection is able to automatically weight these LDA classifiers in such a way that the homozygous AA cluster in plot (3) (blue) is able to contribute to the final call for such genotypes, even though AG (green) and GG (magenta) genotype clusters overlap somewhat for this Left APEX probe. Additional file 11 shows four-panel scatter plots for all 50 SNPs from the 50-plex data set.

Mentions: To further assess the effect of probe redundancy in our APEX methodology, we used LDA to reanalyze both data sets (original and 50-plex) using non-redundant and partially redundant probe-specific data [Additional files 8, 9 and 10]. Fig. 3 and Additional file 11 show simple four-panel scatter plots of the probe data for the 50-plex experiment. In particular, Fig. 3 shows the four scatter plots for SNP rs12466929, one for each of the four probe chemistries: ASO.LEFT, ASO.RIGHT, APEX.LEFT and APEX.RIGHT. In each scatter plot, the three possible genotype clusters (known in advance from the HapMap data set) are shown in different colours: blue for allele 1 homozygous, magenta for allele 2 homozygous, and green for heterozygous. For SNP rs12466929, allele 1 is A and allele 2 is G, and the scatter plots are representative of the entire set of 50 HapMap SNPs. Three of the four probe chemistries produce well-separated (informative) clusters for the three genotype classes (AA, AG and GG), whereas APEX.LEFT gives overlapping clusters for the AG and GG classes [plot (3) in Fig. 3]. Nevertheless, APEX.LEFT still gives a well-separated cluster for the AA class. This illustrates why each probe chemistry is modelled separately when building the genotype classification model and why, in the next stage of the genotype calling algorithm, the four genotype models are combined with weights adjusted dynamically according to the quality of each of the four classifiers (four probe chemistries) for each SNP and sample. If all four probes failed to produce informative clusters, our LDA-based genotype calling algorithm would flag the SNP as failed, which is clearly not the case for SNP rs12466929.
This is how our LDA-based genotype calling algorithm with dynamic variable selection captures the redundancy in our APEX-based genotyping platform. The four-panel scatter plots also show that, for most SNPs, the homozygous clusters carry appreciable signal intensity for the other allele, owing to spectral overlap within the APEX fluorescent ddNTP chemistry, which adds background to the homozygous clusters. For this reason, a homozygous cluster rarely lies close to either the x- or y-axis. The aim is to compare the allele 1 and allele 2 signal intensities for the three possible genotype classes and then assign a test sample to the appropriate class based on the available training set. The initial signal intensities for each allele, for all four probe chemistries, are converted to log scale to reduce variability between microarray slides.
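The idea of fitting one LDA classifier per probe chemistry and then combining them with sample-specific weights can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the excerpt does not give the weighting formula, so the sketch assumes a weight of one minus the normalized entropy of each probe's posterior, a hypothetical no-call threshold of 0.9, and synthetic log-scale cluster centres chosen to mimic Fig. 3 (APEX.LEFT has identical AG and GG clusters). All function names are hypothetical.

```python
import math

# Fixed offsets around each cluster centre (synthetic, deterministic training data).
OFFSETS = [(-0.2, -0.1), (0.1, 0.2), (0.2, -0.2), (-0.1, 0.1), (0.0, 0.0)]

def fit_lda(points, labels):
    """Fit a 2-D LDA: per-class means, priors, and inverse pooled covariance."""
    classes = sorted(set(labels))
    means, priors = {}, {}
    for c in classes:
        pts = [p for p, l in zip(points, labels) if l == c]
        means[c] = (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
        priors[c] = len(pts) / len(points)
    sxx = sxy = syy = 0.0
    for p, l in zip(points, labels):
        dx, dy = p[0] - means[l][0], p[1] - means[l][1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    dof = len(points) - len(classes)
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    # Inverse of the symmetric 2x2 pooled covariance matrix.
    return {"means": means, "priors": priors,
            "inv": (syy / det, -sxy / det, sxx / det)}

def posterior(model, point):
    """Class posteriors for one (log X signal, log Y signal) point."""
    ixx, ixy, iyy = model["inv"]
    scores = {}
    for c, m in model["means"].items():
        dx, dy = point[0] - m[0], point[1] - m[1]
        maha = ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy
        scores[c] = math.log(model["priors"][c]) - 0.5 * maha
    top = max(scores.values())
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(expd.values())
    return {c: v / total for c, v in expd.items()}

def confidence(post):
    """ASSUMED weight: 1 minus normalized entropy (1 = decisive, 0 = uninformative)."""
    h = -sum(p * math.log(p) for p in post.values() if p > 0)
    return max(0.0, 1.0 - h / math.log(len(post)))

def call_genotype(models, sample, threshold=0.9):
    """Weighted average of per-probe posteriors; None means no call."""
    combined, total = {}, 0.0
    for probe, model in models.items():
        post = posterior(model, sample[probe])
        w = confidence(post)
        total += w
        for g, p in post.items():
            combined[g] = combined.get(g, 0.0) + w * p
    if total == 0.0:
        return None
    best = max(combined, key=combined.get)
    return best if combined[best] / total >= threshold else None

def train(centers):
    """Build synthetic training clusters around the given centres and fit LDA."""
    points, labels = [], []
    for genotype, (cx, cy) in centers.items():
        for dx, dy in OFFSETS:
            points.append((cx + dx, cy + dy))
            labels.append(genotype)
    return fit_lda(points, labels)

# Hypothetical cluster centres in log scale: x = A (allele 1), y = G (allele 2).
GOOD = {"AA": (9.0, 6.0), "AG": (8.0, 8.0), "GG": (6.0, 9.0)}
BAD = {"AA": (9.0, 6.0), "AG": (7.0, 8.5), "GG": (7.0, 8.5)}  # AG/GG overlap
models = {"ASO.LEFT": train(GOOD), "ASO.RIGHT": train(GOOD),
          "APEX.LEFT": train(BAD), "APEX.RIGHT": train(GOOD)}
```

In this sketch a GG sample is still called from the three informative probes because the ambiguous APEX.LEFT posterior is down-weighted, while an AA sample receives a confident contribution from APEX.LEFT as well. Used on its own, the APEX.LEFT model can call AA but returns no call for a sample lying in the overlapping AG/GG region.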

