Inferring haplotypes at the NAT2 locus: the computational approach.

Sabbagh A, Darlu P - BMC Genet. (2005)

Bottom Line: However, molecular haplotyping methods are labour-intensive and expensive and do not appear to be good candidates for routine clinical applications. We empirically evaluated the effectiveness of four haplotyping algorithms in predicting haplotype phases at NAT2, by comparing the results with those directly obtained through molecular haplotyping. This investigation provides a solid basis for the confident and rational use of computational methods which appear to be a good alternative to infer haplotype phases in the particular case of the NAT2 gene, where there is near complete linkage disequilibrium between polymorphic markers.


Affiliation: Unité de Recherche en Génétique Epidémiologique et Structure des Populations Humaines, INSERM U535, Villejuif, France. sabbagh@vjf.inserm.fr

ABSTRACT

Background: Numerous studies have attempted to relate genetic polymorphisms within the N-acetyltransferase 2 gene (NAT2) to interindividual differences in response to drugs or in disease susceptibility. However, genotyping of individual single-nucleotide polymorphisms (SNPs) alone may not always provide enough information to reach these goals. It is important to link SNPs in terms of haplotypes, which carry more information about the genotype-phenotype relationship. Special analytical techniques have been designed to unequivocally determine the allocation of mutations to either DNA strand. However, molecular haplotyping methods are labour-intensive and expensive and do not appear to be good candidates for routine clinical applications. A cheap and relatively straightforward alternative is the use of computational algorithms. The objective of this study was to assess the performance of the computational approach in NAT2 haplotype reconstruction from phase-unknown genotype data, for population samples of various ethnic origins.

Results: We empirically evaluated the effectiveness of four haplotyping algorithms in predicting haplotype phases at NAT2, by comparing the results with those directly obtained through molecular haplotyping. All computational methods provided remarkably accurate and reliable estimates for NAT2 haplotype frequencies and individual haplotype phases. The Bayesian algorithm implemented in the PHASE program performed best.

Conclusion: This investigation provides a solid basis for the confident and rational use of computational methods, which appear to be a good alternative for inferring haplotype phases in the particular case of the NAT2 gene, where there is near complete linkage disequilibrium between polymorphic markers.


Figure 2: Linkage disequilibrium (r² value) between SNP markers in the NAT2 locus. Graphical representation of the disequilibrium matrices obtained through computation of the r² coefficient between each pair of markers, for the Spanish, Korean and Black South African samples. The British and Nicaraguan samples provided patterns and levels of LD comparable to those of the Spanish data. For each marker pair, GOLD [60] plotted the color-coded pairwise r² statistics at the Cartesian coordinates corresponding to marker location, and the plots were completed by interpolation. These graphs point out the strong level of LD between markers at positions 341, 481, and 803, as well as between SNPs located at 282 and 590: these markers are thus strongly predictive of one another. In Black South Africans, LD patterns are less pronounced and more diffuse across marker pairs.
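
As a rough illustration of the pairwise r² computation behind such matrices, the following Python sketch computes r² between every pair of biallelic sites from phased haplotypes, using the standard definition D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The 0/1 string encoding, the function names, and the toy data are assumptions made for this example; they are not taken from the study or from GOLD.

```python
from itertools import combinations


def r_squared(haplotypes, i, j):
    """r^2 between biallelic sites i and j.

    `haplotypes` is a non-empty list of equal-length strings of '0'/'1'
    (hypothetical encoding: '1' marks the variant allele at a site).
    Returns None when either site is monomorphic, since r^2 is undefined.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b  # classical disequilibrium coefficient D
    return d * d / denom


def pairwise_r2(haplotypes):
    """Map each site pair (i, j), i < j, to its r^2 value (or None)."""
    n_sites = len(haplotypes[0])
    return {
        (i, j): r_squared(haplotypes, i, j)
        for i, j in combinations(range(n_sites), 2)
    }


if __name__ == "__main__":
    # Toy phased sample, invented for illustration only.
    toy = ["000", "000", "110", "110", "001", "111"]
    for pair, value in pairwise_r2(toy).items():
        print(pair, value)
```

In this toy sample the first two sites always carry the same allele, so they are fully predictive of one another, which is the situation the caption describes for positions 341, 481, and 803.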

Mentions: Molecular haplotyping of the NAT2 locus revealed between eight and twelve distinct haplotypes in each of the five population samples investigated. The theoretical maximum number of haplotypes for a set of seven biallelic variable sites is 128 (2⁷) if there is random association between polymorphic sites, whereas it is only 8 in the absence of recombination, recurrent mutation, and back mutation. Hence, the small number of haplotypes observed at NAT2 suggests strong LD over the short physical distance spanning this gene. Indeed, we observed complete or near complete LD for all pairs of SNPs with sufficiently high frequencies (only alleles with frequencies in the range 0.05–0.95 were included in the analysis because estimates of LD for low-frequency alleles in small samples are not informative): 85% of all pairwise r² values were highly significant (exact p-value < 0.0001). Although LD patterns were rather similar among the different population samples, substantial differences in LD levels were observed (Figure 2): the Korean sample, and especially the South African sample, displayed much smaller values of average pairwise r² (0.27 and 0.20, respectively) than the two European and the Nicaraguan samples (values between 0.39 and 0.57), for which a strong haplotypic structure was observed.
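
The two bounds quoted above can be reproduced with a few lines of Python: 2^n haplotypes under random association of n biallelic sites, and n + 1 when each site arose by a single mutation on a non-recombining background. The sketch also includes the four-gamete check, a standard test that is not described in the source and is added here only to illustrate how a set of haplotypes can be screened for evidence of recombination or recurrent mutation. Function names and the 0/1 string encoding are assumptions made for this example.

```python
from itertools import combinations


def max_haplotypes_random_association(n_sites):
    """Upper bound when alleles at n biallelic sites combine freely."""
    return 2 ** n_sites


def max_haplotypes_without_recombination(n_sites):
    """Upper bound with no recombination, recurrent or back mutation:
    the ancestral haplotype plus one new haplotype per mutation."""
    return n_sites + 1


def passes_four_gamete_test(haplotypes):
    """True if no pair of sites shows all four allele combinations.

    `haplotypes` is a non-empty collection of equal-length '0'/'1' strings
    (hypothetical encoding). Seeing 00, 01, 10 and 11 at one pair of sites
    implies recombination or recurrent/back mutation somewhere in the
    history of the sample.
    """
    distinct = set(haplotypes)
    n_sites = len(next(iter(distinct)))
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in distinct}
        if len(gametes) == 4:
            return False
    return True


if __name__ == "__main__":
    print(max_haplotypes_random_association(7))     # seven NAT2 sites
    print(max_haplotypes_without_recombination(7))
    print(passes_four_gamete_test(["00", "01", "11"]))
```

Under these assumptions, a sample with more than n + 1 distinct haplotypes cannot pass the four-gamete check, which is consistent with the reported eight to twelve haplotypes per sample lying only slightly above the no-recombination bound of 8.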

