Genetic association mapping via evolution-based clustering of haplotypes.

Tachmazidou I, Verzilli CJ, De Iorio M - PLoS Genet. (2007)

Bottom Line: We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location of the susceptibility variant within a small error.

Affiliation: Department of Epidemiology and Public Health, Imperial College London, United Kingdom. ioanna.tachmazidou03@ic.ac.uk

ABSTRACT
Multilocus analysis of single nucleotide polymorphism haplotypes is a promising approach to dissecting the genetic basis of complex diseases. We propose a coalescent-based model for association mapping that potentially increases the power to detect disease-susceptibility variants in genetic association studies. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions with densely spaced markers and model chromosomal segments in high linkage disequilibrium therein assuming a perfect phylogeny. To make this assumption more realistic, we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium. The haplotype space is then partitioned into disjoint clusters, within which the phenotype-haplotype association is assumed to be the same. For example, in case-control studies, we expect chromosomal segments bearing the causal variant on a common ancestral background to be more frequent among cases than controls, giving rise to two separate haplotype clusters. The novelty of our approach arises from the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common ancestor. Our approach is fully Bayesian and we develop a Markov Chain Monte Carlo algorithm to sample efficiently over the space of possible partitions. We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location of the susceptibility variant within a small error.
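
The idea of partitioning haplotypes into clusters that share one disease risk can be illustrated with a small sketch. The abstract does not state the likelihood or prior the authors use, so everything below is an assumption made for illustration: each cluster gets its own risk with a Beta(a, b) prior, each chromosome is scored as case or control, and the function names, haplotype labels and counts are hypothetical.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(partition, cases, controls, a=1.0, b=1.0):
    """Log marginal likelihood of case/control labels given a partition.

    Assumption (not from the paper): within each cluster every chromosome
    is a case with the same probability p, and p ~ Beta(a, b). Integrating
    p out gives B(a + k, b + m) / B(a, b) per cluster, where k and m are
    the numbers of case and control chromosomes in that cluster.
    """
    total = 0.0
    for cluster in partition:
        k = sum(cases[h] for h in cluster)
        m = sum(controls[h] for h in cluster)
        total += log_beta(a + k, b + m) - log_beta(a, b)
    return total


# Hypothetical counts of case and control chromosomes per haplotype.
cases = {"A": 30, "B": 28, "C": 10, "D": 12}
controls = {"A": 10, "B": 12, "C": 30, "D": 28}

two_clusters = [["A", "B"], ["C", "D"]]  # high-risk vs low-risk haplotypes
one_cluster = [["A", "B", "C", "D"]]     # no association

split_score = log_marginal(two_clusters, cases, controls)
null_score = log_marginal(one_cluster, cases, controls)
```

With these made-up counts the two-cluster partition scores higher than the single cluster, which is the kind of signal a sampler over partitions would favour. The sketch leaves out the evolutionary distance and the Markov Chain Monte Carlo moves described in the abstract.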

Figure 9: The Gene Tree Consistent with the Haplotypes in the Incidence Matrix of Table 6. Labels 1–12 refer to mutations S1–S12. At the bottom of each branch we report the multiplicity of each observed haplotype in the sample.

Mentions: A gene tree can be constructed when the perfect phylogeny condition holds for all pairs of SNPs in a study sample, using, for example, Gusfield's algorithm [35,24]. Figure 9 shows the gene tree for the haplotypes in Table 9. The nodes in the tree correspond to the mutations that generated the segregating sites, and the gene tree is rooted at the haplotype with all major alleles. Mutations are ordered on the tree according to their relative age. If the causal mutation is embedded between SNPs 1 and 7, all descendant haplotypes of that lineage will inherit it and, therefore, we expect that most case haplotypes are among the 308 haplotypes that correspond to the first three branches of the tree (first three lines of Table 9). Thus, in the region of the disease locus, a sample of case haplotypes tends to have a more recent shared ancestry than a sample of control haplotypes, because many of the case haplotypes share a recent disease mutation. Note, however, that sporadic cases due to phenocopies, dominance, and epistasis introduce substantial noise in the phenotype–haplotype relationship, which influences the relative frequencies of nonpenetrant case haplotypes carried by unaffected controls and control haplotypes carried by affected cases.
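
The gene tree construction described above can be sketched in a few lines. This is a simple quadratic-time illustration, not Gusfield's algorithm itself, and it assumes haplotypes are coded as 0/1 tuples with 0 for the major allele, so that the root is the all-zero haplotype. Under that rooting, the pairwise compatibility check reduces to requiring that the carrier sets of any two mutations are either nested or disjoint. The function names and the toy incidence matrix are hypothetical and are not the data of Table 9.

```python
from itertools import combinations


def carriers_of(haplotypes, site):
    """Indices of the haplotypes that carry the minor allele (1) at a site."""
    return frozenset(k for k, h in enumerate(haplotypes) if h[site] == 1)


def build_gene_tree(haplotypes):
    """Return {site: parent site or None} for a tree rooted at all zeros.

    Raises ValueError if some pair of sites has carrier sets that overlap
    without one containing the other, i.e. the rooted perfect phylogeny
    condition fails for that pair.
    """
    n_sites = len(haplotypes[0])
    carriers = [carriers_of(haplotypes, s) for s in range(n_sites)]
    for a, b in combinations(range(n_sites), 2):
        ca, cb = carriers[a], carriers[b]
        if ca & cb and not (ca <= cb or cb <= ca):
            raise ValueError(f"sites {a} and {b} violate perfect phylogeny")
    # Mutations with more carriers sit closer to the root; ties by index.
    order = sorted(range(n_sites), key=lambda s: (-len(carriers[s]), s))
    parent = {}
    for pos, s in enumerate(order):
        parent[s] = None  # None means the mutation hangs off the root
        # The nearest earlier site whose carriers include this site's
        # carriers is the mutation directly above it on the tree.
        for t in reversed(order[:pos]):
            if carriers[s] and carriers[s] <= carriers[t]:
                parent[s] = t
                break
    return parent


# Toy incidence matrix: rows are distinct haplotypes, columns are SNPs.
toy = [
    (0, 0, 0, 0),
    (1, 0, 0, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 1, 1),
]
tree = build_gene_tree(toy)
descendants_of_site0 = carriers_of(toy, 0)
```

Here `carriers_of(toy, 0)` lists the haplotypes descending from the mutation at site 0. In a case-control setting these are the haplotypes expected to be enriched among cases if the causal mutation lies on that lineage.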

