Genetic association mapping via evolution-based clustering of haplotypes.

Tachmazidou I, Verzilli CJ, De Iorio M - PLoS Genet. (2007)

Bottom Line: We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location of the susceptibility variant within a small error.


Affiliation: Department of Epidemiology and Public Health, Imperial College London, United Kingdom. ioanna.tachmazidou03@ic.ac.uk

ABSTRACT
Multilocus analysis of single nucleotide polymorphism haplotypes is a promising approach to dissecting the genetic basis of complex diseases. We propose a coalescent-based model for association mapping that potentially increases the power to detect disease-susceptibility variants in genetic association studies. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions with densely spaced markers and model chromosomal segments in high linkage disequilibrium therein assuming a perfect phylogeny. To make this assumption more realistic, we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium. The haplotype space is then partitioned into disjoint clusters, within which the phenotype-haplotype association is assumed to be the same. For example, in case-control studies, we expect chromosomal segments bearing the causal variant on a common ancestral background to be more frequent among cases than controls, giving rise to two separate haplotype clusters. The novelty of our approach arises from the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common ancestor. Our approach is fully Bayesian and we develop a Markov Chain Monte Carlo algorithm to sample efficiently over the space of possible partitions. We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. 
We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location of the susceptibility variant within a small error.
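
The perfect-phylogeny assumption mentioned in the abstract can be illustrated with the standard four-gamete test: for biallelic markers, a set of sites admits a perfect phylogeny when no pair of sites shows all four allele combinations. The sketch below is only an illustration, not the authors' windowing procedure. The function names, the string encoding of haplotypes, and the greedy left-to-right split are all assumptions made for this example.

```python
from itertools import combinations


def four_gamete_compatible(haplotypes, i, j):
    """True if sites i and j show at most three of the four possible gametes."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) < 4


def is_perfect_phylogeny(haplotypes):
    """True if every pair of sites passes the four-gamete test."""
    n_sites = len(haplotypes[0])
    return all(
        four_gamete_compatible(haplotypes, i, j)
        for i, j in combinations(range(n_sites), 2)
    )


def split_into_windows(haplotypes):
    """Greedy split (an assumption, not the paper's rule) into contiguous
    windows, each consistent with a perfect phylogeny.

    Returns half-open (start, end) site index ranges.
    """
    n_sites = len(haplotypes[0])
    windows = []
    start = 0
    for site in range(1, n_sites):
        # Close the current window as soon as the new site conflicts
        # with any site already inside it.
        if not all(
            four_gamete_compatible(haplotypes, k, site)
            for k in range(start, site)
        ):
            windows.append((start, site))
            start = site
    windows.append((start, n_sites))
    return windows


# Toy haplotypes (invented data): sites 0 and 1 show all four gametes,
# so they cannot share a window.
toy = ["000", "010", "100", "110"]
windows = split_into_windows(toy)
```

Here each haplotype is a string of "0"/"1" alleles. A real analysis would define windows from linkage disequilibrium estimates, as the abstract describes, rather than from this simple pairwise check.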


Power for a Range of Models. Probability of a significant signal within 100 kb of the causal allele. Each point on the x-axis corresponds to 50 datasets under each of the simulation parameters while keeping the rest at their default values. The two points that do not belong to a line correspond to the default scenario for Margarita markerwise p-values calculated by permutation and Margarita experimentwise p-values calculated by permutation. For “BETA strong signal” and “BETA decisive signal” we consider markers with Bayes factors ≥10 and ≥150, respectively.

Mentions: To compare the power of the different methods, we define a window of 100 kb on either side of the causative allele and calculate the proportion of the 50 replicates yielding a significant association within the window, as in Minichiello and Durbin [16]. The significance of a signal is assessed using the rules described in the previous paragraph. Figure 6 shows the probability of detecting a significant association within 100 kb of the causal SNP under various scenarios and over the 50 replicates. In each plot, we vary a simulation parameter along the x-axis while assuming default values for the remaining ones. As mentioned earlier, Margarita was run only for the default scenario. We were unable to obtain results from HAPCLUSTER, as this method does not give markerwise measures of association. From the results in Figure 6, BETA using the strong rule has more power than both the single-locus approach and Margarita (default scenario only) with multiplicity-corrected results by permutation, and slightly less power than plain Margarita. The uncorrected single-locus test is the most powerful approach, but it also has the worst performance in terms of false positives.
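
The power measure described above can be sketched as follows: a replicate counts as a detection if any marker within 100 kb of the causal position meets the significance rule, and power is the fraction of replicates that do. This is a minimal illustration, not the authors' code. The function names, the data layout, and the toy values are assumptions. Only the 100 kb window and the Bayes factor thresholds of 10 ("strong") and 150 ("decisive") come from the text.

```python
def has_signal_near_causal(positions, bayes_factors, causal_pos,
                           threshold=10.0, window_bp=100_000):
    """True if any marker within window_bp of causal_pos has BF >= threshold."""
    return any(
        abs(pos - causal_pos) <= window_bp and bf >= threshold
        for pos, bf in zip(positions, bayes_factors)
    )


def power(replicates, causal_pos, threshold=10.0, window_bp=100_000):
    """Proportion of replicates with a significant signal near the causal site.

    replicates: list of (positions, bayes_factors) pairs, one per dataset.
    """
    hits = sum(
        has_signal_near_causal(pos, bf, causal_pos, threshold, window_bp)
        for pos, bf in replicates
    )
    return hits / len(replicates)


# Toy replicates (invented values, positions in base pairs).
toy_replicates = [
    ([100_000, 450_000, 900_000], [1.0, 12.0, 200.0]),  # strong hit only
    ([100_000, 450_000, 900_000], [1.0, 2.0, 300.0]),   # signal outside window
    ([100_000, 550_000, 900_000], [0.5, 160.0, 1.0]),   # decisive hit
    ([100_000, 700_000, 900_000], [20.0, 11.0, 1.0]),   # signals outside window
]

strong_power = power(toy_replicates, causal_pos=500_000, threshold=10.0)
decisive_power = power(toy_replicates, causal_pos=500_000, threshold=150.0)
```

For methods that report p-values rather than Bayes factors, the same structure applies with the comparison reversed (a p-value at or below the chosen significance level).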

