The effects of linkage disequilibrium in large scale SNP datasets for MDR.

Grady BJ, Torstenson ES, Ritchie MD - BioData Min (2011)

Bottom Line: In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package. Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; LD does not, however, affect the ability to detect an indirect association. The results of MDR analysis in datasets with LD should therefore be examined carefully in light of the underlying LD structure of the dataset.


Affiliation: Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA. ritchie@chgr.mc.vanderbilt.edu.

ABSTRACT

Background: In the analysis of large-scale genomic datasets, an important consideration is the power of analytical methods to identify accurate predictive models of disease. Until now, a confounding factor in assessing the sensitivity of such methods has been the presence of linkage disequilibrium (LD). In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package.

Results: Four relative amounts of LD were simulated in multiple one- and two-locus scenarios in which the position of the functional SNP(s) within LD blocks varied. The simulated data were analyzed with MDR to determine the sensitivity of the method in different contexts, with sensitivity defined as the number of times out of 100 that MDR identified the correct one- or two-locus model as the best overall model. As the amount of LD increases, the sensitivity of MDR to detect the correct functional SNP drops, but its sensitivity to detect the disease signal through an indirect association increases.
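A minimal sketch of this sensitivity measure, assuming each replicate's best overall model is stored as a tuple of SNP names. The function name, the SNP labels, and the toy input below are hypothetical illustrations, not results from the study:

```python
from typing import Sequence, Tuple

# Hypothetical representation: a model is a tuple of SNP names,
# e.g. ("snp3",) for a one-locus model or ("snp3", "snp9") for a two-locus model.
Model = Tuple[str, ...]


def sensitivity(best_models: Sequence[Model], true_model: Model) -> int:
    """Count replicates whose best overall model is exactly the functional model.

    SNP order within a model is ignored. With 100 replicates, the count
    reads directly as a percentage.
    """
    target = sorted(true_model)
    return sum(1 for model in best_models if sorted(model) == target)


# Toy input for 100 replicates of a one-locus scenario.
best = [("snp3",)] * 87 + [("snp4",)] * 13
print(sensitivity(best, ("snp3",)))  # 87
```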

Conclusions: Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; LD does not, however, affect the ability to detect an indirect association. Careful examination of the solution models generated by MDR reveals that MDR can identify loci in the correct LD block, though not always the functional SNP itself. The results of MDR analysis in datasets with LD should therefore be examined carefully in light of the underlying LD structure of the dataset.
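The distinction between direct and indirect detection can be made concrete with a small scoring sketch. It assumes a hypothetical `block_of` map from SNP name to a non-negative LD block id, and counts a best model as an indirect hit when its SNPs fall in the same LD block(s) as the functional SNP(s) without matching them exactly. This scoring rule is an illustration only, not necessarily the exact criterion used in the study:

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Model = Tuple[str, ...]  # hypothetical: a model is a tuple of SNP names


def classify(model: Model, functional: Model, block_of: Dict[str, int]) -> str:
    """Label one best model as a direct hit, an indirect (same LD block) hit, or a miss.

    block_of maps SNP name -> non-negative LD block id; SNPs outside any block
    are simply absent and get the sentinel -1. Functional SNPs must be mapped.
    """
    if sorted(model) == sorted(functional):
        return "direct"
    chosen = sorted(block_of.get(snp, -1) for snp in model)
    target = sorted(block_of[snp] for snp in functional)
    return "indirect" if chosen == target else "miss"


def tally(best_models: Sequence[Model], functional: Model,
          block_of: Dict[str, int]) -> Counter:
    """Count direct hits, indirect hits, and misses across replicate datasets."""
    return Counter(classify(m, functional, block_of) for m in best_models)


# Toy input: snp1-snp3 form LD block 0, snp4-snp5 form LD block 1.
blocks = {"snp1": 0, "snp2": 0, "snp3": 0, "snp4": 1, "snp5": 1}
best = [("snp2",)] * 60 + [("snp1",)] * 30 + [("snp4",)] * 10
print(tally(best, ("snp2",), blocks))
# Counter({'direct': 60, 'indirect': 30, 'miss': 10})
```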



Figure 2: Data pools with differing amounts of LD. The four data pools used to generate data: A) 40% LD, B) 60% LD, C) 80% LD, D) 95% LD.

Multifactor Dimensionality Reduction (MDR) is an analysis method designed to detect multi-locus interactions in large datasets. The MDR algorithm searches exhaustively among a set of categorical variables, such as genotypes, for interactions up to a specified order (e.g. 2-way, 3-way). For each interaction model up to the specified level of complexity, each combination of values of the variables in the model (e.g. the AABB multi-locus genotype) is labeled high risk or low risk based on the ratio of cases to controls who carry that combination. The accuracy of this classification then serves as the measure of model importance, and N-fold cross-validation estimates the predictive value of the model in unseen data [3]. Because of this design, MDR can detect interactions even when the individual variables have no detectable marginal effects. MDR has been used frequently in genetic epidemiology, in studies of diseases such as breast cancer [3-5], schizophrenia [6], and type 2 diabetes [7], to search for interactions between single nucleotide polymorphisms (SNPs) implicating biological interactions of etiological significance. The performance of MDR has been examined in the presence of genetic heterogeneity, phenocopy, and missing data, but not in the presence of LD [8].

The goal of this study was to determine the sensitivity of MDR to detect the disease signal of functional loci under varying amounts of LD. Data ranging from low to high LD were simulated using a forward-time genomic simulator (Figure 2). Cases and controls were then drawn by taking two chromosomes from a pool of simulated chromosomes and applying a penetrance function that gives the probability of disease given the single- or multi-locus genotype at the functional variant(s).
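The case-control sampling step described above can be sketched as rejection sampling from a chromosome pool. This is a minimal sketch under stated assumptions, not the study's simulator: haplotypes are 0/1 allele vectors, a genotype is the minor-allele count at each SNP, and the name `draw_case_control` and the penetrance values in the usage lines are hypothetical. The toy pool is drawn independently at each SNP and so carries no LD; in the study, the pool came from a forward-time simulation with controlled LD.

```python
import random
from typing import Callable, List, Sequence, Tuple

Haplotype = Sequence[int]  # allele codes (0 = major, 1 = minor), one per SNP


def draw_case_control(
    pool: Sequence[Haplotype],
    functional: Tuple[int, ...],
    penetrance: Callable[[Tuple[int, ...]], float],
    n_cases: int,
    n_controls: int,
    rng: random.Random,
) -> Tuple[List[List[int]], List[int]]:
    """Sample individuals from a chromosome pool until both quotas are filled.

    Each individual gets two chromosomes drawn at random (with replacement);
    its genotype at each SNP is the minor-allele count (0, 1 or 2). Disease
    status is a Bernoulli draw with probability penetrance(genotype at the
    functional SNPs). Individuals whose class is already full are discarded,
    so the loop never ends if no genotype can yield a case (or a control).
    """
    X: List[List[int]] = []
    y: List[int] = []
    got_cases = got_controls = 0
    while got_cases < n_cases or got_controls < n_controls:
        h1, h2 = rng.choice(pool), rng.choice(pool)
        genotype = [a + b for a, b in zip(h1, h2)]
        affected = rng.random() < penetrance(tuple(genotype[j] for j in functional))
        if affected and got_cases < n_cases:
            X.append(genotype)
            y.append(1)
            got_cases += 1
        elif not affected and got_controls < n_controls:
            X.append(genotype)
            y.append(0)
            got_controls += 1
    return X, y


# Toy usage: an LD-free random pool of 500 chromosomes with 8 SNPs and a
# hypothetical one-locus penetrance table for the SNP at index 2.
rng = random.Random(1)
pool = [[int(rng.random() < 0.3) for _ in range(8)] for _ in range(500)]
table = {(0,): 0.05, (1,): 0.10, (2,): 0.20}
X, y = draw_case_control(pool, (2,), lambda g: table[g], 100, 100, rng)
```

Calling the function 100 times with 1000 cases and 1000 controls per call would mirror the replicate design of the study; the small quotas above only keep the toy run fast.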
The functional variants responsible for disease etiology were chosen to meet requirements on allele frequency and on LD with the surrounding SNPs (Figure 3). For each model, 100 datasets of 1000 cases and 1000 controls were generated. The resulting datasets were then analyzed with MDR, and the sensitivity of the method was measured.
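The MDR procedure described earlier (labeling each multi-locus genotype cell high or low risk, scoring by classification accuracy, N-fold cross-validation, exhaustive search over combinations) can be sketched in a few dozen lines of Python. This is a simplified illustration, not the MDR software: genotypes are coded 0/1/2 and status 1/0; a cell is assumed high risk when its case:control ratio exceeds the overall ratio in the training data (1 for balanced designs such as 1000 cases and 1000 controls); cells unseen in training are assumed low risk; and models are ranked only by mean testing accuracy, whereas the MDR software's model selection also considers cross-validation consistency. All function names are hypothetical.

```python
import itertools
import random
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

Cell = Tuple[int, ...]  # one multi-locus genotype, e.g. (0, 2)


def fit_cells(X: Sequence[Sequence[int]], y: Sequence[int],
              snps: Tuple[int, ...]) -> Dict[Cell, int]:
    """Label each observed multi-locus genotype cell high risk (1) or low risk (0)."""
    cases: Counter = Counter()
    controls: Counter = Counter()
    for row, status in zip(X, y):
        cell = tuple(row[j] for j in snps)
        if status == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_case = sum(y)
    n_ctrl = len(y) - n_case
    # High risk if cases/controls in the cell exceeds the overall cases/controls
    # ratio; cross-multiplied so empty control cells do not divide by zero.
    return {cell: int(cases[cell] * n_ctrl > controls[cell] * n_case)
            for cell in set(cases) | set(controls)}


def accuracy(labels: Dict[Cell, int], X: Sequence[Sequence[int]],
             y: Sequence[int], snps: Tuple[int, ...]) -> float:
    """Share of individuals whose status matches their cell's label (unseen cells -> 0)."""
    hits = sum(labels.get(tuple(row[j] for j in snps), 0) == status
               for row, status in zip(X, y))
    return hits / len(y)


def mdr_search(X: Sequence[Sequence[int]], y: Sequence[int], order: int,
               n_folds: int = 10, seed: int = 0) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Score every `order`-way SNP combination by mean N-fold testing accuracy."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    best, best_acc = None, -1.0
    for snps in itertools.combinations(range(len(X[0])), order):
        test_accs = []
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            labels = fit_cells([X[i] for i in train_idx], [y[i] for i in train_idx], snps)
            test_accs.append(accuracy(labels, [X[i] for i in test_idx],
                                      [y[i] for i in test_idx], snps))
        mean_acc = sum(test_accs) / n_folds
        if mean_acc > best_acc:
            best, best_acc = snps, mean_acc
    return best, best_acc


# Toy usage: genotypes uniform on {0, 1, 2} (unrealistic, for illustration) and
# status = 1 exactly when (g1 + g3) % 3 == 0. Here P(status = 1 | g1) = 1/3 for
# every g1, and likewise for g3, so neither SNP has a marginal effect on its own.
rng = random.Random(7)
X = [[rng.randint(0, 2) for _ in range(6)] for _ in range(400)]
y = [int((row[1] + row[3]) % 3 == 0) for row in X]
print(mdr_search(X, y, order=2))  # ((1, 3), 1.0)
```

The toy pattern echoes the point made in the text: MDR can find an interaction whose loci show no marginal effect, because it scores the joint genotype cells rather than each SNP separately.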

