The effects of linkage disequilibrium in large scale SNP datasets for MDR.

Grady BJ, Torstenson ES, Ritchie MD - BioData Min (2011)

Bottom Line: In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package. Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; it does not, however, affect the ability to detect indirect association. As such, the results of MDR analysis in datasets with LD should be carefully examined to consider the underlying LD structure of the dataset.

Affiliation: Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA. ritchie@chgr.mc.vanderbilt.edu.

ABSTRACT

Background: In the analysis of large-scale genomic datasets, an important consideration is the power of analytical methods to identify accurate predictive models of disease. When trying to assess sensitivity from such analytical methods, a confounding factor up to this point has been the presence of linkage disequilibrium (LD). In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package.

Results: Four relative amounts of LD were simulated in multiple one- and two-locus scenarios for which the position of the functional SNP(s) within LD blocks varied. Simulated data were analyzed with MDR to determine the sensitivity of the method in different contexts, where sensitivity was gauged as the number of times out of 100 that the method identified the correct one- or two-locus model as the best overall model. As the amount of LD increases, the sensitivity of MDR to detect the correct functional SNP drops, but the sensitivity to detect the disease signal and find an indirect association increases.

Conclusions: Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; it does not, however, affect the ability to detect indirect association. Careful examination of the solution models generated by MDR reveals that MDR can identify loci in the correct LD block, though not always the functional SNP itself. As such, the results of MDR analysis in datasets with LD should be carefully examined to consider the underlying LD structure of the dataset.
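
The distinction between direct and indirect association can be made concrete with a small counting sketch. It assumes a one-locus scenario in which each replicate dataset yields one best SNP. The function name `count_hits`, the SNP identifiers and the replicate outcomes are hypothetical and invented purely for illustration; they are not results from the study.

```python
def count_hits(best_snps, functional_snp, ld_block):
    """Count direct and block-level hits across replicate analyses.

    best_snps      : best one-locus model (a SNP id) from each replicate
    functional_snp : the simulated functional SNP (hypothetical id)
    ld_block       : set of SNP ids in the same LD block as the functional SNP
    """
    # Direct association: the best model is the functional SNP itself.
    direct = sum(1 for snp in best_snps if snp == functional_snp)
    # Signal detected: the best model lies anywhere in the correct LD block.
    in_block = sum(1 for snp in best_snps if snp in ld_block)
    return direct, in_block


# Invented outcomes for 100 replicates (illustrative only).
best = ["rs3"] * 60 + ["rs2"] * 30 + ["rs9"] * 10
block = {"rs1", "rs2", "rs3", "rs4"}

direct, in_block = count_hits(best, "rs3", block)
indirect = in_block - direct  # correct block, but not the functional SNP
print(direct, indirect, in_block)  # prints: 60 30 90
```

With 100 replicates, each count reads directly as a sensitivity "out of 100" in the sense used in the Results.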

Figure 1: Statistics used to measure LD. Equations to calculate statistics commonly used to measure the degree of LD in genetic data.
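
For reference, the standard textbook definitions of these statistics can be written as below. The symbols are notation assumed here and may differ from that used in Figure 1: p_A and p_B are the frequencies of the alleles of interest at loci A and B, and p_AB is the frequency of chromosomes carrying both. D' is written in its commonly reported absolute-value form.

```latex
\begin{align}
D &= p_{AB} - p_A\,p_B \\
|D'| &= \frac{|D|}{D_{\max}}, \qquad
D_{\max} =
\begin{cases}
\min\bigl(p_A(1-p_B),\,(1-p_A)\,p_B\bigr) & \text{if } D > 0 \\
\min\bigl(p_A\,p_B,\,(1-p_A)(1-p_B)\bigr) & \text{if } D < 0
\end{cases} \\
r^2 &= \frac{D^2}{p_A(1-p_A)\,p_B(1-p_B)}
\end{align}
```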

While multiple statistics can be used to measure the degree of LD between alleles at two genetic variants, the two that have become most popular in genetic epidemiology are r² and D'. Both statistics are based on D, the coefficient of disequilibrium. The value of D is the frequency of chromosomes or gametes carrying both alleles of interest minus the product of the frequencies of those alleles at loci A and B (Figure 1) [1]. D' is the value of D divided by the absolute value of the maximum possible value D could take given the allele frequencies, and is thus a normalized statistic comparable between allelic pairs. The r² statistic is the squared correlation coefficient between the two alleles and can be large only if both are similar in frequency. For D, D' and r², a value of zero is expected under the hypothesis of no allelic association. While D can be negative or positive, both D' and r² range between zero and one. A D' of one indicates perfect disequilibrium in the sense that at least one of the haplotypes that would be possible given the alleles at the two loci is absent. An r² of one means that the alleles at the two loci are perfectly correlated and are thus also in perfect disequilibrium. It is possible to have r² < 1 when D' = 1, but not the converse. While the concept of linkage disequilibrium and the statistics used to describe it are specific to genetics, the phenomenon can more generally be regarded, in the context of data mining and analysis methods, as the presence of correlation between variables.
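
A short numerical sketch may help illustrate how D' can equal one while r² stays well below one. It assumes the standard definitions of D, D' (reported as an absolute value) and r²; the function name `ld_stats` and the example frequencies are hypothetical, chosen so that one of the four possible haplotypes is absent and the two allele frequencies differ.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab : frequency of chromosomes carrying both alleles of interest
    p_a  : frequency of the allele of interest at locus A
    p_b  : frequency of the allele of interest at locus B
    """
    # Coefficient of disequilibrium.
    d = p_ab - p_a * p_b

    # Largest magnitude D could reach given the allele frequencies;
    # the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the two alleles.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Every chromosome carrying allele B also carries allele A (p_ab == p_b),
# but the allele frequencies differ (0.5 versus 0.2).
d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.5, p_b=0.2)
print(f"D={d:.3f} D'={d_prime:.3f} r2={r2:.3f}")  # D=0.100 D'=1.000 r2=0.250
```

Setting both allele frequencies to 0.2 with the same p_ab gives r² = 1 as well, which is consistent with the statement that r² can be large only when the two alleles are similar in frequency.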

