The effects of linkage disequilibrium in large scale SNP datasets for MDR.

Grady BJ, Torstenson ES, Ritchie MD - BioData Min (2011)

Bottom Line: In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package. Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; it does not, however, affect the ability to detect indirect association. As such, the results of MDR analysis in datasets with LD should be carefully examined to consider the underlying LD structure of the dataset.

Affiliation: Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA. ritchie@chgr.mc.vanderbilt.edu.

ABSTRACT

Background: In the analysis of large-scale genomic datasets, an important consideration is the power of analytical methods to identify accurate predictive models of disease. When trying to assess sensitivity from such analytical methods, a confounding factor up to this point has been the presence of linkage disequilibrium (LD). In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package.

Results: Four relative amounts of LD were simulated in multiple one- and two-locus scenarios for which the position of the functional SNP(s) within LD blocks varied. Simulated data was analyzed with MDR to determine the sensitivity of the method in different contexts, where the sensitivity of the method was gauged as the number of times out of 100 that the method identifies the correct one- or two-locus model as the best overall model. As the amount of LD increases, the sensitivity of MDR to detect the correct functional SNP drops but the sensitivity to detect the disease signal and find an indirect association increases.

Conclusions: Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; they do not, however, affect the ability to detect indirect association. Careful examination of the solution models generated by MDR reveals that MDR can identify loci in the correct LD block, though the locus identified is not always the functional SNP. As such, the results of MDR analysis in datasets with LD should be carefully examined to consider the underlying LD structure of the dataset.

Figure 4: Sensitivity of MDR for one-locus disease models. The sensitivity of MDR to detect the functional one-locus model exactly, indirectly, and in the absence of the functional SNP when analyzing data with 40% LD, 60% LD, 80% LD or 95% LD and attempting to identify a signal in different positions of a block of LD.

Mentions: Two single-locus models, each with three effect sizes, were simulated with genomeSIMLA and analyzed with MDR. Because sensitivity differed only slightly between the three effect sizes tested, the mean sensitivity across the effect sizes is used for comparison. The scenario in which the functional SNP is removed from the dataset prior to analysis was also examined. Results of these analyses are shown in Table 1. The signal sensitivity did not differ greatly between model types or amounts of LD and was usually above 90 for all effect sizes. The exact sensitivity, however, varied strongly with the amount of LD. When 40% or 60% of the SNPs in the dataset were in high LD with at least one other SNP, the exact sensitivity was nearly equivalent to the signal sensitivity and was greater than 90. When there was a greater amount of LD, as in the 80% LD and 95% LD cases, the exact sensitivity dropped far below the signal sensitivity. The difference in exact sensitivities between the one-locus models also became more pronounced. In 80% LD, the signal sensitivity averaged over the effect sizes was 91 for the case with a SNP at the edge of a block of LD and 91.7 when the SNP was in the middle of the block. The exact sensitivities for these same models were 24.3 and 70, respectively. For one-locus models in 95% LD, the signal sensitivity was 88.7 with a SNP at the edge of a block and 89.3 for a SNP in the middle of a block, while the exact sensitivities were 21 and 0, respectively. The trends present in the one-locus models are illustrated in Figure 4. In general, the inaccuracies that detracted from the sensitivity scores in MDR were due to a two-locus model being chosen in place of a one-locus model; such a result was not counted towards detection sensitivity even if the functional locus was in the chosen model.
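The distinction between exact and signal sensitivity can be made concrete with a small counting sketch. This is not the authors' code or part of the MDR package; it is an illustration under stated assumptions. The function names, the SNP identifiers, and the representation of a model as a set of SNP names are all hypothetical. The sketch also assumes that a signal hit for a one-locus scenario means the best model is a one-locus model whose SNP lies in the same LD block as the functional SNP, which is one reading of the text above (a two-locus best model is not counted, even when it contains the functional locus).

```python
from typing import FrozenSet, Sequence

# A model is represented as the set of SNP identifiers it contains
# (hypothetical representation, chosen for this sketch only).
Model = FrozenSet[str]


def exact_sensitivity(best_models: Sequence[Model], functional: Model) -> float:
    """Hits per 100 datasets where the best model is exactly the functional model."""
    hits = sum(1 for m in best_models if m == functional)
    return 100 * hits / len(best_models)


def signal_sensitivity(best_models: Sequence[Model], ld_block: FrozenSet[str]) -> float:
    """Hits per 100 datasets where the best model is a one-locus model
    whose SNP lies in the functional SNP's LD block (assumed definition)."""
    hits = sum(1 for m in best_models if len(m) == 1 and m <= ld_block)
    return 100 * hits / len(best_models)


# Made-up outcome of 100 simulated datasets, for illustration only.
functional = frozenset({"snp_f"})
ld_block = frozenset({"snp_f", "snp_n1", "snp_n2"})  # functional SNP plus neighbours in LD
best_models = (
    [frozenset({"snp_f"})] * 70               # functional SNP found exactly
    + [frozenset({"snp_n1"})] * 20            # neighbour in the same LD block
    + [frozenset({"snp_f", "snp_x"})] * 6     # two-locus model, not counted
    + [frozenset({"snp_x"})] * 4              # unrelated SNP
)

exact = exact_sensitivity(best_models, functional)
signal = signal_sensitivity(best_models, ld_block)
```

With 100 datasets per scenario, each value equals the raw count of hits, which matches the "number of times out of 100" definition given in the abstract. The gap between the two numbers is the share of runs in which MDR found the right LD block but not the functional SNP itself.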

