Further improvements to linear mixed models for genome-wide association studies.

Widmer C, Lippert C, Weissbrod O, Fusi N, Kadie C, Davidson R, Listgarten J, Heckerman D - Sci Rep (2014)

Bottom Line: Traditionally, all available SNPs are used to estimate the GSM. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM.


Affiliation: eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States.

ABSTRACT
We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.
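
The abstract describes a GSM built from all SNPs and a mixture of two component GSMs. A minimal sketch of that idea follows, under stated assumptions: the GSM is taken to be a linear kernel on standardized SNPs (a common choice that the text above does not specify), the mixing weight `mix` is supplied by the caller rather than estimated, and the selected SNP indices are hypothetical placeholders for whatever procedure picks SNPs that predict the phenotype. The function names are illustrative and are not taken from the authors' software.

```python
import numpy as np


def standardize_snps(snps):
    """Center each SNP column to mean 0 and scale it to unit variance."""
    snps = np.asarray(snps, dtype=float)
    centered = snps - snps.mean(axis=0)
    std = snps.std(axis=0)
    std[std == 0] = 1.0  # constant SNPs stay at zero instead of dividing by 0
    return centered / std


def gsm(snps):
    """Assumed linear-kernel GSM: K = Z Z^T / (number of SNPs)."""
    z = standardize_snps(snps)
    return z @ z.T / z.shape[1]


def mixed_gsm(snps, selected_idx, mix):
    """Convex combination of the all-SNP GSM and the selected-SNP GSM.

    `mix` is the weight on the selected-SNP component (assumed to be given).
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    snps = np.asarray(snps, dtype=float)
    k_all = gsm(snps)
    k_select = gsm(snps[:, selected_idx])
    return (1.0 - mix) * k_all + mix * k_select


# Toy data: 6 individuals, 50 SNPs coded as 0/1/2 minor-allele counts.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(6, 50))
selected = np.array([0, 3, 7])  # hypothetical indices of "selected" SNPs
k_mix = mixed_gsm(genotypes, selected, mix=0.3)
```

With `mix=0` the result reduces to the traditional all-SNP GSM, and with `mix=1` it uses only the selected SNPs, so the two extremes correspond to LMM(all) and LMM(select) in this simplified picture.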

Figure 6: Empirical type I error rate and power for three real SNP data sets and synthetic phenotypes with 10 causal SNPs. Each point represents the average type I error rate or power across multiple synthetic phenotypes (400 for Finnish and AVS, and 4,000 for Mouse). In the Finnish power plot, methods that include select have greater power than those that do not.
© Copyright Policy - open-access

Mentions: We applied the models Linreg, Linreg + PCs, LMM(select), LMM(select) + PCs, LMM(all), and LMM(all + select) to each of these data sets, yielding results that were consistent with our findings on the purely synthetic data. In particular, for the Finnish SNPs, which had little population structure or family relatedness, all models controlled type I error, and models using SNP selection had more power than models that did not (Figure 6 and Supplementary Figure 7). For the AVS SNPs, which contained mostly population structure, all models except Linreg and LMM(select) controlled type I error, and again models using SNP selection had more power than models that did not (Figure 6 and Supplementary Figure 8). For the Mouse SNPs, which exhibited both forms of confounding structure, only LMM(all) and LMM(all + select) controlled type I error, and LMM(all + select) had the most power, presumably because it was the only model that both used SNP selection and controlled type I error (Figure 6 and Supplementary Figure 9).
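
The comparisons above rest on empirical type I error rate and power averaged over synthetic phenotypes. The sketch below shows one conventional way to compute those two quantities from per-SNP p-values. It is an assumption, not the authors' exact protocol: type I error is taken as the fraction of non-causal SNPs with p-value at or below a threshold `alpha`, power as the fraction of causal SNPs at or below it, and any further exclusions the study may apply (for example, SNPs correlated with causal ones) are ignored. All names are illustrative.

```python
import numpy as np


def type1_error_and_power(p_values, is_causal, alpha):
    """Return (type I error rate, power) for one synthetic phenotype.

    Assumes at least one causal and one non-causal SNP are present.
    """
    p_values = np.asarray(p_values, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    significant = p_values <= alpha
    type1 = significant[~is_causal].mean()  # false positives among non-causal SNPs
    power = significant[is_causal].mean()   # true positives among causal SNPs
    return float(type1), float(power)


def average_over_phenotypes(per_phenotype_results):
    """Average (type I error, power) pairs across phenotypes."""
    arr = np.asarray(per_phenotype_results, dtype=float)
    mean_type1, mean_power = arr.mean(axis=0)
    return float(mean_type1), float(mean_power)


# Toy example: 6 SNPs, the first two are causal.
p = [0.001, 0.2, 0.03, 0.5, 0.6, 0.9]
causal = [True, True, False, False, False, False]
type1, power = type1_error_and_power(p, causal, alpha=0.05)
# One of four non-causal SNPs is significant (0.25); one of two causal SNPs is (0.5).
```

Each point in a plot such as Figure 6 would then correspond to `average_over_phenotypes` applied to the per-phenotype pairs for one model and one threshold.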

