Further improvements to linear mixed models for genome-wide association studies.

Widmer C, Lippert C, Weissbrod O, Fusi N, Kadie C, Davidson R, Listgarten J, Heckerman D - Sci Rep (2014)

Bottom Line: Traditionally, all available SNPs are used to estimate the GSM. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype, in combination with principal components as covariates, controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype, again controls type I error and yields more power than the traditional LMM.


Affiliation: eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States.

ABSTRACT
We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.
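The mixture of two component GSMs described in the abstract can be illustrated with a minimal sketch. The abstract does not say how each GSM is computed or how the two are weighted, so the choices here are assumptions: each GSM is taken to be a linear kernel over standardized genotypes, the mixture is a convex combination with a fixed weight, and the names `gsm`, `mixed_gsm`, `selected`, and `weight` are hypothetical. The toy genotype matrix is made up for illustration only.

```python
import numpy as np


def standardize(snps):
    """Center each SNP column and scale it to unit variance.

    Columns with zero variance carry no similarity information and are dropped.
    """
    snps = np.asarray(snps, dtype=float)
    sd = snps.std(axis=0)
    keep = sd > 0
    return (snps[:, keep] - snps[:, keep].mean(axis=0)) / sd[keep]


def gsm(snps):
    """Assumed GSM form: K = Z Z^T / m for standardized n x m genotypes Z."""
    z = standardize(snps)
    return z @ z.T / z.shape[1]


def mixed_gsm(snps, selected, weight):
    """Convex combination of an all-SNP GSM and a selected-SNP GSM.

    `selected` holds column indices of SNPs assumed to predict the phenotype
    well; `weight` in [0, 1] is the share given to the selected-SNP GSM.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    snps = np.asarray(snps, dtype=float)
    return (1.0 - weight) * gsm(snps) + weight * gsm(snps[:, selected])


# Toy data: 4 individuals x 5 SNPs, coded as minor-allele counts (0, 1, 2).
toy_snps = np.array([
    [0, 1, 2, 0, 1],
    [1, 1, 0, 2, 0],
    [2, 0, 1, 1, 1],
    [0, 2, 1, 0, 2],
])
toy_selected = [1, 3]  # hypothetical "well-predicting" SNPs
K_mix = mixed_gsm(toy_snps, toy_selected, weight=0.25)
```

In practice the selected SNPs and the mixing weight would have to be chosen from the data, for example by cross-validation or likelihood; the sketch fixes both by hand only to show the shape of the computation.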

Figure 4: Empirical type I error rate and power with and without population structure (PS) and family relatedness (FR) with purely synthetic data. Type I error rate is plotted as a function of P value cutoff α. Each point represents the average type I error rate or power across multiple data sets with varying numbers of causal SNPs and varying degrees of heritability, population structure, and family relatedness.

Mentions: Under this setting, we evaluated Linreg + PCs, LMM(all), and LMM(select) + PCs. Linreg + PCs and LMM(select) + PCs failed to control type I error, in contrast to the setting with population structure, whereas LMM(all) controlled type I error (Figure 4 and Supplementary Figure 4). Again, we can understand these results in terms of the graphical-model structure for the data-generation process, which is that of Figure 3a where now the hidden variable corresponds to family relatedness. In terms of this graph, inflation observed for Linreg + PCs (also seen in Ref. 8) indicates that the use of PCs as fixed effects failed to block the paths through l from non-causal SNPs to y. Similarly, inflation observed for LMM(select) + PCs indicates that neither PCs as fixed effects nor selected SNPs blocked the paths through l. Only LMM(all) blocked all paths from the non-causal SNPs to y, either by conditioning on all SNPs, having a GSM that fully captures family relatedness l, or both. Later in this section, we will see evidence supporting at least the second explanation.
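Control of type I error, as discussed above, can be checked empirically on synthetic data where the non-causal SNPs are known. The following is a minimal sketch of one common way to do this, not necessarily the authors' exact procedure: the empirical rate at cutoff α is taken to be the fraction of non-causal SNPs whose P value is at most α, averaged across data sets. The function names and the toy P values are hypothetical.

```python
import numpy as np


def rejection_rate(pvalues, alphas):
    """Fraction of P values at or below each cutoff in `alphas`.

    Applied to non-causal SNPs this estimates the type I error rate;
    applied to causal SNPs it gives a simple estimate of power.
    """
    p = np.asarray(pvalues, dtype=float)
    return np.array([(p <= a).mean() for a in alphas])


def mean_rejection_rate(pvalues_per_dataset, alphas):
    """Average the per-data-set rejection rates, one value per cutoff."""
    rates = [rejection_rate(p, alphas) for p in pvalues_per_dataset]
    return np.mean(rates, axis=0)


# Toy P values for non-causal SNPs in one synthetic data set (made up).
toy_null_p = [0.001, 0.02, 0.04, 0.3, 0.5, 0.9, 0.06, 0.2, 0.7, 0.8]
toy_alphas = [0.01, 0.05, 0.5]
toy_rates = rejection_rate(toy_null_p, toy_alphas)
```

A method that controls type I error should give rates close to α at every cutoff; rates well above α correspond to the inflation described for Linreg + PCs and LMM(select) + PCs under family relatedness.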

