Further improvements to linear mixed models for genome-wide association studies.

Widmer C, Lippert C, Weissbrod O, Fusi N, Kadie C, Davidson R, Listgarten J, Heckerman D - Sci Rep (2014)

Bottom Line: Traditionally, all available SNPs are used to estimate the GSM. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM.


Affiliation: eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States.

ABSTRACT
We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.
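As a rough illustration of the GSMs described in the abstract, the sketch below builds a GSM from a genotype matrix and then mixes two component GSMs. The per-SNP standardization, the 1/S scaling, the function names `standardize`, `gsm`, and `mixture_gsm`, and the mixing weight `a` are all assumptions made for illustration; the abstract does not state how the GSM is normalized or how the mixture is weighted, and this is not the authors' software.

```python
from statistics import mean, pstdev


def standardize(column):
    """Center one SNP column to mean 0 and scale it to unit variance.

    This normalization is an assumed convention, not taken from the source.
    """
    m, s = mean(column), pstdev(column)
    if s == 0:  # a monomorphic SNP carries no similarity information
        return [0.0] * len(column)
    return [(x - m) / s for x in column]


def gsm(genotypes, snp_indices=None):
    """Return the N x N matrix K = Z Z^T / S over the chosen SNP columns.

    genotypes: list of N rows, each a list of minor-allele counts (0, 1, 2).
    snp_indices: columns used to build the GSM; None means all SNPs.
    """
    n = len(genotypes)
    if snp_indices is None:
        snp_indices = range(len(genotypes[0]))
    cols = [standardize([row[j] for row in genotypes]) for j in snp_indices]
    s = len(cols)
    if s == 0:
        raise ValueError("at least one SNP is required")
    return [[sum(c[i] * c[k] for c in cols) / s for k in range(n)]
            for i in range(n)]


def mixture_gsm(k_all, k_selected, a):
    """Return (1 - a) * K_all + a * K_selected for 0 <= a <= 1.

    The convex-combination form is an assumption of this sketch.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("mixing weight a must lie in [0, 1]")
    return [[(1.0 - a) * x + a * y for x, y in zip(row_all, row_sel)]
            for row_all, row_sel in zip(k_all, k_selected)]


# Toy data: 3 individuals, 4 SNPs. Columns 0 and 2 stand in for
# "SNPs that well predict the phenotype" (a hypothetical selection).
genotypes = [[0, 1, 2, 0],
             [1, 1, 0, 2],
             [2, 0, 1, 1]]
k_all = gsm(genotypes)
k_selected = gsm(genotypes, [0, 2])
k_mix = mixture_gsm(k_all, k_selected, 0.5)
```

In practice the SNP selection and the mixing weight would have to be chosen from data; the sketch leaves both as inputs.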

Figure 7: Empirical type I error rate and power for phenotypes synthetically generated from SNPs from the Mouse data with 10 causal SNPs. GSMs were estimated from SNPs sampled uniformly across the genome (every kth SNP). Each point represents average type I error rate or power across 4,000 synthetic phenotypes.
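The quantities plotted in this figure can be sketched as follows, under assumed definitions that the caption itself does not spell out: type I error rate is taken as the fraction of non-causal SNPs whose p-value falls at or below a threshold, and power as the fraction of causal SNPs that do. The function names, the threshold `alpha`, and the toy p-values are hypothetical.

```python
def type1_error_and_power(p_values, causal_indices, alpha):
    """Return (type I error rate, power) for one synthetic phenotype.

    Assumed definitions: a false positive is a non-causal SNP with
    p <= alpha; a true positive is a causal SNP with p <= alpha.
    """
    causal = set(causal_indices)
    null_p = [p for j, p in enumerate(p_values) if j not in causal]
    causal_p = [p for j, p in enumerate(p_values) if j in causal]
    if not null_p or not causal_p:
        raise ValueError("need at least one causal and one non-causal SNP")
    type1 = sum(p <= alpha for p in null_p) / len(null_p)
    power = sum(p <= alpha for p in causal_p) / len(causal_p)
    return type1, power


def average_over_phenotypes(per_phenotype):
    """Average (type I error, power) pairs across phenotypes."""
    n = len(per_phenotype)
    return (sum(t for t, _ in per_phenotype) / n,
            sum(pw for _, pw in per_phenotype) / n)


# Toy example: six SNPs, of which SNPs 0 and 2 are causal.
rates = type1_error_and_power([0.001, 0.2, 0.03, 0.5, 0.9, 0.04], [0, 2], 0.05)
```

Each point in the figure would then correspond to `average_over_phenotypes` applied to the per-phenotype results for one sampling rate.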

Mentions: Finally, LD among the SNPs in this data allowed us to investigate the usefulness of replacing a GSM estimated from all SNPs with one estimated after LD sampling, as first suggested in ref. 4. We did so for the Mouse SNPs, where we had found a GSM based on all SNPs to be most needed for control of type I error. A sample of only one fourth of the available 10,000 SNPs yielded good control of type I error (Figure 7), suggesting that, at least for this SNP data, LD sampling can be an effective approach to improving the run time of GWAS.
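The sampling step described here (every kth SNP, taken uniformly across the genome) can be sketched as below. The function names, the `offset` parameter, and the assumption that SNP columns are already ordered by genomic position are illustrative choices, not details from the paper.

```python
def every_kth_snp(num_snps, k, offset=0):
    """Return the indices of every kth SNP, starting at `offset`.

    Assumes SNPs are indexed in genomic order, so that a fixed stride
    samples uniformly across the genome.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return list(range(offset, num_snps, k))


def subset_columns(genotypes, snp_indices):
    """Keep only the selected SNP columns of an N x S genotype matrix."""
    return [[row[j] for j in snp_indices] for row in genotypes]


# With 10,000 SNPs and k = 4, one fourth of the SNPs are retained.
sampled = every_kth_snp(10_000, 4)
print(len(sampled))  # 2500

# Toy genotype matrix: 2 individuals, 5 SNPs, keeping every 2nd SNP.
reduced = subset_columns([[0, 1, 2, 0, 1],
                          [2, 2, 1, 0, 0]], every_kth_snp(5, 2))
```

The reduced matrix would then be used in place of the full one when estimating the GSM, which is where the run-time saving would come from.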

