Further improvements to linear mixed models for genome-wide association studies.

Widmer C, Lippert C, Weissbrod O, Fusi N, Kadie C, Davidson R, Listgarten J, Heckerman D - Sci Rep (2014)

Bottom Line: Traditionally, all available SNPs are used to estimate the GSM. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM.


Affiliation: eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States.

ABSTRACT
We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.
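The mixture GSM described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' software: the function names (`standardize`, `gsm`, `mixed_gsm`), the per-SNP standardization, and the random genotype data are all assumptions made for the example, and the "selected" SNP indices are arbitrary stand-ins for SNPs that would in practice be chosen because they predict the phenotype well. The only convention taken from the source is that a mixing weight of 1 uses the selected-SNP GSM alone and a weight of 0 uses the all-SNP GSM alone.

```python
import numpy as np


def standardize(snps):
    """Center each SNP column and scale it to unit variance.

    Constant columns are left at zero rather than divided by zero.
    (Standardization is an assumed, commonly used normalization.)
    """
    centered = snps - snps.mean(axis=0)
    std = snps.std(axis=0)
    std[std == 0] = 1.0
    return centered / std


def gsm(snps):
    """Genetic similarity matrix as Z Z^T / (number of SNPs)."""
    z = standardize(snps)
    return z @ z.T / z.shape[1]


def mixed_gsm(snps, selected, weight):
    """Convex mixture of the all-SNP GSM and the selected-SNP GSM.

    weight = 0 -> GSM from all SNPs only
    weight = 1 -> GSM from the selected SNPs only
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    k_all = gsm(snps)
    k_sel = gsm(snps[:, selected])
    return (1.0 - weight) * k_all + weight * k_sel


# Hypothetical data: 20 individuals, 200 SNPs coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
snps = rng.integers(0, 3, size=(20, 200)).astype(float)
selected = np.arange(10)  # placeholder for phenotype-predictive SNPs
k = mixed_gsm(snps, selected, weight=0.3)
```

In a real analysis the mixing weight and the selected SNP set would be fitted from the data; here both are fixed by hand purely to show the shape of the computation.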

Figure 2: Box plots showing the number of SNPs selected and the mixing weight as a function of the number of causal SNPs with purely synthetic data. The first column shows log10 of the number of SNPs selected by LMM(select). The highest point corresponds to the selection of all SNPs. The second and third columns show the number of selected SNPs and mixing weights for LMM(all + select). A mixing weight of 1 corresponds to using a GSM based only on SNP selection. A mixing weight of 0 corresponds to using a GSM based only on all SNPs.

Mentions: One interesting finding was that SNP selection would select all SNPs in many data sets when only a relatively small number of the SNPs in the generating data were causal (Figure 2). One explanation is that, as the number of causal SNPs increases for a fixed narrow-sense heritability, the signal in each SNP decreases. Therefore, even for a relatively small number of causal SNPs (e.g., fewer than 1000), the SNP selection algorithms may not be able to detect the signal at the individual-SNP level, thus finding all SNPs to be optimal. (See ref. 17 for a theoretical discussion.) Indeed, when we used 1000 causal SNPs and increased narrow-sense heritability beyond 0.4, fewer than all SNPs (in fact, fewer than 1000) were selected.
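The dilution argument above can be made concrete with a small sketch. It assumes, purely for illustration, that every causal SNP has the same effect magnitude and that genotypes are independent and standardized (drawn here as standard normal values rather than real allele counts), so each causal SNP explains a share h2 / m of the phenotypic variance. The names `per_snp_variance` and `simulate_phenotype` are hypothetical and do not come from the paper or its software.

```python
import numpy as np


def per_snp_variance(h2, n_causal):
    """Share of phenotypic variance explained by one causal SNP,
    assuming equal effect sizes across all causal SNPs."""
    if n_causal <= 0:
        raise ValueError("n_causal must be positive")
    return h2 / n_causal


def simulate_phenotype(n_individuals, n_causal, h2, rng):
    """Simulate a unit-variance phenotype with narrow-sense heritability h2.

    Genotypes are standard normal draws (an assumed stand-in for
    standardized, independent SNPs). Returns the genetic component
    and the full phenotype.
    """
    x = rng.standard_normal((n_individuals, n_causal))
    signs = rng.choice([-1.0, 1.0], size=n_causal)
    effects = signs * np.sqrt(per_snp_variance(h2, n_causal))
    genetic = x @ effects
    noise = rng.standard_normal(n_individuals) * np.sqrt(1.0 - h2)
    return genetic, genetic + noise


rng = np.random.default_rng(1)
genetic, phenotype = simulate_phenotype(2000, 100, 0.4, rng)

# The per-SNP share shrinks as the number of causal SNPs grows.
shares = {m: per_snp_variance(0.4, m) for m in (10, 100, 1000)}
```

Under these assumptions, moving from 10 to 1000 causal SNPs at a heritability of 0.4 divides the per-SNP share by 100, which is consistent with the explanation that selection struggles to detect signal at the individual-SNP level.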

