Further improvements to linear mixed models for genome-wide association studies.

Widmer C, Lippert C, Weissbrod O, Fusi N, Kadie C, Davidson R, Listgarten J, Heckerman D - Sci Rep (2014)

Bottom Line: Traditionally, all available SNPs are used to estimate the GSM. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM.


Affiliation: eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States.

ABSTRACT
We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.
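The mixture of two component GSMs described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gsm` and `mixed_gsm`, the per-SNP standardization, the hand-picked `weight`, and the choice of which SNPs count as "selected" are all assumptions made for illustration. The abstract does not say how the phenotype-predictive SNPs are chosen or how the mixing weight is set.

```python
import numpy as np


def gsm(snps):
    """Genetic similarity matrix from an (individuals x SNPs) genotype array.

    Assumption: similarity is the average product of standardized genotypes.
    """
    std = snps.std(axis=0)
    keep = std > 0  # drop SNPs with no variation; they cannot be standardized
    z = (snps[:, keep] - snps[:, keep].mean(axis=0)) / std[keep]
    # Dividing by the number of SNPs makes the mean diagonal entry equal to 1.
    return z @ z.T / z.shape[1]


def mixed_gsm(snps, selected, weight):
    """Convex combination of the all-SNP GSM and the selected-SNP GSM."""
    k_all = gsm(snps)
    k_sel = gsm(snps[:, selected])
    return (1.0 - weight) * k_all + weight * k_sel


# Toy data: 20 individuals, 200 SNPs coded as 0/1/2 minor-allele counts.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(20, 200)).astype(float)
selected = np.arange(10)  # hypothetical set of phenotype-predictive SNPs
k = mixed_gsm(genotypes, selected, weight=0.3)
```

Because each component is normalized the same way, the mixture stays symmetric with a mean diagonal of 1 for any weight between 0 and 1, so it can be used wherever a single GSM would be.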


Figure 5: The GSM for three real SNP data sets. Each point in the matrix corresponds to the similarity between a pair of individuals. Lighter colors correspond to greater similarity. The ordering was obtained by a hierarchical clustering, as indicated by the dendrograms on the axes, where different colors reflect substantially different clusters.

Mentions: We used real SNPs from three cohorts: two from human studies, the Northern Finnish Birth Cohort from 1966 (Finnish) and the CIDR Visceral Adiposity Study (VAS), and one from a mouse cross (Mouse) (see Methods). These data contain various degrees of population structure and family relatedness. From a hierarchical clustering performed on each of these three data sets (Figure 5), we see that Finnish contains little population structure or family relatedness, VAS contains mostly population structure as illustrated by the broad bands of similarity, and the mouse data contains both forms of confounding structure as illustrated by the combination of broad and narrow bands. We generated the phenotype in essentially the same manner as for purely synthetic data sets, always using h² = 0.5.
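The reordering of a GSM by hierarchical clustering, which produces the bands described above, can be sketched as follows. The text does not state the linkage method or the distance measure, so average linkage and the conversion of similarity to distance as `max - similarity` are assumptions, and `cluster_order` is a hypothetical helper name. The toy matrix has two groups of five individuals, mimicking a single "broad band" of population structure.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def cluster_order(k):
    """Return the row/column order given by the leaves of a dendrogram on k."""
    # Assumption: turn similarity into a non-negative dissimilarity.
    d = k.max() - k
    d = (d + d.T) / 2.0  # enforce exact symmetry
    np.fill_diagonal(d, 0.0)
    # linkage expects the condensed (upper-triangle) form of the distances.
    z = linkage(squareform(d, checks=False), method="average")
    return leaves_list(z)


# Toy GSM: individuals in the same group are similar (0.8), others are not (0.1).
rng = np.random.default_rng(1)
labels = rng.permutation(np.repeat([0, 1], 5))
k = np.where(labels[:, None] == labels[None, :], 0.8, 0.1)
np.fill_diagonal(k, 1.0)

order = cluster_order(k)
k_sorted = k[np.ix_(order, order)]  # matrix as it would be drawn in a heat map
```

After reordering, members of each group sit next to each other, so the heat map of `k_sorted` shows two bright blocks on the diagonal. Family relatedness would appear as many small blocks rather than a few large ones.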

