Expression Quantitative Trait Loci Information Improves Predictive Modeling of Disease Relevance of Non-Coding Genetic Variation.

Croteau-Chonka DC, Rogers AJ, Raj T, McGeachie MJ, Qiu W, Ziniti JP, Stubbs BJ, Liang L, Martinez FD, Strunk RC, Lemanske RF, Liu AH, Stranger BE, Carey VJ, Raby BA - PLoS ONE (2015)

Bottom Line: Disease-associated loci identified through genome-wide association studies (GWAS) frequently localize to non-coding sequence. At a false discovery rate < 5%, we mapped eQTLs for 6,535 genes; these were enriched for disease-associated genes (P < 10^-4), particularly those related to immune diseases and metabolic traits. This complete prediction model including eQTL association information ultimately allowed for better discrimination of SNPs with higher probabilities of GWAS membership (6.3-10.0%, compared to 3.5% for a random SNP) than the other two models excluding eQTL information.


Affiliation: Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America.

ABSTRACT
Disease-associated loci identified through genome-wide association studies (GWAS) frequently localize to non-coding sequence. We and others have demonstrated strong enrichment of such single nucleotide polymorphisms (SNPs) for expression quantitative trait loci (eQTLs), supporting an important role for regulatory genetic variation in complex disease pathogenesis. Herein we describe our initial efforts to develop a predictive model of disease-associated variants leveraging eQTL information. We first catalogued cis-acting eQTLs (SNPs within 100 kb of target gene transcripts) by meta-analyzing four studies of three blood-derived tissues (n = 586). At a false discovery rate < 5%, we mapped eQTLs for 6,535 genes; these were enriched for disease-associated genes (P < 10^-4), particularly those related to immune diseases and metabolic traits. Based on eQTL information and other variant annotations (distance from target gene transcript, minor allele frequency, and chromatin state), we created multivariate logistic regression models to predict SNP membership in reported GWAS. The complete model revealed independent contributions of specific annotations as strong predictors, including evidence for an eQTL (odds ratio (OR) = 1.2-2.0, P < 10^-11) and the chromatin states of active promoters, different classes of strong or weak enhancers, or transcriptionally active regions (OR = 1.5-2.3, P < 10^-11). This complete prediction model including eQTL association information ultimately allowed for better discrimination of SNPs with higher probabilities of GWAS membership (6.3-10.0%, compared to 3.5% for a random SNP) than the other two models excluding eQTL information. This eQTL-based prediction model of disease relevance can help systematically prioritize non-coding GWAS SNPs for further functional characterization.
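
To give a feel for what an odds ratio in the reported range means on the probability scale, the following minimal Python sketch scales a baseline probability by an odds ratio. It is an illustration only: the function name `apply_odds_ratio` is hypothetical, and using the 3.5% random-SNP rate as the baseline is an assumption made here for convenience. The fitted model has its own intercept and combines several annotations, so this sketch is not expected to reproduce the 6.3-10.0% figures quoted above.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Scale the odds implied by baseline_prob by odds_ratio; return a probability."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Assumption: the 3.5% random-SNP rate stands in for the baseline probability.
BASELINE = 0.035

# The two ends of the eQTL odds-ratio range reported in the abstract.
for odds_ratio in (1.2, 2.0):
    shifted = apply_odds_ratio(BASELINE, odds_ratio)
    print(f"OR = {odds_ratio}: probability = {shifted:.3f}")
```

Because the baseline is small, odds and probability are nearly interchangeable here, so an odds ratio of 2.0 comes close to doubling the probability without quite reaching it.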


Fig 4 (pone.0140758.g004): Forest plot of component effects of complete GWAS predictive model based on training set of SNPs. Odds ratios (black squares) from the complete multivariate model (“chromstate+eqtl [M3]”) for features predicting the membership of a SNP in the NHGRI GWAS Catalog are shown here with standard errors (gray lines). Smaller models are shown for comparison in S2 Fig. Four classes of SNP annotation are represented in the model, each with multiple levels: distance from gene, MAF, chromatin state in GM12878 LCLs (12), and evidence of eQTL association based on meta-analysis FDR. The base levels for each annotation are “0 kb (within gene)” [Distance from Gene], “>10%” [MAF], “Heterochromatin (13)” [ChromHMM], and “>50%” [FDR].
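
The role of the base levels in the caption can be illustrated with reference-level (dummy) coding, in which every non-base level of an annotation gets its own 0/1 indicator and the base level is absorbed into the intercept. The sketch below is a hedged illustration, not the authors' code: only the four base levels come from the caption, while the other level names and the helper `dummy_code` are hypothetical placeholders.

```python
# Base (reference) level of each annotation, as listed in the figure caption.
BASE_LEVELS = {
    "Distance from Gene": "0 kb (within gene)",
    "MAF": ">10%",
    "ChromHMM": "Heterochromatin (13)",
    "FDR": ">50%",
}

# Assumption: one placeholder non-base level per annotation. The real model
# has several levels per annotation; their names are not given in the caption.
LEVELS = {
    "Distance from Gene": ["0 kb (within gene)", "placeholder distance bin"],
    "MAF": [">10%", "placeholder MAF bin"],
    "ChromHMM": ["Heterochromatin (13)", "placeholder chromatin state"],
    "FDR": [">50%", "placeholder FDR bin"],
}


def dummy_code(snp):
    """Return one 0/1 indicator per non-base level of each annotation."""
    indicators = {}
    for annotation, levels in LEVELS.items():
        for level in levels:
            if level == BASE_LEVELS[annotation]:
                continue  # the base level is represented by the intercept
            indicators[f"{annotation}: {level}"] = int(snp[annotation] == level)
    return indicators


# A SNP at the base level of every annotation has all indicators equal to 0.
reference_snp = dict(BASE_LEVELS)
# The same SNP moved to a non-base eQTL FDR bin switches on one indicator.
eqtl_snp = dict(BASE_LEVELS, FDR="placeholder FDR bin")

print(dummy_code(reference_snp))
print(dummy_code(eqtl_snp))
```

Under this coding, each odds ratio in the forest plot is read as a comparison of one level against its annotation's base level, with the other annotations held fixed.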

Mentions: We next developed multivariate logistic predictive models of the likelihood of a SNP being a “GWAS hit”, namely being reported in the GWAS Catalog (or having a close proxy at r² > 0.8). The complete model (“chromstate+eqtl [M3]”) considered multiple SNP features, including physical distance from target transcript, MAF, putative chromatin state in LCLs [12], and strength of eQTL association as estimated from our meta-analysis (Fig 4). For comparison, we also examined two smaller models: one that considered physical distance from target transcript, MAF, and the variants’ position relative to transcript (“structure [M1]”) and one that considered physical distance from target transcript, MAF, and the chromatin state annotations (“chromstate [M2]”) (S2 Fig). MAF was included in all models to adjust for the overrepresentation of common variants in the GWAS Catalog, which is a reflection of the inherent power-related bias of GWAS to detect associations with common variants.
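
The comparison between a model without and a model with eQTL information can be sketched on synthetic data. The Python example below is a minimal illustration under stated assumptions, not the authors' pipeline: the data are simulated, each annotation is reduced to a single hypothetical binary indicator (`near_gene`, `common`, `active_chromatin`, `eqtl`), the effect sizes are invented, and the fit uses plain batch gradient ascent rather than a statistics package. Only an "M2-like" and an "M3-like" model are compared; the structure model (M1) is not represented.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, n_iter=500, lr=1.0):
    """Fit logistic regression by batch gradient ascent; weights[0] is the intercept."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            err = y - sigmoid(z)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        # Step along the gradient of the mean log-likelihood.
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def log_likelihood(weights, rows, labels):
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
        total += math.log(p) if y else math.log(1.0 - p)
    return total


# Synthetic SNPs with four binary annotations (all hypothetical):
# near_gene, common (MAF > 10%), active_chromatin, eqtl.
random.seed(0)
snps, labels = [], []
for _ in range(1000):
    snp = [random.randint(0, 1) for _ in range(4)]
    # Invented effect sizes, chosen only so the simulation has signal.
    z = -1.5 + 0.3 * snp[0] + 0.5 * snp[1] + 0.6 * snp[2] + 0.7 * snp[3]
    labels.append(1 if random.random() < sigmoid(z) else 0)
    snps.append(snp)

m2_rows = [snp[:3] for snp in snps]  # distance, MAF, chromatin state
m3_rows = snps                       # the same plus the eQTL indicator

m2 = fit_logistic(m2_rows, labels)
m3 = fit_logistic(m3_rows, labels)

print("eQTL odds ratio (M3-like):", round(math.exp(m3[4]), 2))
print("log-likelihood, M2-like:", round(log_likelihood(m2, m2_rows, labels), 1))
print("log-likelihood, M3-like:", round(log_likelihood(m3, m3_rows, labels), 1))
```

Exponentiating a fitted coefficient gives the corresponding odds ratio, which is the quantity plotted in Fig 4. A higher log-likelihood for the larger model on training data is expected for nested models, so a real comparison would also need held-out data or a formal test.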

