Expression Quantitative Trait Loci Information Improves Predictive Modeling of Disease Relevance of Non-Coding Genetic Variation.

Croteau-Chonka DC, Rogers AJ, Raj T, McGeachie MJ, Qiu W, Ziniti JP, Stubbs BJ, Liang L, Martinez FD, Strunk RC, Lemanske RF, Liu AH, Stranger BE, Carey VJ, Raby BA - PLoS ONE (2015)

Bottom Line: Disease-associated loci identified through genome-wide association studies (GWAS) frequently localize to non-coding sequence. At a false discovery rate < 5%, we mapped eQTLs for 6,535 genes; these were enriched for disease-associated genes (P < 10⁻⁴), particularly those related to immune diseases and metabolic traits. The complete prediction model including eQTL association information ultimately allowed for better discrimination of SNPs with higher probabilities of GWAS membership (6.3-10.0%, compared to 3.5% for a random SNP) than the other two models excluding eQTL information.


Affiliation: Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America.

ABSTRACT
Disease-associated loci identified through genome-wide association studies (GWAS) frequently localize to non-coding sequence. We and others have demonstrated strong enrichment of such single nucleotide polymorphisms (SNPs) for expression quantitative trait loci (eQTLs), supporting an important role for regulatory genetic variation in complex disease pathogenesis. Herein we describe our initial efforts to develop a predictive model of disease-associated variants leveraging eQTL information. We first catalogued cis-acting eQTLs (SNPs within 100 kb of target gene transcripts) by meta-analyzing four studies of three blood-derived tissues (n = 586). At a false discovery rate < 5%, we mapped eQTLs for 6,535 genes; these were enriched for disease-associated genes (P < 10⁻⁴), particularly those related to immune diseases and metabolic traits. Based on eQTL information and other variant annotations (distance from target gene transcript, minor allele frequency, and chromatin state), we created multivariate logistic regression models to predict SNP membership in reported GWAS. The complete model revealed independent contributions of specific annotations as strong predictors, including evidence for an eQTL (odds ratio (OR) = 1.2-2.0, P < 10⁻¹¹) and the chromatin states of active promoters, different classes of strong or weak enhancers, or transcriptionally active regions (OR = 1.5-2.3, P < 10⁻¹¹). This complete prediction model including eQTL association information ultimately allowed for better discrimination of SNPs with higher probabilities of GWAS membership (6.3-10.0%, compared to 3.5% for a random SNP) than the other two models excluding eQTL information. This eQTL-based prediction model of disease relevance can help systematically prioritize non-coding GWAS SNPs for further functional characterization.
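In a logistic regression model, each annotation's coefficient is the natural log of its odds ratio, so an annotation multiplies the odds (not the probability) of GWAS membership by its OR. The sketch below illustrates that arithmetic with hypothetical numbers: a 2% baseline probability and two binary indicators set to the upper ends of the OR ranges quoted above (2.0 for an eQTL, 2.3 for an active chromatin state). It is not the authors' fitted model, which also includes distance, minor allele frequency, multi-level chromatin states and eQTL FDR classes; the function and feature names are illustrative only.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def predict_probability(intercept, coefs, x):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))).

    Each coefficient b_i is ln(OR_i), so OR_i = exp(b_i).
    """
    eta = intercept + sum(coefs[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical values for illustration, NOT fitted coefficients from the study:
# a 2% baseline and the upper ends of the OR ranges reported in the abstract.
b0 = logit(0.02)
coefs = {"is_eqtl": math.log(2.0), "active_chromatin_state": math.log(2.3)}

p_none = predict_probability(b0, coefs, {"is_eqtl": 0, "active_chromatin_state": 0})
p_eqtl = predict_probability(b0, coefs, {"is_eqtl": 1, "active_chromatin_state": 0})
p_both = predict_probability(b0, coefs, {"is_eqtl": 1, "active_chromatin_state": 1})
print(f"{p_none:.3f} {p_eqtl:.3f} {p_both:.3f}")  # 0.020 0.039 0.086
```

Because effects combine on the odds scale, ORs of 2.0 and 2.3 together raise the odds 4.6-fold, while the probability rises from 2.0% to only about 8.6%.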


Fig 5. Multivariate logistic models predicting SNP membership in GWAS are well-calibrated. Top panel: Three models were developed to predict the membership of a given SNP in the NHGRI GWAS Catalog, all incorporating at minimum the distance of the SNP from the transcript boundaries of its target gene and the minor allele frequency of the SNP. The "structure [M1]" model (white) also incorporates the NCBI gene structure classification (intron, coding, untranslated region, etc.) (S2 Fig); "chromstate [M2]" (gray) instead incorporates chromatin state (S2 Fig); "chromstate+eqtl [M3]" (black) incorporates both chromatin state and eQTL FDR class (Fig 4). The x-axis shows equal-sized bins of predicted probabilities of being a GWAS SNP. Basing the bins on the widest range of predicted probabilities (that of M3) aids visual comparison of calibration among the three models by smoothing the observed proportions of GWAS SNPs. The y-axis shows the observed proportion of GWAS SNPs in each bin. The dashed green line at 3.5% marks the mean probability that a random SNP in the genome is a GWAS hit or a close proxy (r² > 0.8) for one. Bottom panel: a table of absolute counts of SNPs in each predicted-probability bin for each model. For the M1 and M2 models, no SNPs had predicted probabilities > 6.3%.
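The calibration check described in this caption (bin the predicted probabilities, then compare each bin's observed GWAS proportion with the bin itself) can be sketched as follows. This is a minimal illustration, assuming "equal-sized" means equal-width bins over a shared range such as the one spanned by M3; `calibration_table` and `is_calibrated` are hypothetical helper names, and the toy predictions and labels are made up.

```python
def calibration_table(probs, labels, n_bins, lo, hi):
    """Equal-width binning of predicted probabilities over [lo, hi].

    Returns one row per bin: (left edge, right edge, number of SNPs,
    observed proportion of GWAS SNPs, or None for an empty bin).
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    hits = [0] * n_bins
    for p, y in zip(probs, labels):
        if p < lo or p > hi:
            continue  # outside the shared bin range; not tabulated
        i = min(int((p - lo) / width), n_bins - 1)  # p == hi goes in the last bin
        counts[i] += 1
        hits[i] += y
    return [
        (lo + i * width, lo + (i + 1) * width, counts[i],
         hits[i] / counts[i] if counts[i] else None)
        for i in range(n_bins)
    ]


def is_calibrated(rows):
    """True if each non-empty bin's observed proportion lies within its own edges."""
    return all(left <= obs <= right for left, right, n, obs in rows if n > 0)


# Toy usage with made-up predictions and labels (1 = GWAS SNP or close proxy).
rows = calibration_table([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
                         [0, 0, 0, 1, 1, 1, 1, 0], n_bins=2, lo=0.0, hi=1.0)
for left, right, n, obs in rows:
    print(f"bin {left:.2f}-{right:.2f}: n={n}, observed={obs}")
print(is_calibrated(rows))
```

Using one set of bin edges for every model, as the caption describes, keeps the per-bin counts in the bottom-panel table directly comparable across M1, M2 and M3.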

We trained the three predictive models on the same random subset of 777,998 SNP-probe pairs and then validated the predictive power of each model on the remaining 6,894,942 SNP-probe pairs, which were not included in the training set. While a randomly selected SNP-probe pair in the test set had a 3.5% chance of being a GWAS hit, all three models assigned substantial subsets of pairs higher predicted probabilities (Fig 5). All three models were well-calibrated: for the SNP-probe pairs in a given bin of predicted probabilities, the observed proportion of GWAS hits fell within that bin's predicted range. However, our complete model M3, which considered both eQTL evidence and chromatin state, outperformed the smaller models M1 and M2 in one important regard: whereas the maximum predicted probabilities from M1 and M2 peaked at 6.0% and 5.3%, respectively (at most 1.7-fold and 1.5-fold the chance of a random pair being a GWAS hit), M3-derived probabilities had a greater dynamic range, reaching as high as 10.0% (2.9-fold higher than chance). ROC curves showed that all three models were reasonable classifiers (Fig 6), with areas under the ROC curve (AUC) of 0.645 for M1, 0.610 for M2, and 0.654 for M3. Thus, while all three models improved on chance, the model including eQTL information was the most discriminatory and produced the strongest predictions for SNPs to prioritize for further functional characterization.
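The fold changes and AUCs in this paragraph follow from simple arithmetic that can be checked directly. The sketch below reproduces the fold-over-chance figures from the reported peak probabilities and the training share of the pairs, and computes AUC with the standard rank-sum (Mann-Whitney U) identity. That AUC formulation is an assumption for illustration: the text does not say how the authors computed their AUCs, and the helper names are hypothetical.

```python
def fold_over_chance(p, baseline=0.035):
    """How many times a predicted probability exceeds the 3.5% chance level."""
    return p / baseline


def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity, with labels in {0, 1}.

    Tied scores get their average rank, so a tied positive/negative pair counts 1/2.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda k: scores[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1  # extend the group of tied scores
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Fold increases over chance for the peak probabilities reported in the text.
for name, p_max in [("M1", 0.060), ("M2", 0.053), ("M3", 0.100)]:
    print(name, f"{fold_over_chance(p_max):.1f}")  # M1 1.7, M2 1.5, M3 2.9

# Share of all SNP-probe pairs used for training.
train, test = 777_998, 6_894_942
print(f"{train / (train + test):.1%}")  # 10.1%
```

The rank-based form runs in O(n log n), which matters at the scale of millions of test pairs; an AUC of 0.5 corresponds to chance and 1.0 to perfect separation.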

