Expression Quantitative Trait Loci Information Improves Predictive Modeling of Disease Relevance of Non-Coding Genetic Variation.

Croteau-Chonka DC, Rogers AJ, Raj T, McGeachie MJ, Qiu W, Ziniti JP, Stubbs BJ, Liang L, Martinez FD, Strunk RC, Lemanske RF, Liu AH, Stranger BE, Carey VJ, Raby BA - PLoS ONE (2015)

Bottom Line: Disease-associated loci identified through genome-wide association studies (GWAS) frequently localize to non-coding sequence. At a false discovery rate < 5%, we mapped eQTLs for 6,535 genes; these were enriched for disease-associated genes (P < 10^-4), particularly those related to immune diseases and metabolic traits. The complete prediction model including eQTL association information ultimately allowed for better discrimination of SNPs with higher probabilities of GWAS membership (6.3-10.0%, compared to 3.5% for a random SNP) than the other two models excluding eQTL information.


Affiliation: Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America.

ABSTRACT
Disease-associated loci identified through genome-wide association studies (GWAS) frequently localize to non-coding sequence. We and others have demonstrated strong enrichment of such single nucleotide polymorphisms (SNPs) for expression quantitative trait loci (eQTLs), supporting an important role for regulatory genetic variation in complex disease pathogenesis. Herein we describe our initial efforts to develop a predictive model of disease-associated variants leveraging eQTL information. We first catalogued cis-acting eQTLs (SNPs within 100 kb of target gene transcripts) by meta-analyzing four studies of three blood-derived tissues (n = 586). At a false discovery rate < 5%, we mapped eQTLs for 6,535 genes; these were enriched for disease-associated genes (P < 10^-4), particularly those related to immune diseases and metabolic traits. Based on eQTL information and other variant annotations (distance from target gene transcript, minor allele frequency, and chromatin state), we created multivariate logistic regression models to predict SNP membership in reported GWAS. The complete model revealed independent contributions of specific annotations as strong predictors, including evidence for an eQTL (odds ratio (OR) = 1.2-2.0, P < 10^-11) and the chromatin states of active promoters, different classes of strong or weak enhancers, or transcriptionally active regions (OR = 1.5-2.3, P < 10^-11). This complete prediction model including eQTL association information ultimately allowed for better discrimination of SNPs with higher probabilities of GWAS membership (6.3-10.0%, compared to 3.5% for a random SNP) than the other two models excluding eQTL information. This eQTL-based prediction model of disease relevance can help systematically prioritize non-coding GWAS SNPs for further functional characterization.
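The abstract reports eQTLs at a false discovery rate below 5% but does not name the procedure used to control it. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure (a common choice, not confirmed by the text), selecting significant SNP-probe tests from a list of meta-analysis P values could look like the following. The function name and the example P values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one rejection flag per test, controlling the FDR at `alpha`.

    Step-up rule: sort the m P values ascending, find the largest rank k
    with p_(k) <= (k / m) * alpha, and reject the k smallest P values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests in ascending order of P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank whose P value clears its rank-specific threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every test ranked at or below k_max, reported in input order.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical P values for four SNP-probe tests (not data from the study).
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))  # [True, True, True, False]
```

Because the rule is step-up, a test can be rejected even when it misses its own rank threshold, provided a larger P value further down the sorted list clears its threshold.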


Fig 6 (pone.0140758.g006). ROC curves for multivariate logistic models predicting SNP membership in GWAS. Components of the three predictive models are described in Fig 5.

We trained these three predictive models on the same random subset of 777,998 SNP-probe pairs and then validated the predictive power of each model on the remaining 6,894,942 SNP-probe pairs, which were not included in the training set. While a randomly selected SNP-probe pair in the test set had a 3.5% chance of being a GWAS hit, all three models assigned substantial subsets of pairs even higher probabilities (Fig 5). All three models were well calibrated: for the SNP-probe pairs falling in a given bin of predicted probabilities, the observed proportion of GWAS hits among those pairs lay within that bin's predicted range. However, our complete model M3, which considered both eQTL evidence and chromatin state, outperformed the smaller models M1 and M2 in one important regard. Whereas the maximum predicted probabilities generated by M1 and M2 peaked at 6.0% and 5.3%, respectively (at most 1.7-fold and 1.5-fold the chance of a random pair being a GWAS hit), M3-derived probabilities had a greater dynamic range, reaching as high as 10.0% (2.9-fold higher than chance). ROC curves showed that all three models were reasonable classifiers (Fig 6), with areas under the ROC curve (AUC) of 0.645 for M1, 0.610 for M2, and 0.654 for M3. Thus, while all three of our models were improvements over chance, the model considering eQTL information was the most discriminatory and produced the strongest predictions for prioritizing SNPs for further functional characterization.
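The comparisons in this paragraph rest on three simple computations, sketched below under stated assumptions. The fold changes are each model's maximum predicted probability divided by the 3.5% base rate (for example, 10.0 / 3.5 ≈ 2.9 for M3). The AUC is computed here from the Mann-Whitney rank statistic, which equals the area under the ROC curve. The calibration check bins predicted probabilities and tests whether each bin's observed hit fraction falls inside the bin, mirroring the criterion described in the text. This is not the authors' code: the function names, bin edges, and toy scores are hypothetical, and the study's predictions are not reproduced.

```python
from typing import List, Sequence, Tuple

# Base rate quoted in the text: 3.5% of test-set SNP-probe pairs are GWAS hits.
BASE_RATE = 0.035


def fold_over_baseline(p: float, base_rate: float = BASE_RATE) -> float:
    """Ratio of a predicted probability to the random-pair base rate."""
    return p / base_rate


def auc_from_ranks(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, averaging tied ranks.

    Equals the probability that a randomly chosen hit (label 1) scores
    higher than a randomly chosen non-hit (label 0), ties counting 1/2.
    """
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one hit and one non-hit")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


def calibration_table(
    probs: Sequence[float], labels: Sequence[int], edges: Sequence[float]
) -> List[Tuple[float, float, int, float]]:
    """Per bin [lo, hi): (lo, hi, number of pairs, observed fraction of hits)."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [y for p, y in zip(probs, labels) if lo <= p < hi]
        observed = sum(in_bin) / len(in_bin) if in_bin else float("nan")
        rows.append((lo, hi, len(in_bin), observed))
    return rows


def is_calibrated(rows: List[Tuple[float, float, int, float]]) -> bool:
    """True if every non-empty bin's observed hit fraction lies in its range."""
    return all(lo <= obs <= hi for lo, hi, n, obs in rows if n > 0)


# Fold changes for the maximum predicted probabilities quoted in the text.
for name, p_max in [("M1", 0.060), ("M2", 0.053), ("M3", 0.100)]:
    print(name, round(fold_over_baseline(p_max), 1))  # M1 1.7, M2 1.5, M3 2.9

# Toy scores and labels (hypothetical, not study data).
print(auc_from_ranks([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Sorting makes the rank-based AUC O(n log n), which matters for a test set of roughly 6.9 million pairs; comparing every hit with every non-hit directly would scale quadratically.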

