Statistical estimation of correlated genome associations to a quantitative trait network.

Kim S, Xing EP - PLoS Genet. (2009)

Bottom Line: Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed an increased power in detecting causal variants affecting correlated traits. Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.


Affiliation: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.

ABSTRACT
Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to directly and effectively incorporate the correlation structure of multiple quantitative traits such as clinical metrics and gene expressions in association analysis. Our approach represents correlation information explicitly among the quantitative traits as a quantitative trait network (QTN) and then leverages this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. The result is that the genetic markers that jointly influence subgroups of highly correlated traits can be detected jointly with high sensitivity and specificity. While most of the traditional methods examined each phenotype independently and combined the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover the genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed an increased power in detecting causal variants affecting correlated traits. 
Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.
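
The idea of encoding the trait network as a structured regularization function can be illustrated with a small sketch. It assumes a common form for such a penalty: a lasso term on all coefficients plus a fusion term that penalizes differences between the coefficients of traits connected by an edge in the QTN. The function name `gflasso_penalty`, the parameters `lam` and `gamma`, the optional weighting by the absolute correlation, and the sign handling are all illustrative assumptions, not details taken from the text above.

```python
import numpy as np

def gflasso_penalty(B, corr, edges, lam, gamma, weighted=True):
    """Evaluate an assumed graph-guided fusion penalty.

    B      : (J SNPs x K traits) array of regression coefficients
    corr   : (K x K) trait correlation matrix
    edges  : list of (m, l) trait index pairs that form the QTN
    lam    : weight of the lasso (sparsity) term
    gamma  : weight of the fusion term
    """
    # Sparsity term: sum of absolute values of all coefficients.
    lasso_term = lam * np.abs(B).sum()

    # Fusion term: for each edge, penalize the difference between the
    # coefficient columns of the two connected traits. The sign of the
    # correlation flips one column so that negatively correlated traits
    # are encouraged to have effects of opposite sign.
    fusion_term = 0.0
    for m, l in edges:
        r = corr[m, l]
        weight = abs(r) if weighted else 1.0
        fusion_term += weight * np.abs(B[:, m] - np.sign(r) * B[:, l]).sum()

    return lasso_term + gamma * fusion_term


corr = np.array([[1.0, 0.8], [0.8, 1.0]])
shared_effect = np.array([[1.0, 1.0], [0.0, 0.0]])   # SNP 0 affects both traits
single_effect = np.array([[1.0, 0.0], [0.0, 0.0]])   # SNP 0 affects one trait
penalty_shared = gflasso_penalty(shared_effect, corr, [(0, 1)], lam=1.0, gamma=1.0)
penalty_single = gflasso_penalty(single_effect, corr, [(0, 1)], lam=1.0, gamma=1.0)
```

Under these assumptions, a SNP with equal effects on two correlated traits incurs no fusion cost, while a SNP that affects only one of them does. This is one way a penalty could encourage joint detection across correlated traits.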


Figure 7. ROC curves comparing association analysis methods when the threshold for producing the QTN varies. Panels show the thresholds (A) 0.1, (B) 0.3, (C) 0.5, and (D) 0.7. The sample size was 100, and the association strength was 0.8. The results were averaged over 50 simulated datasets. In Panels (B) and (C), the ROC curves for the three GFlasso methods almost entirely overlap. In Panel (D), the ROC curves for lasso and the three GFlasso methods almost entirely overlap.

Next, we examined the sensitivity of the GFlasso methods to how the trait correlation network is generated, by varying the threshold of edge weights over 0.1, 0.3, 0.5 and 0.7. With lower threshold values, more edges would be included in the QTN, some of which represent only weak correlations. The purpose of this experiment was to see whether the performance of the GFlasso methods is negatively affected by the presence of these weak and possibly spurious edges that were included due to noise rather than a true correlation. The results for QTL recovery, averaged over 50 datasets with sample size 100 and association strength 0.8 (Case 1), are presented in Figure 7. We also include the ROC curves for the methods that did not use the QTN in each panel of Figure 7 for ease of comparison. As in Figure 6, one of the GFlasso variants did not have the flexibility to accommodate edges of varying correlation strength in the QTN, and again, this deficiency compromised its performance at the low threshold of 0.1, as shown in Figure 7A. On the other hand, the other two GFlasso variants exhibited greater power than all other methods even at this low threshold. As the threshold increased, the inferred QTN included only those edges with significant correlations. Thus, the performance of the less flexible variant approached that of the other two, and the ROC curves of the three methods in the GFlasso family overlapped almost entirely (Figure 7B and Figure 7C). When the threshold became even higher, e.g., 0.7, the number of edges in the QTN became close to 0, effectively removing the fusion penalty. As a result, the performance of all of the graph-guided methods approached that of lasso, and the four ROC curves overlapped (Figure 7D). Overall, we conclude that when the flexible structured methods are used, taking into account the correlation structure in phenotypes improves the power of detecting true causal SNPs regardless of the threshold value. In addition, once the QTN contains edges that capture strong correlations, including more edges beyond this point by further lowering the threshold does not significantly affect the performance of the flexible variants.
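
The effect of the threshold on the size of the QTN can be sketched as follows. This is a minimal illustration, assuming the network is built by keeping an edge between two traits whenever the absolute value of their sample correlation exceeds the threshold. The helper `build_qtn_edges` and the synthetic trait matrix are hypothetical and are not the simulation design used in the paper.

```python
import numpy as np

def build_qtn_edges(Y, threshold):
    """Return QTN edges (m, l, r) for trait pairs with |correlation| > threshold.

    Y is an (N samples x K traits) array; the thresholding rule is an
    assumption made for illustration.
    """
    corr = np.corrcoef(Y, rowvar=False)
    K = corr.shape[0]
    return [
        (m, l, corr[m, l])
        for m in range(K)
        for l in range(m + 1, K)
        if abs(corr[m, l]) > threshold
    ]

# Synthetic traits: three traits share a common factor, three are pure noise.
rng = np.random.default_rng(0)
N = 100
shared = rng.normal(size=(N, 1))
Y = np.hstack([
    shared + 0.5 * rng.normal(size=(N, 3)),  # correlated block
    rng.normal(size=(N, 3)),                 # independent traits
])

# Sweep the same threshold values discussed in the text.
edge_counts = {t: len(build_qtn_edges(Y, t)) for t in (0.1, 0.3, 0.5, 0.7)}
for t, n_edges in edge_counts.items():
    print(f"threshold={t}: {n_edges} edges")
```

Because every edge that survives a higher threshold also survives a lower one, the edge count can only shrink as the threshold rises. With no edges left, a fusion penalty defined over the edges contributes nothing, which is consistent with the graph-guided methods approaching lasso at the highest threshold.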

