Statistical estimation of correlated genome associations to a quantitative trait network.

Kim S, Xing EP - PLoS Genet. (2009)

Bottom Line: Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed an increased power in detecting causal variants affecting correlated traits. Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.


Affiliation: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.

ABSTRACT
Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to directly and effectively incorporate the correlation structure of multiple quantitative traits such as clinical metrics and gene expressions in association analysis. Our approach represents correlation information explicitly among the quantitative traits as a quantitative trait network (QTN) and then leverages this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. The result is that the genetic markers that jointly influence subgroups of highly correlated traits can be detected jointly with high sensitivity and specificity. While most of the traditional methods examined each phenotype independently and combined the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover the genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed an increased power in detecting causal variants affecting correlated traits. 
Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.
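
The idea of encoding the trait network as a structured regularizer can be illustrated with a small sketch. The excerpt here does not give the objective function, so the form below is an assumption: a squared-error loss, a lasso penalty weighted by a hypothetical parameter lam, and a fusion penalty weighted by a hypothetical parameter gamma that pulls the coefficients of two connected traits toward each other (or toward opposite signs when the traits are negatively correlated). The function name and argument layout are illustrative only.

```python
def gflasso_objective(X, Y, B, edges, lam, gamma):
    """Evaluate an assumed graph-guided fused lasso objective.

    X: n x p genotype matrix (list of rows)
    Y: n x k trait matrix (list of rows)
    B: p x k coefficient matrix (list of rows)
    edges: list of (m, l, r) tuples, one per QTN edge between traits m and l,
           where r is the correlation between the two traits
    lam, gamma: hypothetical weights for the lasso and fusion penalties
    """
    n, p, k = len(X), len(B), len(B[0])

    # Squared-error loss summed over all samples and traits.
    loss = 0.0
    for i in range(n):
        for t in range(k):
            pred = sum(X[i][j] * B[j][t] for j in range(p))
            loss += (Y[i][t] - pred) ** 2

    # Lasso penalty: encourages most SNP-trait coefficients to be zero.
    lasso = sum(abs(b) for row in B for b in row)

    # Fusion penalty: for each edge, a SNP's effects on the two connected
    # traits are encouraged to match, up to the sign of their correlation.
    fusion = 0.0
    for m, l, r in edges:
        sign = 1.0 if r >= 0 else -1.0
        fusion += sum(abs(row[m] - sign * row[l]) for row in B)

    return loss + lam * lasso + gamma * fusion
```

With gamma set to zero this reduces to an ordinary lasso objective over all traits, which is one way to see what the network term adds: a SNP whose coefficients agree across connected traits incurs no extra fusion cost.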


Figure 11. Results from the association analysis of the asthma dataset. (A) The correlation matrix of 53 asthma-related clinical traits. A pixel at row i and column j corresponds to the absolute magnitude of correlation between nodes i and j in the QTN depicted in Figure 1. (B) The thresholded trait correlation matrix. The black pixels in the lower triangular part of the matrix indicate edges between pairs of traits. (C) The linkage disequilibrium structure among the 34 SNPs in gene IL-4R. (D) Results from single-marker/single-trait association tests after 2000 permutations. (E) The SNP-trait pairs that the single-marker/single-trait analyses with permutation tests in (D) find significant at the chosen significance level are shown as black pixels. (F) The SNP-trait pairs with significant association under a second criterion based on the results in (D) are shown as black pixels. Estimated regression coefficients are shown for (G) ridge regression, (H) PCA-based regression, (I) lasso, and (J)–(L) the GFlasso variants. In Panels (D)–(L), rows correspond to SNPs, and columns to phenotypes.

Mentions: Before searching for associations between SNPs and traits, we first examined the correlation structure in the 53 clinical traits in question. We computed the pairwise correlations between these traits, as depicted in Figure 11A, and thresholded the correlations to obtain the QTN in Figure 1. The rows and columns of the matrix in Figure 11A were ordered via an agglomerative hierarchical clustering algorithm so that highly correlated traits were next to each other in the linear ordering and formed apparent blocks in the matrix, corresponding to subsets of highly inter-correlated traits. Recall that the unweighted GFlasso variant uses only edge connectivities, but not their weights, in the QTN. For ease of comparison, we graphically display this QTN in Figure 11B, where a black pixel at position (i, j) indicates that the i-th and j-th phenotypes are connected with an edge in the QTN. It is easy to see the correspondences between the blocks (i.e., clusters) of black pixels in Figure 11B and the subgraphs of correlated traits in Figure 1. For example, the traits representing quality of life of the patients (the nodes for QLEnvironment, QLSymptom, QLEmotion, and QLActivity) appear as a small subnetwork near the center of Figure 1 as well as the block of black pixels at the upper left corner of Figure 11B. We find another subnetwork consisting of three traits related to asthma symptoms (the nodes for Wheezy, Sputum, ChestTight) near the upper right corner of Figure 1 and as the second cluster from the left in Figure 11B. The cluster of traits from columns 11 through 18 and the next cluster from columns 19 through 25 in Figure 11B correspond to the two densely connected subnetworks within the large subnetwork on the left-hand side of Figure 1 that consists of traits related to lung physiology (the nodes for BaseFEV1, PreFEFPred, PostbroPred, PredrugFEV1P, MaxFEV1P, etc.).
Based on Figure 1 and Figure 11B, we concluded that the QTN obtained at this threshold captured the previously known clusters of asthma-related traits, and we used this network in our multiple-trait association analysis with GFlasso methods.
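
The network construction described above (pairwise correlations, then thresholding) can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names, the dictionary input format, and the threshold used in the usage note are assumptions, and the hierarchical-clustering step that orders rows and columns for display is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def build_qtn(traits, threshold):
    """Return the edges of a quantitative trait network.

    traits: dict mapping a trait name to its list of measurements
            (one value per individual, same order for every trait)
    threshold: cutoff on the absolute correlation (hypothetical parameter)

    Each edge is a (name_a, name_b, r) tuple for a pair of traits whose
    absolute correlation exceeds the threshold.
    """
    names = list(traits)
    edges = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            r = pearson(traits[names[a]], traits[names[b]])
            if abs(r) > threshold:
                edges.append((names[a], names[b], r))
    return edges
```

For example, calling build_qtn on a few made-up traits with an arbitrary cutoff such as 0.7 keeps only the strongly correlated pairs. The signed correlation is retained on each edge so that a downstream penalty can distinguish positively from negatively correlated traits, while the edge list alone is enough for a variant that ignores weights.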

