Statistical estimation of correlated genome associations to a quantitative trait network.

Kim S, Xing EP - PLoS Genet. (2009)

Bottom Line: Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed increased power in detecting causal variants affecting correlated traits. Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.

Affiliation: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.

ABSTRACT
Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to directly and effectively incorporate the correlation structure of multiple quantitative traits such as clinical metrics and gene expressions in association analysis. Our approach represents correlation information explicitly among the quantitative traits as a quantitative trait network (QTN) and then leverages this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. The result is that the genetic markers that jointly influence subgroups of highly correlated traits can be detected jointly with high sensitivity and specificity. While most traditional methods examine each phenotype independently and combine the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover the genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed increased power in detecting causal variants affecting correlated traits.
Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.
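
The abstract describes the structured regularization only in words. One way to write a graph-guided fused lasso objective of this kind is sketched below. It assumes N individuals, a genotype matrix X (N × J) over J SNPs, a trait matrix Y (N × K) over K traits, a coefficient matrix B with entries beta_jk, a QTN with edge set E and sample trait correlations r_ml, and tuning parameters lambda, gamma >= 0. This notation and the edge-weight function f are illustrative assumptions, not the exact form or scaling used by the authors.

```latex
\hat{B} = \arg\min_{B}\;
  \frac{1}{2}\,\lVert Y - XB \rVert_F^2
  + \lambda \sum_{j=1}^{J} \sum_{k=1}^{K} \lvert \beta_{jk} \rvert
  + \gamma \sum_{(m,l) \in E} f(r_{ml}) \sum_{j=1}^{J}
      \bigl\lvert \beta_{jm} - \operatorname{sign}(r_{ml})\, \beta_{jl} \bigr\rvert,
\qquad
f(r) = 1 \ \text{(unweighted QTN)} \quad \text{or} \quad f(r) = \lvert r \rvert \ \text{(edge-weighted QTN)}
```

The lasso term keeps each SNP's effects sparse, while the fusion term pulls a SNP's coefficients on two linked traits toward each other (or toward opposite signs when the traits are negatively correlated), so a SNP tends to be selected for a whole block of correlated traits rather than for isolated traits.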

Figure 4 (pgen-1000587-g004). Results of association analysis by different methods based on a single simulated dataset. Association strength 0.8 and threshold  for the QTN were used. (A) The 10 × 10 correlation coefficient matrix of traits. It contains three blocks of correlated traits of sizes 3, 3, and 4, respectively. (B) The correlation coefficient matrix in (A) thresholded at . The black pixels in the lower triangular part of the matrix indicate edges included in GFlasso. (C) The true regression coefficients and sparsity pattern used in simulation. (D) , where  were obtained from single-SNP permutation tests performed for each phenotype separately. (E) Black pixels indicate SNP-trait pairs with significant association at  based on the results in (D). Values of the estimated regression coefficients are shown for (F) ridge regression, (G) PCA-based regression, (H) lasso, and (I)–(K) the GFlasso methods, using an unweighted trait network in (I) and edge weights in (J) and (K). In Panels (C)–(K), rows correspond to SNPs and columns to phenotypes; the trait columns in (C)–(K) are aligned with those in (A) and (B).
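
Panels (D) and (E) rely on single-SNP permutation tests run for each phenotype separately. The sketch below shows one common way to compute such p-values. It assumes the absolute Pearson correlation between genotype and trait as the test statistic and no monomorphic SNPs; the function name `single_snp_permutation_pvalues`, the statistic, and the default number of permutations are hypothetical choices, not taken from the paper.

```python
import numpy as np


def single_snp_permutation_pvalues(X, Y, n_perm=1000, seed=0):
    """Permutation p-values for every SNP-trait pair, tested one pair at a time.

    X: (N, J) genotype matrix, e.g. minor-allele counts 0/1/2.
    Y: (N, K) matrix of quantitative traits.
    Statistic (an assumption): absolute Pearson correlation.
    Assumes no SNP or trait column has zero variance.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Standardize columns so that Xc.T @ Yc / n is the Pearson correlation.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    Yc = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    observed = np.abs(Xc.T @ Yc) / n  # (J, K) observed |correlation|
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Shuffling individuals breaks genotype-phenotype links; each
        # SNP-trait pair still gets its own marginal null distribution.
        perm = rng.permutation(n)
        null = np.abs(Xc.T @ Yc[perm]) / n
        exceed += null >= observed
    # Add-one correction keeps p-values strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Pairs whose p-value falls below a chosen significance level alpha would then be flagged as associated, which yields the kind of binary SNP-by-trait map shown in panel (E).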

Mentions: As an illustrative example of the behaviors of the different methods, Figure 4 shows the QTN and the estimated QTL sets for all traits in the QTN for a simulated dataset of  samples, with association strengths (i.e., the regression coefficients) all set to 0.8 for SNP-trait pairs with true associations (Case 1). The trait correlation matrix in Figure 4A shows blocks of correlated traits. Thresholding it at  gives the QTN in Figure 4B, where the black pixels in the lower triangular part indicate an edge between two traits. Given the true regression coefficients in Figure 4C, we attempted to recover the truly associated SNP-trait pairs using our methods and the competing methods mentioned above. Figure 4 shows many false positives in the results of the single-marker/single-trait analysis, the multivariate regression methods, and the PCA-based method. Furthermore, these baseline methods do not recover the block structure of SNPs jointly affecting multiple traits, which is clearly visible in the true regression coefficients. In contrast, the GFlasso results in Figures 4I–K show fewer false positives and reveal clear block structures. This experiment suggests that borrowing information across correlated traits in a QTN, as the GFlasso methods do, can significantly increase the power to discover true causal SNPs. Because the GFlasso variant in Figure 4I uses an unweighted trait network, the regression coefficients for a given SNP are often fused excessively across traits, even between weakly correlated ones; this is most visible among the first six traits in the upper left corner of Figure 4B, which form two smaller subnetworks within a larger subnetwork. This undesirable behavior mostly disappeared when we incorporated edge weights, as in the variants shown in Figures 4J and 4K.
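
The QTN construction and the contrast between unweighted and edge-weighted fusion can be illustrated with a small sketch. It assumes an edge between traits m and l whenever the absolute sample correlation exceeds a threshold rho (stored with m > l, matching the lower-triangular display in Figure 4B), and that the weighted variant scales each edge's fusion term by |r_ml|. The names `build_qtn` and `fusion_penalty` and this particular weighting are hypothetical, for illustration only.

```python
import numpy as np


def build_qtn(Y, rho):
    """Return QTN edges (m, l, r_ml) with m > l and |r_ml| > rho.

    Y: (N, K) trait matrix; rho: hypothetical correlation threshold.
    """
    R = np.corrcoef(Y, rowvar=False)  # (K, K) trait correlation matrix
    K = R.shape[0]
    return [(m, l, R[m, l]) for m in range(K) for l in range(m)
            if abs(R[m, l]) > rho]


def fusion_penalty(B, edges, weighted=True):
    """Graph-guided fusion penalty on a (J, K) coefficient matrix B.

    For each edge, every SNP's coefficients on the two traits are pulled
    together (or toward opposite signs for negatively correlated traits).
    weighted=False gives every edge weight 1; weighted=True uses |r_ml|
    (an assumed weighting function).
    """
    total = 0.0
    for m, l, r in edges:
        w = abs(r) if weighted else 1.0
        total += w * np.abs(B[:, m] - np.sign(r) * B[:, l]).sum()
    return total


# One SNP affecting traits 0 and 1 only; trait 2 is weakly linked to trait 1.
B = np.array([[1.0, 1.0, 0.0]])
edges = [(1, 0, 0.9), (2, 1, 0.35)]
print(fusion_penalty(B, edges, weighted=False))  # 1.0
print(fusion_penalty(B, edges, weighted=True))   # 0.35
```

Leaving the weakly correlated trait 2 unaffected costs 1.0 under the unweighted penalty but only 0.35 under the weighted one, so the weighted version gives the optimizer less incentive to fuse the SNP's effect onto weakly linked traits.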