Statistical estimation of correlated genome associations to a quantitative trait network.

Kim S, Xing EP - PLoS Genet. (2009)

Bottom Line: Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed an increased power in detecting causal variants affecting correlated traits. Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.


Affiliation: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.

ABSTRACT
Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to directly and effectively incorporate the correlation structure of multiple quantitative traits such as clinical metrics and gene expressions in association analysis. Our approach represents correlation information explicitly among the quantitative traits as a quantitative trait network (QTN) and then leverages this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. The result is that the genetic markers that jointly influence subgroups of highly correlated traits can be detected jointly with high sensitivity and specificity. While most of the traditional methods examined each phenotype independently and combined the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover the genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed an increased power in detecting causal variants affecting correlated traits. 
Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.
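The abstract describes the approach only in words, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes that the QTN is built by keeping an edge between two traits whenever the absolute Pearson correlation exceeds a threshold, and that the structured regularizer is an unweighted fusion term added to a lasso-penalized multivariate regression. The names `build_qtn`, `gflasso_objective`, `lam`, and `gamma` are hypothetical and chosen for illustration.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def build_qtn(traits, rho):
    """Build a trait network from K trait columns (each a list of N values).

    Assumption: an edge (m, l) is kept when |correlation| > rho.
    Returns a dict mapping (m, l) with m < l to the correlation value.
    """
    edges = {}
    for m in range(len(traits)):
        for l in range(m + 1, len(traits)):
            r = pearson(traits[m], traits[l])
            if abs(r) > rho:
                edges[(m, l)] = r
    return edges


def gflasso_objective(X, Y, B, edges, lam, gamma):
    """Evaluate a graph-guided fused lasso style objective.

    X: N x J genotypes, Y: N x K traits, B: J x K coefficients (lists of lists).
    Assumed form: squared error + lam * L1 penalty + gamma * fusion penalty,
    where the fusion term pulls the coefficients of connected traits together
    (with a sign flip for negatively correlated traits).
    """
    n_samples, n_snps, n_traits = len(X), len(B), len(B[0])
    rss = 0.0
    for i in range(n_samples):
        for k in range(n_traits):
            pred = sum(X[i][j] * B[j][k] for j in range(n_snps))
            rss += (Y[i][k] - pred) ** 2
    l1 = sum(abs(B[j][k]) for j in range(n_snps) for k in range(n_traits))
    fusion = 0.0
    for (m, l), r in edges.items():
        sign = 1.0 if r >= 0 else -1.0
        fusion += sum(abs(B[j][m] - sign * B[j][l]) for j in range(n_snps))
    return rss + lam * l1 + gamma * fusion
```

Under these assumptions, a SNP whose coefficients are similar across connected traits incurs a small fusion cost, which is one way to express "borrowing information across correlated phenotypes". The sketch only evaluates the objective; minimizing it requires a separate optimizer that is not shown here.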


Figure 5 (pgen-1000587-g005): ROC curves comparing the performance of association analysis methods as the sample size varies. Panels (A) through (E) show results for the five sample sizes considered (50, 100, 150, 200, and 250). The association strength was 0.5, and the threshold for producing the QTN was set to 0.3. The results were averaged over 50 simulated datasets. The ROC curves for three of the methods almost entirely overlap.

Mentions: First, we varied the sample size of the dataset to see how it affects the performance of the different association analysis methods. We used datasets of sizes 50, 100, 150, 200, and 250, with the association strength fixed at 0.5 for all associated SNP-trait pairs (Case 1), and we set the threshold for trait correlations to 0.3 to learn the QTN. The results are summarized in Figure 5, where the ROC curves were averaged over 50 datasets. The results confirmed that the lasso-based methods (lasso and GFlasso) are more successful in identifying true associations than the other methods. In addition, the ROC curves for three of the methods almost entirely overlap, whereas the other methods are significantly inferior. We found that, across all sample sizes, including a graph-guided fusion penalty as in GFlasso to take advantage of the correlation structure in traits can significantly increase the power for detecting true associations while reducing false positives, compared to lasso and other methods.
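To make the ROC comparison concrete, here is a minimal sketch of how one ROC curve could be computed for a single simulated dataset. It assumes that each SNP-trait pair is scored by the absolute value of its estimated coefficient and that the true associations are known from the simulation; the text does not state the scoring rule, and the function names `roc_points` and `auc` are hypothetical. Tied scores are not treated specially.

```python
def roc_points(scores, truth):
    """Return (false positive rate, true positive rate) points.

    scores: one score per SNP-trait pair (assumed: |estimated coefficient|).
    truth:  1 if the pair is a true association in the simulation, else 0.
    The threshold sweeps from the highest score to the lowest.
    """
    ranked = sorted(zip(scores, truth), key=lambda pair: -pair[0])
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: four SNP-trait pairs, two of which are truly associated.
example_points = roc_points([0.9, 0.8, 0.1, 0.05], [1, 0, 1, 0])
example_auc = auc(example_points)
```

A curve that rises faster toward the top-left corner indicates that true associations are ranked ahead of false ones. How the curves from the 50 datasets were averaged is not described in the text, so that step is left out of the sketch.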

