Inferring pathway activity toward precise disease classification.

Lee E, Chuang HY, Kim JW, Ideker T, Lee D - PLoS Comput. Biol. (2008)

Bottom Line: For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.


Affiliation: Department of Bio and Brain Engineering, KAIST, Daejeon, South Korea.

ABSTRACT
The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.
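The abstract defines CORGs only as the subset of pathway genes whose combined expression is most discriminative; it does not say how that subset is searched for. The following is a minimal sketch under stated assumptions, not the authors' implementation: expression values are already z-scored per gene, discriminative power is measured with a Welch t-statistic, genes are added greedily in order of their individual scores, and the combined activity is the sum of z-scores divided by sqrt(k). Only the up-regulated direction is handled. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_score(values, labels):
    """Welch t-statistic between cases (label 1) and controls (label 0)."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    var = stdev(cases) ** 2 / len(cases) + stdev(controls) ** 2 / len(controls)
    return (mean(cases) - mean(controls)) / math.sqrt(var) if var > 0 else 0.0


def activity(z, genes):
    """Per-sample pathway activity: sum of z-scores over `genes` / sqrt(k).

    The sqrt(k) scaling is an assumption of this sketch.
    """
    k = len(genes)
    n_samples = len(z[genes[0]])
    return [sum(z[g][j] for g in genes) / math.sqrt(k) for j in range(n_samples)]


def greedy_corgs(z, labels):
    """Greedy forward selection of a discriminative gene subset (assumed search).

    z maps gene name -> list of z-scored expression values, one per sample.
    Returns the selected genes and the t-score of their combined activity.
    """
    # Rank genes by individual t-score, most case-elevated first.
    ranked = sorted(z, key=lambda g: t_score(z[g], labels), reverse=True)
    chosen = [ranked[0]]
    best = t_score(activity(z, chosen), labels)
    for gene in ranked[1:]:
        score = t_score(activity(z, chosen + [gene]), labels)
        if score <= best:  # stop as soon as adding a gene no longer helps
            break
        chosen.append(gene)
        best = score
    return chosen, best


# Toy data: three samples of each class; g3 carries no class signal.
z = {
    "g1": [1.0, 1.2, 0.8, -1.0, -0.9, -1.1],
    "g2": [0.9, 0.7, 1.1, -0.8, -1.2, -1.0],
    "g3": [0.1, -0.5, 0.4, 0.3, -0.2, 0.0],
}
labels = [1, 1, 1, 0, 0, 0]
corgs, score = greedy_corgs(z, labels)
print(corgs)
```

On this toy input the two informative genes are combined and the uninformative one is left out, which illustrates why a condition-specific subset can discriminate better than the whole pathway.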


pcbi-1000217-g003: Classification accuracy within (A) and across (B) datasets. Bar chart of Area Under ROC Curve (AUC) classification performance of CORG-based pathway markers (PAC), conventional pathway markers (Mean, Median, and PCA), and individual genes (Gene; same number of top discriminative genes as the number of CORGs in pathway markers). Classification performance is summarized as mean ± ste of AUC over 100 runs of 5-fold cross-validation within a dataset. To compute PAC_random, the AUC values of 1000 random gene sets were averaged. Numbers above the red bars are -log(p-value) from the Wilcoxon signed-rank test on the 500 AUCs of "PAC" against those of "Gene" (only the ones with p-value < 0.05 are shown). The p-values measure the significance of the difference between PAC and gene-based classification.
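As an illustration of the summary statistics named in the caption, the sketch below computes an AUC from classifier scores and reports mean ± ste over a list of AUC values. It reads "ste" as the standard error of the mean, which is an assumption, and the helper names are hypothetical. The cross-validation loop and the Wilcoxon signed-rank test are left out.

```python
import math
from statistics import mean, stdev


def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A tie between a case and a control counts as half a correct pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > d else 0.5 if c == d else 0.0
        for c in cases
        for d in controls
    )
    return wins / (len(cases) * len(controls))


def mean_ste(values):
    """Mean and standard error of the mean (sample stdev / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# One fold: two cases and two controls, one case ranked below a control.
fold_auc = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])

# Summary over several (made-up) per-fold AUC values.
avg, ste = mean_ste([0.7, 0.8, 0.9])
print(fold_auc, avg, ste)
```

The pairwise definition is quadratic in the number of samples, which is acceptable for cohort-sized data and keeps the example free of external dependencies.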

Mentions: As shown in Figure 3A, our pathway-based classifiers (PAC) significantly outperformed the conventional gene-based classifiers (Gene). The improved performance was not simply due to grouping multiple gene expression measurements, as shown by comparing our performance with that of random groups of genes (PAC_random; averaged AUCs of 1000 random gene sets of the same sizes as the significant pathways). Classifiers using pathway activity inferred by the mean or median of the member gene expression [22] or the first principal component (PCA) [20] had higher predictive power than those using random gene sets (PAC_random), but only comparable power to the conventional gene-based classifiers. These results indicate that there are at least two critical factors in developing an advanced molecular diagnostic: (1) a biologically meaningful definition of pathways and (2) inference of condition-specific pathway activity.
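For contrast with the condition-specific approach, the static baselines mentioned above (mean or median of all member genes) can be sketched in a few lines. This is an illustrative reading of those baselines rather than the cited implementations; the function name and the toy values are made up, and the PCA variant is omitted.

```python
from statistics import mean, median


def static_activity(expr, summary=mean):
    """Per-sample pathway activity from ALL member genes.

    expr maps gene name -> list of expression values, one per sample.
    `summary` is applied across genes for each sample (mean or median).
    """
    per_sample = zip(*expr.values())  # regroup values sample by sample
    return [summary(column) for column in per_sample]


# Toy pathway with three genes measured in three samples.
expr = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [3.0, 4.0, 5.0],
    "g3": [8.0, 0.0, 1.0],
}
mean_activity = static_activity(expr, mean)
median_activity = static_activity(expr, median)
print(mean_activity, median_activity)
```

Because every member gene contributes regardless of whether it responds to the condition, unresponsive genes such as g3 here can dilute the signal, which is consistent with the observation that these static summaries were only comparable to gene-based classifiers.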

