Inferring pathway activity toward precise disease classification.

Lee E, Chuang HY, Kim JW, Ideker T, Lee D - PLoS Comput. Biol. (2008)

Bottom Line: For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.

Affiliation: Department of Bio and Brain Engineering, KAIST, Daejeon, South Korea.

ABSTRACT
The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.

Figure 1. A schematic diagram of key gene identification and activity inference. Selected significant pathways are further subjected to CORG identification for the phenotype of interest. Gene expression profiles of patient samples drawn from each disease subtype (e.g., good or poor prognosis) are transformed into a “pathway activity matrix”. For a given pathway, the activity is a combined z-score derived from the expression of its individual key genes. After the expression vector of each gene is overlaid on its corresponding protein in the pathway, the key genes that yield the most discriminative activities are found via a greedy search based on their individual discriminative power (see Methods). The pathway activity matrix is then used to train a classifier.
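
The transformation of expression profiles into a pathway activity matrix can be sketched in Python as below. This is a minimal, hypothetical illustration, not the authors' code. It assumes the key genes of each pathway are already known and given as row indices into a genes × samples expression matrix, uses the population standard deviation for the z-transform, and assumes no gene is constant across samples. The names `zscore_rows`, `pathway_activity_matrix`, and `key_genes` are illustrative.

```python
import math
from statistics import mean, pstdev


def zscore_rows(expr):
    """Z-transform each gene (row) to mean 0 and SD 1 across all samples.

    Assumes every gene varies across samples (pstdev > 0).
    """
    z = []
    for row in expr:
        mu, sd = mean(row), pstdev(row)
        z.append([(g - mu) / sd for g in row])
    return z


def pathway_activity_matrix(expr, key_genes):
    """Map each pathway to its activity vector a_j = sum_{i in G} z_ij / sqrt(k).

    expr: genes x samples expression values (list of rows).
    key_genes: hypothetical dict {pathway name: list of key-gene row indices}.
    """
    z = zscore_rows(expr)
    n_samples = len(z[0])
    matrix = {}
    for pathway, genes in key_genes.items():
        k = len(genes)
        # Combined z-score per sample: sum over key genes, scaled by sqrt(k).
        matrix[pathway] = [
            sum(z[i][j] for i in genes) / math.sqrt(k) for j in range(n_samples)
        ]
    return matrix
```

For example, `pathway_activity_matrix([[1, 2, 3, 4], [2, 2, 4, 4]], {"P1": [0, 1], "P2": [1]})` assigns `P2` the activities `[-1.0, -1.0, 1.0, 1.0]`, because a single-gene pathway's activity is simply that gene's z-score. Each row of the result can then serve as one feature for a classifier.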

To integrate the expression and pathway datasets, we overlaid the expression values of each gene on its corresponding protein in each pathway. Within each pathway, we searched for a subset of member genes whose combined expression levels across the samples were highly discriminative of the phenotypes of interest (Figure 1). For a particular gene set $G$ with $k$ member genes, let $a$ denote its vector of activity scores over the samples in a study, and let $c$ denote the corresponding vector of class labels (e.g., good vs. poor prognosis). To derive $a$, the expression values $g_{ij}$ are normalized to z-transformed scores $z_{ij}$ which, for each gene $i$, have mean $\mu_i = 0$ and standard deviation $\sigma_i = 1$ over all samples $j$. The $z_{ij}$ of the member genes are then combined into a single z-score, designated the activity $a_j = \sum_{i \in G} z_{ij} / \sqrt{k}$; the square root of the number of member genes is used in the denominator to stabilize the variance of the combined score. Many statistics, such as the Wilcoxon score or the Pearson correlation, could be used to score the relationship between $a$ and $c$. In this study, we defined the discriminative score $S(G)$ as the t-test statistic [32] computed on $a$ between the two groups of samples defined by $c$.
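
The scoring step, combined with the greedy key-gene search summarized in the Figure 1 caption, might be sketched in Python as below. This is a hypothetical illustration under stated assumptions, not the authors' implementation: labels are 0/1, each class has at least two samples, the t-statistic uses Welch's unequal-variance form (the text does not specify the variant), and member genes are ranked by their individual t-scores in the dominant direction of change and added one at a time only while $S(G)$ keeps improving. All function names are illustrative.

```python
import math
from statistics import mean, pstdev, variance


def zscore_rows(expr):
    """Z-transform each gene (row) to mean 0 and SD 1 across all samples."""
    z = []
    for row in expr:
        mu, sd = mean(row), pstdev(row)
        z.append([(g - mu) / sd for g in row])
    return z


def activity(z, genes):
    """Combined z-score a_j = sum_{i in G} z_ij / sqrt(k) for every sample j."""
    k = len(genes)
    return [sum(z[i][j] for i in genes) / math.sqrt(k) for j in range(len(z[0]))]


def t_score(a, labels):
    """S(G): two-sample t-statistic of a, class 1 vs class 0 (Welch form assumed)."""
    g1 = [x for x, c in zip(a, labels) if c == 1]
    g0 = [x for x, c in zip(a, labels) if c == 0]
    se = math.sqrt(variance(g1) / len(g1) + variance(g0) / len(g0))
    return (mean(g1) - mean(g0)) / se


def greedy_corgs(z, members, labels):
    """Hypothetical greedy key-gene (CORG) search; returns (genes, S(G))."""
    # Score each member gene on its own (a one-gene set's activity is its z-scores).
    individual = {i: t_score(z[i], labels) for i in members}
    # Assumed rule: search in the direction of the average individual score.
    sign = 1 if mean(list(individual.values())) >= 0 else -1
    ranked = sorted(members, key=lambda i: sign * individual[i], reverse=True)
    chosen = [ranked[0]]
    best = sign * individual[ranked[0]]
    # Add the next-ranked gene only while it improves the set's score.
    for i in ranked[1:]:
        score = sign * t_score(activity(z, chosen + [i]), labels)
        if score <= best:
            break
        chosen.append(i)
        best = score
    return chosen, sign * best
```

Called as `greedy_corgs(zscore_rows(expr), members, labels)`, it returns the selected gene indices together with their score $S(G)$. Stopping at the first gene that fails to improve the score is one simple choice of criterion; a different stopping rule would change which genes are selected.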

