Response projected clustering for direct association with physiological and clinical response data.

Yi SG, Park T, Lee JK - BMC Bioinformatics (2008)



Affiliation: Department of Statistics, Seoul National University, Silim-dong, Kwanak-gu, Seoul, 151-747, Korea. skon@bibs.snu.ac.kr

ABSTRACT

Background: Microarray gene expression data are often analyzed together with corresponding physiological response and clinical metadata of biological subjects, e.g. patients' residual tumor sizes after chemotherapy or the glucose levels of diabetic patients at various disease stages. Current clustering analysis cannot directly incorporate such quantitative metadata into the clustering heatmap of gene expression. It would be useful to summarize these clinical response data effectively in the high-dimensional clustering display, so that important groups of genes with different degrees of relevance to target disease phenotypes can be discovered intuitively.

Results: We introduced a novel clustering analysis approach, response projected clustering (RPC), which uses a high-dimensional geometrical projection of the response data onto the gene expression space. The projected response vector, which becomes the origin in the projected space, is then clustered together with the projected gene vectors based on their different degrees of association with the response vector. A bootstrap-counting based RPC analysis is also performed to evaluate the statistical tightness of the identified gene clusters. We applied RPC to the in vitro growth-inhibition and microarray profiling data on the NCI-60 cancer cell lines and to a microarray gene expression study of macrophage differentiation in atherogenesis. These applications enabled us to identify many known and novel gene factors, and their potential pathway associations, that are highly relevant to drug chemosensitivity and atherogenesis.
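The abstract does not spell out the bootstrap-counting procedure, so the following is only a generic sketch of the idea, not the authors' implementation. It resamples the biological samples with replacement and counts how often each gene remains strongly associated with the response. The function names, the |r| >= 0.8 threshold, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns None when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_association_freq(response, genes, n_boot=200, threshold=0.8, seed=0):
    """Fraction of bootstrap resamples in which |r(gene, response)| >= threshold.

    Assumption: a simple stand-in for "bootstrap counting", using a fixed
    correlation threshold rather than a full re-clustering of each resample.
    """
    rng = random.Random(seed)
    n = len(response)
    counts = {name: 0 for name in genes}
    used = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample the samples
        y = [response[i] for i in idx]
        rs = {name: pearson([prof[i] for i in idx], y) for name, prof in genes.items()}
        if any(r is None for r in rs.values()):
            continue  # skip degenerate resamples (e.g. one sample drawn n times)
        used += 1
        for name, r in rs.items():
            if abs(r) >= threshold:
                counts[name] += 1
    return {name: c / used for name, c in counts.items()} if used else {}


# Hypothetical toy data: 8 samples, 3 genes.
response = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
genes = {
    "gene_up": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 16.1],   # strong positive
    "gene_down": [8.2, 6.9, 6.1, 5.0, 3.8, 3.1, 1.9, 1.0],    # strong negative
    "gene_weak": [5.0, 1.0, 7.0, 3.0, 8.0, 2.0, 6.0, 4.0],    # weak association
}
freq = bootstrap_association_freq(response, genes)
print(freq)
```

Genes whose association with the response is stable across resamples get a frequency near 1, while weakly associated genes get a low frequency; a real analysis would re-cluster each resample rather than apply a fixed cutoff.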

Conclusion: We have shown that RPC can effectively discover gene networks with different degrees of association with clinical metadata. Because it is performed on each gene's response projected vector, based on the gene's degree of association with the response data, RPC effectively summarizes individual genes' association with metadata as well as their own expression patterns. Thus, RPC greatly enhances the utility of clustering analysis for investigating high-dimensional microarray gene expression data with quantitative metadata.



Figure 6: RPC analysis for PPARγ on microarray data during macrophage differentiation to foam cells. (a) Standard hierarchical clustering, and (b) RPC analysis. Genes are colored based on their known relevance in LDL (blue), OxLDL (red), mmLDL (turquoise), and macrophage (MΦ, pink) mechanisms.

Mentions: Thus, we applied our RPC approach to the macrophage differentiation microarray data, treating the gene expression values of PPARγ as the response data, in order to find the gene networks closely associated with this gene factor (Fig. 6). To avoid random genes clustering with biologically relevant ones, we preselected genes by the significance of their differential expression among the different LDL conditions, at FDR < 0.05 [12]. The standard clustering analysis grouped genes with PPARγ based simply on each gene's correlation with other genes or with PPARγ (Fig. 6a). Many genes weakly correlated with PPARγ, e.g. FEZ2 (r = 0.06) and TPT1 (r = 0.19), are clustered closely with it, whereas strongly negatively correlated genes, e.g. INSIG1 (r = -0.89) and CCL1 (r = -0.84), are found farther away. In contrast, in the RPC analysis, many genes highly correlated with PPARγ, such as apoE, LPL, CD36, MT1, and IL1B, are tightly clustered among themselves and closely clustered with PPARγ (Fig. 6b). PPARγ is also closely clustered with P8, PPARβ, and ABCG1, which are well known for their roles in atherosclerosis. In this RPC analysis, weakly correlated genes are placed progressively farther from PPARγ, and strongly correlated genes, both positive and negative, are clustered closely with it despite their opposite expression directions.
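The contrast described above can be illustrated with a small sketch. This is not the RPC projection itself: it uses a sign-free distance, 1 - |r|, as a simple stand-in to show why a strongly negatively correlated gene can sit far from the response under a standard correlation distance, 1 - r, yet close to it under an association-based one. The function names and toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Standard distance 1 - r: anti-correlated profiles are far apart (up to 2)."""
    return 1.0 - pearson(x, y)


def association_distance(x, y):
    """Sign-free distance 1 - |r| (an assumption, standing in for RPC's behavior)."""
    return 1.0 - abs(pearson(x, y))


# Hypothetical profiles over five samples; `response` plays the role of PPARγ.
response = [1.0, 2.0, 3.0, 4.0, 5.0]
profiles = {
    "gene_pos": [1.1, 2.0, 2.9, 4.2, 5.1],   # strongly positively correlated
    "gene_neg": [5.0, 4.1, 3.0, 1.9, 1.0],   # strongly negatively correlated
    "gene_weak": [2.0, 5.0, 1.0, 4.0, 3.0],  # weakly correlated
}

for name, profile in profiles.items():
    r = pearson(response, profile)
    d_std = correlation_distance(response, profile)
    d_assoc = association_distance(response, profile)
    print(f"{name}: r={r:+.2f}  1-r={d_std:.2f}  1-|r|={d_assoc:.2f}")
```

Under 1 - r the weakly correlated gene lies closer to the response than the strongly negative one; under 1 - |r| the ordering reverses, which matches the qualitative pattern reported for Fig. 6a versus Fig. 6b.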

