SPARCoC: a new framework for molecular pattern discovery and cancer gene identification.

Ma S, Johnson D, Ashby C, Xiong D, Cramer CL, Moore JH, Zhang S, Huang X - PLoS ONE (2015)

Bottom Line: Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heterogeneity of the molecular subtypes. SPARCoC has clear advantages compared with widely used alternative approaches: hierarchical clustering (Hclust) and nonnegative matrix factorization (NMF). We apply SPARCoC to the study of lung adenocarcinoma (ADCA), an extremely heterogeneous histological type, and a significant challenge for molecular subtyping.


Affiliation: Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T. Hong Kong.

ABSTRACT
It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and to identify gene signatures directly relevant to the subtypes. Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heterogeneity of the molecular subtypes. In this paper we present a new framework: SPARCoC (Sparse-CoClust), which is based on a novel Common-background and Sparse-foreground Decomposition (CSD) model and the Maximum Block Improvement (MBI) co-clustering technique. SPARCoC has clear advantages compared with widely used alternative approaches: hierarchical clustering (Hclust) and nonnegative matrix factorization (NMF). We apply SPARCoC to the study of lung adenocarcinoma (ADCA), an extremely heterogeneous histological type, and a significant challenge for molecular subtyping. For testing and verification, we use high-quality gene expression profiling data of lung ADCA patients, and identify prognostic gene signatures which could cluster patients into subgroups that are significantly different in their overall survival (with p-values < 0.05). Our results are based only on gene expression profiling data analysis, without incorporating any other feature selection or clinical information; we are able to replicate our findings with completely independent datasets. SPARCoC is broadly applicable to large-scale genomic data to empower pattern discovery and cancer gene identification.
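The abstract names the CSD model but does not state it. As a loose illustration of the general idea only (a background shared by all samples plus a sparse set of sample-specific deviations), the sketch below uses a per-gene median as the background and soft-thresholds the residuals to obtain a sparse foreground. The function names, the median choice, the threshold `lam`, and the toy matrix are all assumptions made for this example; this is not the authors' formulation.

```python
from statistics import median

def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def background_foreground(Y, lam):
    """Split a genes-by-samples matrix into a per-gene background value
    and a sparse foreground matrix of thresholded residuals.
    Assumption: the median is a stand-in for a 'common background'."""
    background = [median(row) for row in Y]
    foreground = [
        [soft_threshold(value - b, lam) for value in row]
        for row, b in zip(Y, background)
    ]
    return background, foreground

# Hypothetical toy data: 2 genes (rows) x 4 samples (columns).
# Gene 0 is elevated only in the last sample; gene 1 is flat.
Y = [
    [1.0, 1.1, 0.9, 4.0],
    [2.0, 2.0, 2.1, 1.9],
]
background, foreground = background_foreground(Y, lam=0.5)
print(background)
print(foreground)
```

In this toy case only the elevated entry of gene 0 survives the thresholding; every other foreground entry is zero, which is the sense of "sparse" intended here.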

pone.0117135.g002: Heat maps clearly show the patterns of MBI co-clusters. For the gene expression datasets studied here, MBI co-clustering simultaneously provides the gene (row) groupings and the sample (column) groupings, identifying the genes associated with the different types or subtypes. (a) Heat map shows clear co-clusters identified by MBI. The plot is based on the real values of the Y matrix of gene expression profiling data (data1 with three types: COID/20, CM/13, NL/17; refer to S1 File). Each row corresponds to one gene; each column corresponds to one sample. This heat map shows the expression values of 100 genes across all 3 types. (b) Heat map shows clear co-clusters identified by MBI. The plot is based on the values of the Y matrix for the Canada stage1 dataset (562 genes, with k1 = 100 and k2 = 2). The two groups are separated by a thick black vertical line.

Mentions: The MBI co-clustering module, as a checkerboard co-clustering approach, generates the row grouping and the column grouping at the same time, and thus helps identify the cancer genes (rows) that define the different molecular clusters/subgroups of patients (columns) (see Fig. 2).
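To make the checkerboard idea concrete, here is a minimal sketch that assigns every gene (row) to one of k1 groups and every sample (column) to one of k2 groups so that each block is close to its mean. It is a plain alternating, k-means-style heuristic written for illustration, not the MBI algorithm itself; the function names, the round-robin initialization, and the toy matrix are assumptions.

```python
def block_means(Y, rows, cols, k1, k2):
    """Mean of Y within each (row group, column group) block; empty blocks get 0."""
    sums = [[0.0] * k2 for _ in range(k1)]
    counts = [[0] * k2 for _ in range(k1)]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            sums[r][c] += Y[i][j]
            counts[r][c] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(k2)]
        for r in range(k1)
    ]

def checkerboard_cocluster(Y, k1, k2, n_iter=20):
    """Alternately reassign rows and columns to the group whose block means
    give the smallest squared error. Returns (row labels, column labels)."""
    n, m = len(Y), len(Y[0])
    rows = [i % k1 for i in range(n)]   # round-robin start (an assumption)
    cols = [j % k2 for j in range(m)]
    for _ in range(n_iter):
        M = block_means(Y, rows, cols, k1, k2)
        new_rows = [
            min(range(k1),
                key=lambda r: sum((Y[i][j] - M[r][cols[j]]) ** 2 for j in range(m)))
            for i in range(n)
        ]
        M = block_means(Y, new_rows, cols, k1, k2)
        new_cols = [
            min(range(k2),
                key=lambda c: sum((Y[i][j] - M[new_rows[i]][c]) ** 2 for i in range(n)))
            for j in range(m)
        ]
        if new_rows == rows and new_cols == cols:
            break                        # assignments stopped changing
        rows, cols = new_rows, new_cols
    return rows, cols

# Hypothetical toy data: genes 0-1 are high in samples 0-1, gene 2 in sample 2.
Y = [
    [5.0, 5.0, 0.0],
    [5.0, 5.0, 0.0],
    [0.0, 0.0, 5.0],
]
gene_groups, sample_groups = checkerboard_cocluster(Y, k1=2, k2=2)
print(gene_groups, sample_groups)
```

On this toy matrix the two row groups separate genes 0 and 1 from gene 2, and the two column groups separate samples 0 and 1 from sample 2, so the genes that define each sample group can be read directly from the row labels. A heuristic like this can stall in a poor local optimum for other starting assignments.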

