A pan-cancer proteomic perspective on The Cancer Genome Atlas.

Akbani R, Ng PK, Werner HM, Shahmoradgoli M, Zhang F, Ju Z, Liu W, Yang JY, Yoshihara K, Li J, Ling S, Seviour EG, Ram PT, Minna JD, Diao L, Tong P, Heymach JV, Hill SM, Dondelinger F, Städler N, Byers LA, Meric-Bernstam F, Weinstein JN, Broom BM, Verhaak RG, Liang H, Mukherjee S, Lu Y, Mills GB - Nat Commun (2014)

Bottom Line: Therefore, direct study of the functional proteome has the potential to provide a wealth of information that complements and extends genomic, epigenomic and transcriptomic analysis in The Cancer Genome Atlas (TCGA) projects. The resultant proteomic data are integrated with genomic and transcriptomic analyses of the same samples to identify commonalities, differences, emergent pathways and network biology within and across tumour lineages. In addition, tissue-specific signals are reduced computationally to enhance biomarker and target discovery spanning multiple tumour lineages.


Affiliation: 1] Department of Bioinformatics and Computational Biology, 1400 Pressler St., The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA [2].

ABSTRACT
Protein levels and function are poorly predicted by genomic and transcriptomic analysis of patient tumours. Therefore, direct study of the functional proteome has the potential to provide a wealth of information that complements and extends genomic, epigenomic and transcriptomic analysis in The Cancer Genome Atlas (TCGA) projects. Here we use reverse-phase protein arrays to analyse 3,467 patient samples from 11 TCGA 'Pan-Cancer' diseases, using 181 high-quality antibodies that target 128 total proteins and 53 post-translationally modified proteins. The resultant proteomic data are integrated with genomic and transcriptomic analyses of the same samples to identify commonalities, differences, emergent pathways and network biology within and across tumour lineages. In addition, tissue-specific signals are reduced computationally to enhance biomarker and target discovery spanning multiple tumour lineages. This integrative analysis, with an emphasis on pathways and potentially actionable proteins, provides a framework for determining the prognostic, predictive and therapeutic relevance of the functional proteome.


Figure 3: Unsupervised clustering and analyses based on the MC dataset. (a) Heatmap showing protein expression after unsupervised hierarchical clustering of 3,467 cancer samples across 11 tumor types and 181 antibodies. Protein levels are indicated on a low-to-high scale (blue-white-red). Seven clusters were defined. Cluster_II has been subdivided manually into two clusters (IIa and IIb) based on a significant difference in expression of the proteins of interest (HER2 and EGFR). Annotation bars include tumor lineage (BRCA-basal separately indicated); purity and ploidy (ABSOLUTE algorithm); stromal and immune scores (ESTIMATE algorithm); BRCA (PAM50 classification) and BLCA subtype; 16 significantly mutated genes; and two frequently observed amplifications. Statistical significance of the correlations between the clusters and each variable is indicated left of the annotation bars (n=3,467; chi-squared, Fisher’s exact and ANOVA F tests; see Methods). (b) Crosstab showing the number of tumor samples in each cluster. (c-g) Kaplan-Meier curves showing overall survival in (c) KIRC in cluster_VII vs. all other clusters (n=454), (d) OVCA in cluster_VII vs. all other clusters (n=412), (e) KIRC in cluster_IV vs. all other clusters (n=454), (f) LUSC in cluster_V vs. all other clusters (n=195) and (g) COAD in cluster_V vs. all other clusters (n=334). Follow-up has been capped at 60 months, due to the limited number of events beyond this time. Statistical difference in outcome between groups is indicated by P-value (log-rank test). A high-resolution, interactive version of the heatmap with zooming capability can be found at http://bioinformatics.mdanderson.org/main/TCGA/Pancan11/RPPA.
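The survival panels in the caption rest on two steps: capping follow-up at 60 months and estimating a Kaplan-Meier curve per group. The following is a minimal sketch of those two steps in plain Python, not the authors' code. It assumes that "capped" means a sample followed beyond 60 months is treated as censored at 60 months, which the source does not spell out. The function names and the toy data are hypothetical, and the log-rank test used for the P-values is not implemented here.

```python
def cap_follow_up(times, events, cap=60.0):
    """Censor follow-up at `cap` months (assumed meaning of 'capped').

    times  : follow-up time per patient, in months
    events : 1 if the patient died at that time, 0 if censored
    """
    capped_times, capped_events = [], []
    for t, e in zip(times, events):
        if t > cap:
            # Anything observed after the cap is ignored: censor at the cap.
            capped_times.append(cap)
            capped_events.append(0)
        else:
            capped_times.append(t)
            capped_events.append(e)
    return capped_times, capped_events


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort (made-up numbers, not TCGA data).
months = [10.0, 20.0, 30.0, 70.0, 90.0]
died = [1, 0, 1, 1, 0]

capped_months, capped_died = cap_follow_up(months, died)
curve = kaplan_meier(capped_months, capped_died)
```

In this toy cohort the death recorded at 70 months falls after the cap, so it no longer counts as an event and the curve only steps down at 10 and 30 months.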

Mentions: Tumor lineage represents the dominant determinant of protein clustering using the RBN approach (Fig. 2). We therefore investigated whether further transforming the RBN data to reduce tissue signatures by median centering within tissue types (MC, see Methods) would identify clinically or biologically relevant protein patterns that span multiple tumor lineages (Fig. 3a). Using MC, we obtained seven clusters (I-VII) that were no longer strongly correlated with tumor lineage, as evident from the top annotation bar in Fig. 3a (Supplementary Fig. 4), and from the tissue vs. cluster cross-tabulation (Fig. 3b). This allowed exploration of molecular events that spanned multiple tissues, which was not possible with the RBN approach. Supplementary Table 8 shows a contingency table of the distribution of samples across RBN vs. MC clusters, highlighting the differences between the clusters. Supplementary Tables 9-12 show the top 25 proteins, mRNAs, miRNAs, and mutations that discriminated different MC clusters (full table available at http://bioinformatics.mdanderson.org/main/TCGA/Pancan11/RPPA).
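To make the MC transformation concrete, the sketch below shows one plausible reading of "median centering within tissue types": for each antibody, subtract the median level of that antibody among samples of the same tumor type. It is an illustration under that assumption, not the procedure from the paper's Methods. The function names, the toy values and the cluster labels are all hypothetical. The second helper builds a tissue vs. cluster cross-tabulation of the kind shown in Fig. 3b.

```python
from collections import Counter, defaultdict
from statistics import median


def median_center_within_tissue(values, tissues):
    """Subtract each antibody's within-tissue median from every sample.

    values  : list of per-sample lists, one normalized level per antibody
    tissues : tumor-type label for each sample, same length as `values`
    """
    # Collect the sample indices that belong to each tumor type.
    groups = defaultdict(list)
    for idx, tissue in enumerate(tissues):
        groups[tissue].append(idx)

    n_antibodies = len(values[0])
    centered = [row[:] for row in values]  # copy so the input is untouched
    for idxs in groups.values():
        for j in range(n_antibodies):
            tissue_median = median(values[i][j] for i in idxs)
            for i in idxs:
                centered[i][j] = values[i][j] - tissue_median
    return centered


def crosstab(tissues, clusters):
    """Count samples for every (tissue, cluster) pair."""
    return Counter(zip(tissues, clusters))


# Toy input: five samples, two hypothetical antibodies (made-up values).
rbn = [[1.0, 4.0], [3.0, 6.0], [5.0, 8.0], [10.0, 0.0], [12.0, 2.0]]
tissue = ["BRCA", "BRCA", "BRCA", "KIRC", "KIRC"]

mc = median_center_within_tissue(rbn, tissue)

# Hypothetical cluster assignments, only to show the cross-tabulation.
cluster = ["I", "I", "II", "I", "II"]
table = crosstab(tissue, cluster)
```

After centering, every antibody has a median of zero within each tumor type, so a subsequent clustering step can no longer separate samples on tissue-level offsets alone. That is the sense in which the tissue signature is reduced.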

