Proteomics signature profiling (PSP): a novel contextualization approach for cancer proteomics.

Goh WW, Lee YH, Ramdzan ZM, Sergot MJ, Chung M, Wong L - J. Proteome Res. (2012)

Bottom Line: Comparing our results against the Proteomics Expansion Pipeline (PEP) on which the same patient data was analyzed, we found good correlation. Building on this finding, we report significantly more clusters (176 clusters here compared to 70 in PEP), demonstrating the sensitivity of this approach. Although consistency of individual proteins between patients is low, we found the reported proteins tend to hit clusters in a meaningful and informative manner.


Affiliation: Department of Computing, Imperial College London, London, United Kingdom.

ABSTRACT
Traditional proteomics analysis is plagued by the use of arbitrary thresholds, resulting in a large loss of information. We propose here a novel method in proteomics that utilizes all detected proteins. We demonstrate its efficacy in a proteomics screen of 5 and 7 liver cancer patients in the moderate and late stage, respectively. Utilizing biological complexes as a cluster vector, and augmenting it with submodules obtained from partitioning an integrated and cleaned protein-protein interaction network, we calculate a Proteomics Signature Profile (PSP) for each patient based on the hit rates of their reported proteins, in the absence of fold change thresholds, against the cluster vector. Using this, we demonstrated that moderate- and late-stage patients segregate with high confidence. We also discovered a moderate-stage patient who displayed a proteomics profile similar to other poor-stage patients. We identified significant clusters using a modified version of the SNet approach. Comparing our results against the Proteomics Expansion Pipeline (PEP) on which the same patient data was analyzed, we found good correlation. Building on this finding, we report significantly more clusters (176 clusters here compared to 70 in PEP), demonstrating the sensitivity of this approach. Gene Ontology (GO) term analysis also reveals that the significant clusters are functionally congruent with the liver cancer phenotype. PSP is a powerful and sensitive method for analyzing proteomics profiles even when sample sizes are small. It does not rely on ratio scores but rather on whether a protein is detected or not. Although consistency of individual proteins between patients is low, we found the reported proteins tend to hit clusters in a meaningful and informative manner.
By extracting this information in the form of a Proteomics Signature Profile, we confirm that this information is conserved and can be used for (1) clustering of patient samples, (2) identification of significant clusters based on real biological complexes, and (3) overcoming consistency and coverage issues prevalent in proteomics data sets.
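
As a rough illustration of the hit-rate idea described in the abstract, the following minimal Python sketch computes a profile for a single patient. The function name `signature_profile`, the cluster names, and the protein identifiers are all hypothetical, and the hit rate is assumed here to be the fraction of a cluster's member proteins that were detected in the patient; the paper's exact definition may differ.

```python
def signature_profile(detected, clusters):
    """Return one hit rate per cluster for a single patient.

    detected: set of protein identifiers reported for the patient
    clusters: dict mapping cluster name -> set of member proteins

    Assumption: hit rate = detected members / total members of the cluster.
    No fold-change or ratio threshold is applied; only detection counts.
    """
    profile = []
    for name in sorted(clusters):  # fixed order keeps profiles comparable
        members = clusters[name]
        hits = len(members & detected)
        profile.append(hits / len(members) if members else 0.0)
    return profile


# Hypothetical cluster vector and patient data, for illustration only.
clusters = {
    "complex_A": {"P1", "P2", "P3", "P4"},
    "complex_B": {"P5", "P6"},
    "complex_C": {"P7", "P8", "P9"},
}
patient = {"P1", "P2", "P5", "P6", "P10"}

print(signature_profile(patient, clusters))  # [0.5, 1.0, 0.0]
```

Because every patient is scored against the same ordered cluster vector, the resulting profiles have equal length and can be compared directly, even when the individual proteins detected in each patient overlap poorly.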


Figure 2: Comparison of bootstrapped HCL trees generated via pvclust. Values on the edges of the clustering are p-values (%). Red values are AU p-values, and green values are BP values, as explained earlier under Methods. Clusters with AU larger than 95% are highlighted by red boxes and are very strongly supported by the data. The 73 graphlet-derived clusters alone did not provide sufficient dimensions to clearly resolve the mod and poor patients (left column), although Paragon fared much better because of its better hit rates. The right column shows that with a much larger set of dimensions or clusters, in this case derived from CORUM, the trees are virtually identical even though Paragon reports a considerably larger number of proteins. It is also noteworthy that, in all cases, mod patient #203 is clustered with the poor patients.

Mentions: PSP was performed using the proteins identified by Mascot and Paragon, respectively. In both cases, the resulting tree structures are similar, which indicates that PSP produces stable results. Paragon also consistently outperforms Mascot (Figure 2) owing to its higher sensitivity [2]. Since hierarchical clustering of patient PSPs is an unsupervised method (i.e., no class labels of patients were used), there can be no overfitting with regard to the class labels of the patients.
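
The bootstrap support reported for the clustering can be illustrated with a minimal Python sketch. This is not pvclust (an R package) and it does not compute AU p-values, which require multiscale bootstrap resampling. It only estimates an ordinary bootstrap probability (BP): the fraction of resampled data sets in which a given group of samples reappears as a cluster. Average linkage with Euclidean distance is an assumption made here for simplicity, and all sample names and hit rates are hypothetical.

```python
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def hcl_clusters(profiles):
    """Average-linkage agglomerative clustering (assumed settings).

    profiles: dict mapping sample name -> list of hit rates
    Returns the set of all merged groups (frozensets of sample names).
    """
    def linkage(pair):
        a, b = pair
        dists = [euclidean(profiles[i], profiles[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    active = [frozenset([name]) for name in profiles]
    formed = set()
    while len(active) > 1:
        a, b = min(combinations(active, 2), key=linkage)  # closest pair
        merged = a | b
        active = [c for c in active if c not in (a, b)] + [merged]
        formed.add(merged)
    return formed


def bootstrap_probability(profiles, group, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which `group` forms a cluster.

    Dimensions (clusters of the profile vector) are resampled with
    replacement, and the samples are re-clustered on each replicate.
    """
    rng = random.Random(seed)
    n_dims = len(next(iter(profiles.values())))
    target = frozenset(group)
    count = 0
    for _ in range(n_boot):
        cols = [rng.randrange(n_dims) for _ in range(n_dims)]
        resampled = {s: [v[c] for c in cols] for s, v in profiles.items()}
        if target in hcl_clusters(resampled):
            count += 1
    return count / n_boot


# Hypothetical profiles: two mod and two poor samples, three dimensions.
profiles = {
    "mod_1": [0.1, 0.2, 0.1],
    "mod_2": [0.2, 0.1, 0.2],
    "poor_1": [0.8, 0.9, 0.8],
    "poor_2": [0.9, 0.8, 0.9],
}
print(bootstrap_probability(profiles, ["mod_1", "mod_2"], n_boot=50))
```

No class labels enter the computation: the grouping is judged only by how often it survives resampling of the profile dimensions, which is why more dimensions tend to give more stable trees.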

