A guide through the computational analysis of isotope-labeled mass spectrometry-based quantitative proteomics data: an application study.

Albaum SP, Hahne H, Otto A, Haußmann U, Becher D, Poetsch A, Goesmann A, Nattkemper TW - Proteome Sci (2011)

Bottom Line: This work provides guidance through the jungle of computational methods to analyze mass spectrometry-based isotope-labeled datasets and recommends an effective and easy-to-use evaluation strategy. Special focus is placed on the application and validation of cluster analysis methods. All applied methods were implemented within the rich internet application QuPE [4].


Affiliation: Computational Genomics, Center for Biotechnology (CeBiTec), Bielefeld University, Germany. alu@cebitec.uni-bielefeld.de.

ABSTRACT

Background: Mass spectrometry-based proteomics has reached a stage where it is possible to comprehensively analyze the whole proteome of a cell in one experiment. Here, the employment of stable isotopes has become a standard technique to yield relative abundance values of proteins. In recent times, more and more experiments are conducted that depict not only a static image of the up- or down-regulated proteins at a distinct time point but instead compare developmental stages of an organism or varying experimental conditions.

Results: Although the scientific questions behind these experiments are of course manifold, there are, nevertheless, two questions that commonly arise: 1) which proteins are differentially regulated regarding the selected experimental conditions, and 2) are there groups of proteins that show similar abundance ratios, indicating that they have a similar turnover? We give advice on how these two questions can be answered and comprehensively compare a variety of commonly applied computational methods and their outcomes.

Conclusions: This work provides guidance through the jungle of computational methods to analyze mass spectrometry-based isotope-labeled datasets and recommends an effective and easy-to-use evaluation strategy. We demonstrate our approach with three recently published datasets on Bacillus subtilis [1,2] and Corynebacterium glutamicum [3]. Special focus is placed on the application and validation of cluster analysis methods. All applied methods were implemented within the rich internet application QuPE [4]. Results can be found at http://qupe.cebitec.uni-bielefeld.de.




Figure 8: Calinski-Harabasz. Similar to the "Index I", the cluster index of Calinski and Harabasz tends to favor smaller cluster numbers between three and four clusters. In the same manner, the applicability with respect to the biological question also remains questionable.

Mentions: Aiming to determine an optimal clustering of each proteomics dataset from both the biological and the computational point of view, we analyzed the results of all applied cluster algorithms using a diversified selection of cluster indexes. Here, the index of Calinski and Harabasz [43], which sets the similarity of all proteins grouping together in a cluster in relation to the dissimilarities of each two clusters, and even more the Index I [47], which follows a comparable approach, tend to favor smaller cluster numbers between two and three clusters (see Figures 7 and 8; Additional files 5, 6 and 7 for further details). While from a computational point of view these results seem reasonable, from a biological point of view they do not allow any meaningful interpretation of the data. In general, these small clusterings only characterize individual outliers, while the remaining clusters have a high number of members, grouping together everything that shows only a slight similarity. Experiment C is, in some respect, an exception, as here the cluster index of Calinski and Harabasz gives evidence for higher cluster numbers, e.g. 14 for Complete/Euclidean. This could result from the fact that the data of this experiment has a comparably low dimensionality, as there are only two different abundance ratios per protein: one for growth on benzoate, one for glucose.
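
To make the verbal description of the Calinski-Harabasz index more concrete, the following is a minimal sketch of the index in its standard textbook form: the between-cluster dispersion divided by the within-cluster dispersion, each scaled by its degrees of freedom. It is not taken from QuPE or from the paper; the function name `calinski_harabasz`, the squared Euclidean distance, and the toy abundance ratios are assumptions chosen for illustration only.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index for points (numeric tuples) and cluster labels.

    Hypothetical helper for illustration: higher values mean compact clusters
    that lie far from the overall centroid.
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by their cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("index is defined only for 2 <= k < n clusters")

    def centroid(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # size-weighted squared distance of cluster centroids to overall centroid
    within = 0.0   # squared distance of each point to its own cluster centroid
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)

    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Toy data (assumed, not from the paper): two abundance ratios per protein.
ratios = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = calinski_harabasz(ratios, [0, 0, 1, 1])  # split along the large gap
poor = calinski_harabasz(ratios, [0, 1, 0, 1])  # split along the small gap
print(good, poor)
```

In this toy case the partition that separates the two distant groups scores far higher than the one that mixes them. Evaluating such an index for each candidate cluster number and picking the maximum is the usual way it is used to select a clustering, which is how a preference for very few clusters can arise when a handful of outliers dominate the between-cluster term.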

