Analysis of high accuracy, quantitative proteomics data in the MaxQB database.

Schaab C, Geiger T, Stoehr G, Cox J, Mann M - Mol. Cell Proteomics (2012)

Bottom Line: We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. This information can be used to pinpoint false protein identifications, independently of peptide database scores.


Affiliation: Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, D-82152 Martinsried, Germany.

ABSTRACT
MS-based proteomics generates rapidly increasing amounts of precise and quantitative information. Analysis of individual proteomic experiments has made great strides, but the crucial ability to compare and store information across different proteome measurements still presents many challenges. For example, it has been difficult to avoid contamination of databases with low quality peptide identifications, to control for the inflation in false positive identifications when combining data sets, and to integrate quantitative data. Although, for example, the contamination with low quality identifications has been addressed by joint analysis of deposited raw data in some public repositories, we reasoned that there should be a role for a database specifically designed for high resolution and quantitative data. Here we describe a novel database termed MaxQB that stores and displays collections of large proteomics projects and allows joint analysis and comparison. We demonstrate the analysis tools of MaxQB using proteome data of 11 different human cell lines and 28 mouse tissues. The database-wide false discovery rate is controlled by adjusting the project specific cutoff scores for the combined data sets. The 11 cell line proteomes together identify proteins expressed from more than half of all human genes. For each protein of interest, expression levels estimated by label-free quantification can be visualized across the cell lines. Similarly, the expression rank order and estimated amount of each protein within each proteome are plotted. We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. This information can be used to pinpoint false protein identifications, independently of peptide database scores. The information contained in MaxQB, including high resolution fragment spectra, is accessible to the community via a user-friendly web interface at http://www.biochem.mpg.de/maxqb.
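The Spearman rank correlation mentioned in the abstract can be illustrated with a minimal Python sketch. It ranks both variables (ties get the average rank) and computes the Pearson correlation of the ranks. The function names and the example numbers are assumptions made for illustration; in particular, treating "detection probability" as the fraction of proteomes in which a peptide was identified is an assumption here, not a definition taken from the abstract, and this is not the MaxQB implementation.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Invented numbers for the peptides of one hypothetical protein:
# summed intensity and fraction of proteomes with a detection.
intensities = [1.0e6, 5.0e6, 2.0e7, 8.0e7]
detection_fraction = [0.2, 0.5, 0.5, 1.0]
rho = spearman(intensities, detection_fraction)
```

Under this reading, a protein whose peptides give a value of `rho` close to 1 behaves consistently across proteomes, while a negative value would flag the identification for closer inspection.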


Figure 6: A, query for unique peptides for CDK2 with a score greater than 80 and no missed cleavages. B, the fragment spectrum with the best evidence for peptide AFGVPVR.

Mentions: A popular use of proteomics repositories is the selection of peptides suitable for targeted methods such as multiple reaction monitoring. In the third use case, a user is interested in establishing a multiple reaction monitoring assay for the cell cycle protein CDK2 and starts by searching for all peptides that are unique for CDK2, have an Andromeda identification score larger than 80, and have no missed cleavages. As in the search for proteins described above, query terms can be combined by Boolean logic (Fig. 6A). The query returns seven peptides fulfilling these criteria. The user selects the peptide AFGVPVR and displays the fragment spectrum for the best identification evidence for this peptide (Fig. 6B). The user can now export the list of peaks together with the masses and annotations and use this as a basis for creating multiple reaction monitoring transitions. A particular advantage of using MaxQB for this use case is that the database contains high resolution fragmentation spectra obtained by the higher energy collisional dissociation method, which produces transitions very similar to those that would be observed in triple quadrupole methods (30).
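The three query terms in this use case (unique for CDK2, score above 80, no missed cleavages) are combined with a logical AND. The Python sketch below applies the same filter to a local list of records. It does not use or reproduce the MaxQB interface: the `PeptideHit` record, the function names, and every score and sequence other than AFGVPVR are hypothetical and invented for illustration. Missed cleavages are counted with the common trypsin rule (cleavage after K or R, but not before P), which is an assumption about the digestion settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideHit:
    sequence: str
    proteins: tuple  # names of all proteins the peptide maps to
    score: float     # identification score (Andromeda score in the text)


def missed_cleavages(sequence: str) -> int:
    """Count internal K/R residues that are not followed by P."""
    return sum(
        1
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    )


def select_candidates(hits, protein, min_score=80.0):
    """Keep hits that satisfy all three criteria (Boolean AND)."""
    return [
        hit
        for hit in hits
        if hit.proteins == (protein,)               # unique for the protein
        and hit.score > min_score                   # score larger than cutoff
        and missed_cleavages(hit.sequence) == 0     # fully cleaved
    ]


# Hypothetical records; only the sequence AFGVPVR comes from the text,
# and all scores are invented.
HITS = [
    PeptideHit("AFGVPVR", ("CDK2",), 120.0),
    PeptideHit("AAAKLLLR", ("CDK2",), 150.0),           # one missed cleavage
    PeptideHit("GGGSSSR", ("CDK2", "CDK3"), 110.0),     # shared peptide
    PeptideHit("TTTEEEK", ("CDK2",), 60.0),             # score too low
]

candidates = select_candidates(HITS, "CDK2")
```

With these made-up records, only the AFGVPVR entry passes all three conditions; each of the other records fails exactly one of them, which makes it easy to see which term of the query excludes it.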

