Analysis of high accuracy, quantitative proteomics data in the MaxQB database.

Schaab C, Geiger T, Stoehr G, Cox J, Mann M - Mol. Cell Proteomics (2012)

Bottom Line: We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. This information can be used to pinpoint false protein identifications, independently of peptide database scores.


Affiliation: Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, D-82152 Martinsried, Germany.

ABSTRACT
MS-based proteomics generates rapidly increasing amounts of precise and quantitative information. Analysis of individual proteomic experiments has made great strides, but the crucial ability to compare and store information across different proteome measurements still presents many challenges. For example, it has been difficult to avoid contamination of databases with low quality peptide identifications, to control for the inflation in false positive identifications when combining data sets, and to integrate quantitative data. Although, for example, the contamination with low quality identifications has been addressed by joint analysis of deposited raw data in some public repositories, we reasoned that there should be a role for a database specifically designed for high resolution and quantitative data. Here we describe a novel database termed MaxQB that stores and displays collections of large proteomics projects and allows joint analysis and comparison. We demonstrate the analysis tools of MaxQB using proteome data of 11 different human cell lines and 28 mouse tissues. The database-wide false discovery rate is controlled by adjusting the project specific cutoff scores for the combined data sets. The 11 cell line proteomes together identify proteins expressed from more than half of all human genes. For each protein of interest, expression levels estimated by label-free quantification can be visualized across the cell lines. Similarly, the expression rank order and estimated amount of each protein within each proteome are plotted. We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. This information can be used to pinpoint false protein identifications, independently of peptide database scores. The information contained in MaxQB, including high resolution fragment spectra, is accessible to the community via a user-friendly web interface at http://www.biochem.mpg.de/maxqb.
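
The per-protein correlation mentioned in the abstract can be illustrated with a minimal sketch. It is not the MaxQB implementation: the function names, the example numbers, and the reading of "detection probability" as the fraction of proteomes in which a peptide was identified are all assumptions made for illustration. The sketch ranks both variables (ties get the average rank) and takes the Pearson correlation of the ranks, which is the standard definition of the Spearman coefficient.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(sxx * syy)


# Hypothetical peptides of one protein (illustrative numbers, not MaxQB data):
# summed intensity and the fraction of proteomes in which each was detected.
peptide_intensity = [2.1e7, 8.4e6, 3.0e6, 9.5e5, 4.0e5]
detection_fraction = [1.00, 0.91, 0.82, 0.45, 0.36]

rho = spearman(peptide_intensity, detection_fraction)
print(f"Spearman rho = {rho:.2f}")
```

Under this reading, a protein whose intense peptides are also its most frequently detected ones scores close to 1, while a protein with a negative coefficient would be a candidate for closer inspection.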


Figure 3: A, query proteins for human DNA polymerase epsilon subunits. B, select POLE and show details on this protein. C, histogram of protein expression across 11 cell lines. D, expression of POLE compared with expression of all other detected proteins in HEK293 cells. E, expression of the mouse ortholog across 28 mouse tissues.

Mentions: To illustrate practical use of MaxQB, we next describe three “use cases” dealing with diverse types of questions that can be addressed by this novel database. As a first use case, we assume that the user is interested in members of a specific protein family, here the DNA polymerase epsilon subunits (POLE), and wants to investigate their expression across the different cell lines and additionally across mouse tissues. The user can query the database by various fields, specifically by gene name, organism, and source database. The query terms can be combined by Boolean logic and grouped using parentheses. Alternatively, users who are not familiar with the query syntax can use the query builder. In this example, the user searches for all human Uniprot entries that have a gene name beginning with “POLE” (Fig. 3A). The query returns four subunits. By clicking on one of the hits (POLE), the user obtains additional details (Fig. 3B). In particular, the resulting page lists the entries in the IPI and Ensembl databases that have an identical sequence. On the protein expression tab, a bar chart visualizes the protein expression across the 11 human cell lines. Expression of POLE, calculated by label-free quantification in MaxQuant (26), varies by more than 2 orders of magnitude between LNCaP (lowest expression) and HEK293 (highest expression) (Fig. 3C). In addition to comparing expression of the same protein between proteomes, MaxQB can also display expression within any one proteome, compared with all other quantified proteins in that proteome. Here, the expression of the protein is estimated by the sum of its peptide signals, after normalization of the total proteome signals to each other in MaxQuant. The iBAQ algorithm (27) is now implemented in MaxQuant and can also be used to estimate protein amounts. In Fig. 3D, selection of the HEK293 proteome brings up a distribution plot comparing the expression of the protein of interest with all other proteins in this cell line. This reveals that POLE is among the highly expressed proteins in these cells (within the top 15% of quantified proteins). The sequence coverage tab for the corresponding protein group shows the distribution of identified peptides along the sequence of POLE and across the 11 cell lines and their biological replicates (Fig. 4). Additionally, the in silico digested peptides with masses between 0.6 and 4 kDa and the known domains as retrieved from Uniprot (28) are displayed.
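
The two comparisons described above, the spread of one protein across cell lines and its position within a single proteome, come down to simple arithmetic that can be sketched as follows. This is an illustration only: the function names and all intensity values are hypothetical, and MaxQB computes its plots from MaxQuant output rather than from code like this.

```python
from math import log10


def orders_of_magnitude(intensities):
    """Spread of one protein across samples as log10(max / min)."""
    positive = [v for v in intensities if v > 0]
    if len(positive) < 2:
        raise ValueError("need at least two positive intensities")
    return log10(max(positive) / min(positive))


def top_percentile(intensity, proteome):
    """Percentage of proteins in a proteome with intensity >= the given one.

    A small value means the protein is among the most highly expressed.
    """
    if not proteome:
        raise ValueError("proteome must not be empty")
    at_or_above = sum(1 for v in proteome if v >= intensity)
    return 100.0 * at_or_above / len(proteome)


# Hypothetical summed peptide intensities of one protein in two cell lines.
across_cell_lines = {"cell_line_low": 3.0e5, "cell_line_high": 4.5e7}
spread = orders_of_magnitude(list(across_cell_lines.values()))
print(f"spread: {spread:.2f} orders of magnitude")

# Hypothetical intensities of all quantified proteins in one proteome,
# including the protein of interest (1e9).
proteome = [1e5, 5e5, 1e6, 5e6, 1e7, 5e7, 1e8, 5e8, 1e9, 5e9]
position = top_percentile(1e9, proteome)
print(f"protein of interest is within the top {position:.0f}%")
```

A ratio of 150 between the highest and lowest intensity corresponds to a little over 2 orders of magnitude, and a protein exceeded by only one other entry in a list of ten falls within the top 20%.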

