PaxDb, a database of protein abundance averages across all three domains of life.

Wang M, Weiss M, Simonovic M, Haertinger G, Schrimpf SP, Hengartner MO, von Mering C - Mol. Cell Proteomics (2012)

Affiliation: Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland.

ABSTRACT
Although protein expression is regulated both temporally and spatially, most proteins have an intrinsic, "typical" range of functionally effective abundance levels. These extend from a few molecules per cell for signaling proteins to millions of molecules for structural proteins. When addressing fundamental questions related to protein evolution, translation and folding, but also in routine laboratory work, a rough estimate of the average wild-type abundance of each detectable protein in an organism is often desirable. Here, we introduce a meta-resource dedicated to integrating information on absolute protein abundance levels, with particular emphasis on deep coverage, consistent post-processing and comparability across organisms. Publicly available experimental data are mapped onto a common namespace and, in the case of tandem mass spectrometry data, re-processed using a standardized spectral counting pipeline. By aggregating and averaging over the various samples, conditions and cell types, the resulting integrated data set achieves increased coverage and a high dynamic range. We score and rank each contributing data set by assessing its consistency against externally provided protein-network information, and demonstrate that our weighted integration is more consistent than the individual data sets. The current PaxDb release 2.1 (at http://pax-db.org/) presents whole-organism as well as tissue-resolved data, and covers 85,000 proteins in 12 model organisms. All values can be compared directly across organisms via pre-computed orthology relationships.
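To make the idea of turning spectral counts into comparable abundance values concrete, here is a minimal sketch, not the PaxDb pipeline itself. It assumes spectral counts are divided by protein length (similar in spirit to the normalized spectral abundance factor, NSAF) and then scaled so that all detected proteins sum to one million, i.e. parts per million. The function name `spectral_counts_to_ppm` and the protein IDs are hypothetical.

```python
from typing import Dict


def spectral_counts_to_ppm(
    counts: Dict[str, int], lengths: Dict[str, int]
) -> Dict[str, float]:
    """Convert per-protein spectral counts to parts-per-million abundances.

    Assumption (illustrative only): counts are divided by protein length in
    residues, so long proteins, which yield more peptides, are not
    over-estimated. Length-normalized values are then scaled to sum to 1e6.
    """
    # Length-normalize every protein with at least one assigned spectrum.
    normalized = {
        protein: counts[protein] / lengths[protein]
        for protein in counts
        if counts[protein] > 0
    }
    total = sum(normalized.values())
    if total == 0:
        raise ValueError("no spectra assigned to any protein")
    # Scale so that all detected proteins together sum to one million.
    return {protein: 1e6 * value / total for protein, value in normalized.items()}


# Toy input with three hypothetical proteins; protC has no spectra.
counts = {"protA": 120, "protB": 30, "protC": 0}
lengths = {"protA": 400, "protB": 100, "protC": 250}
print(spectral_counts_to_ppm(counts, lengths))
```

Although protA has four times as many spectra as protB, it is also four times as long, so under this assumed length normalization both end up with the same ppm value; undetected proteins are simply absent from the output.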

Figure 1: PaxDb overview. For each release of PaxDb, protein abundance information is imported from a number of sources, including proteomics repositories and published studies. All data are preprocessed and, in the case of raw MS/MS data, protein abundances are recalculated by spectral counting. Additional information is imported from the STRING database. Data are presented in three views: 1) information about a single protein, 2) abundance tables for all detectable proteins in an organism, and 3) a summary page for every organism, listing available data sets. Where several data sets exist for one organism, PaxDb also provides a weighted-average integrated data set that is more comprehensive and less noisy than the single data sets.
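The weighted-average integration can be illustrated with a short sketch. It is a generic weighted mean, not PaxDb's exact procedure: each data set is assumed to carry a precomputed quality weight (in PaxDb, data sets are scored by their consistency with protein-network information; here the weights are simply given numbers). Each protein is averaged over the data sets that detected it, and the result is rescaled to ppm. The names `DataSet` and `integrate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataSet:
    """Hypothetical container for one imported abundance data set."""
    name: str
    weight: float                 # assumed quality weight for this data set
    abundances: Dict[str, float]  # protein ID -> abundance in ppm


def integrate(datasets: List[DataSet]) -> Dict[str, float]:
    """Weighted mean per protein over the data sets that detect it,
    rescaled so the integrated values again sum to 1,000,000 ppm."""
    weighted_sum: Dict[str, float] = {}
    weight_total: Dict[str, float] = {}
    for ds in datasets:
        for protein, value in ds.abundances.items():
            weighted_sum[protein] = weighted_sum.get(protein, 0.0) + ds.weight * value
            weight_total[protein] = weight_total.get(protein, 0.0) + ds.weight
    averaged = {p: weighted_sum[p] / weight_total[p] for p in weighted_sum}
    if not averaged:
        return {}
    total = sum(averaged.values())
    # Renormalize: the union of proteins must again sum to one million.
    return {p: 1e6 * v / total for p, v in averaged.items()}


ds_a = DataSet("dataset_A", weight=2.0, abundances={"P1": 600000.0, "P2": 400000.0})
ds_b = DataSet("dataset_B", weight=1.0, abundances={"P1": 300000.0, "P3": 700000.0})
print(integrate([ds_a, ds_b]))
```

In this toy example P1 receives (2 × 600000 + 300000) / 3 = 500000 before renormalization, and the integrated table covers P1, P2 and P3 even though no single input contains all three, which mirrors the "more comprehensive" property in the caption. Because abundances span several orders of magnitude, averaging log-transformed values would be a reasonable alternative design choice.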

The PaxDb database (“Protein Abundances Across Organisms”) currently covers 12 model species from all three domains of life, ranging from a single-celled archaeon to complex eukaryotes. For each of these organisms, PaxDb aims to provide individual (tissue-resolved) data sets and, in addition, a single consolidated abundance estimate for all detectable proteins. This consolidated estimate is meant as an organism-wide average of protein expression, aggregated over all available data sets (from various environmental conditions and developmental stages). Where applicable, consolidated averages are provided for specific tissues as well, i.e. wherever several independent data sets are available for a given tissue. Fig. 1 outlines the basic workflow for each new release of PaxDb (the current release is version 2.1). In addition to the final aggregated averages, each imported data set is also made available unaltered, apart from being re-mapped onto a common, up-to-date version of the respective model organism's genome. All abundance data are presented in the same numerical framework: average steady-state protein abundances expressed as molecular counts normalized to “parts per million” (ppm; see section “Experimental Procedures” above). Besides these abundance estimates, each protein is shown together with accessory information on its annotated function, sequence and structure, and within a network context of known or predicted functional interaction partners. All of this additional information is imported from the STRING database (54), with which PaxDb shares the protein namespace and all functional annotation. Beyond the protein-centered information, PaxDb also contains summary metrics describing each data set, such as its abundance distribution over the entire proteome. Furthermore, all proteins are grouped into families of orthologs (“orthologous groups”), which enables a direct comparison of abundance estimates across organisms.
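As an illustration of how orthologous groups make cross-organism comparison possible, the sketch below collects per-protein ppm values by orthologous group and organism. The precomputed mapping `protein_to_group`, the group and protein identifiers, and the choice to sum paralogs within a group are all assumptions for illustration, not documented PaxDb behavior; `abundance_by_group` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Tuple


def abundance_by_group(
    abundances: Dict[Tuple[str, str], float],
    protein_to_group: Dict[str, str],
) -> Dict[str, Dict[str, float]]:
    """Sum the ppm values of all proteins in each orthologous group, per organism.

    abundances: (organism, protein ID) -> abundance in ppm
    protein_to_group: protein ID -> orthologous group ID (assumed precomputed)
    Returns: group ID -> {organism: summed ppm}
    """
    table: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for (organism, protein), ppm in abundances.items():
        group = protein_to_group.get(protein)
        if group is None:
            continue  # protein has no orthology assignment; skip it
        # Assumption: paralogs in the same group are summed per organism.
        table[group][organism] += ppm
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {g: dict(per_org) for g, per_org in table.items()}


abundances = {
    ("yeast", "yP1"): 1200.0,
    ("yeast", "yP2"): 300.0,  # hypothetical paralog of yP1, same group
    ("human", "hP1"): 800.0,
    ("human", "hP9"): 50.0,   # not assigned to any group in this toy map
}
protein_to_group = {"yP1": "OG_001", "yP2": "OG_001", "hP1": "OG_001"}
print(abundance_by_group(abundances, protein_to_group))
```

Because all values are already in ppm, the per-organism totals for a group can be compared directly. Summing paralogs is only one option; taking the most abundant member would be another.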

