A high confidence, manually validated human blood plasma protein reference set.

Schenk S, Schoenhals GJ, de Souza G, Mann M - BMC Med Genomics (2008)

Bottom Line: Both instruments allow the measurement of peptide masses in the low ppm range. Furthermore, we employed a statistical score that allows database peptide identification searching using the products of two consecutive stages of tandem mass spectrometry (MS3). The combination of MS3 with very high mass accuracy in the parent peptide allows peptide identification with orders of magnitude more confidence than that typically achieved.

Affiliation: Department of Biochemistry and Molecular Biology, Bioinformatics, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark. sschenk@bmb.sdu.dk

ABSTRACT

Background: The immense diagnostic potential of human plasma has prompted great interest and effort in cataloging its contents, exemplified by the Human Proteome Organization (HUPO) Plasma Proteome Project (PPP) pilot project. Due to challenges in obtaining a reliable blood plasma protein list, HUPO later re-analysed their own original dataset with a more stringent statistical treatment that resulted in a much reduced list of high confidence (at least 95%) proteins compared with their original findings. In order to facilitate the discovery of novel biomarkers in the future and to realize the full diagnostic potential of blood plasma, we feel that there is still a need for an ultra-high confidence reference list (at least 99% confidence) of blood plasma proteins.

Methods: To address the complexity and dynamic protein concentration range of the plasma proteome, we employed a linear ion-trap-Fourier transform (LTQ-FT) and a linear ion trap-Orbitrap (LTQ-Orbitrap) for mass spectrometry (MS) analysis. Both instruments allow the measurement of peptide masses in the low ppm range. Furthermore, we employed a statistical score that allows database peptide identification searching using the products of two consecutive stages of tandem mass spectrometry (MS3). The combination of MS3 with very high mass accuracy in the parent peptide allows peptide identification with orders of magnitude more confidence than that typically achieved.
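As an illustration of what a mass measurement "in the low ppm range" means, the relative mass error in parts per million can be computed as below. This is a minimal sketch using the standard definition of ppm error; the function names, the example m/z values, and the 5 ppm tolerance are hypothetical and are not taken from the study.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (standard definition)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the absolute ppm error does not exceed the given tolerance."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


# Hypothetical values for illustration only (not data from the paper).
theoretical = 1000.0000
measured = 1000.0020
error = ppm_error(measured, theoretical)
print(f"mass error: {error:.2f} ppm")

# Assumed tolerance of 5 ppm, chosen only to demonstrate the check.
print(within_tolerance(measured, theoretical, tol_ppm=5.0))
```

A tighter tolerance reduces the number of candidate peptides that match a given precursor mass, which is one reason high mass accuracy raises identification confidence.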

Results: Herein we established a high confidence set of 697 blood plasma proteins and achieved a high 'average sequence coverage' of more than 14 peptides per protein and a median of 6 peptides per protein. All proteins annotated as belonging to the immunoglobulin family as well as all hypothetical proteins whose peptides completely matched immunoglobulin sequences were excluded from this protein list. We also compared the results of using two high-end MS instruments as well as the use of various peptide and protein separation approaches. Furthermore, we characterized the plasma proteins using cellular localization information, as well as comparing our list of proteins to data from other sources, including the HUPO PPP dataset.

Conclusion: Superior instrumentation combined with rigorous validation criteria gave rise to a set of 697 plasma proteins in which we have very high confidence, demonstrated by an exceptionally low false peptide identification rate of 0.29%.

Figure 10: GoMiner analysis of the proteins common to this study and the HUPO study. Panel A is a pie chart of a GoMiner analysis of the 242 proteins shared by our data set and the HUPO data set. 208 of these were assigned by GoMiner to "GO cellular component" as indicated. 98 of these proteins were categorized as cellular and 146 as extracellular, with 18 of the extracellular category further classified as extracellular matrix proteins. Because some proteins appear in both the cellular and extracellular categories, the sum of the two categories was normalized to 100% for the purpose of calculating percentages. Panel B is a histogram of the molecular weight distribution of the same 242 shared proteins. The protein molecular weights were binned as indicated in the panel before plotting.

Mentions: Interestingly, if we perform a GoMiner analysis on the 242 proteins common to our dataset and the HUPO dataset, 227/242 (86%) of the protein IPI numbers are recognized, with 208 being classified as "cellular component" by the program. As with the analysis of the BPPD dataset, some proteins appear in more than one category. GoMiner classified 98 proteins (40%; normalized as in Results) as cellular and, as expected, a high percentage of proteins as extracellular (146 proteins or 60%; normalized as in Results) (Figure 10, panel A). 18 of the extracellular proteins were classified as extracellular matrix proteins, leaving 128 "true" plasma proteins. Independent investigators are more likely to co-identify classical plasma proteins than cellular proteins that might be found in plasma only as a consequence of tissue remodeling or cell death. These cellular proteins are likely to be present in the plasma only at certain points in time, so it is less plausible that independent groups will co-identify the same cellular proteins, given the disparate nature of their samples.
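The normalization described above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming that "normalized to 100%" means dividing each category count by the sum of the two category counts (98 + 146 = 244, which exceeds 208 because some proteins fall in both categories); the function and variable names are hypothetical.

```python
def normalized_shares(counts: dict[str, int]) -> dict[str, float]:
    """Express each category count as a percentage of the summed counts."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Category counts reported for the 242 proteins shared with the HUPO dataset.
counts = {"cellular": 98, "extracellular": 146}
shares = normalized_shares(counts)
for name, pct in shares.items():
    print(f"{name}: {pct:.0f}%")

# Extracellular proteins that are not extracellular matrix proteins.
extracellular_matrix = 18
non_matrix_extracellular = counts["extracellular"] - extracellular_matrix
print(non_matrix_extracellular)
```

Under this assumption the shares round to the 40% and 60% quoted in the text, and the subtraction gives the 128 "true" plasma proteins.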

