A draft map of the human proteome.

Kim MS, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S, Thomas JK, Muthusamy B, Leal-Rojas P, Kumar P, Sahasrabuddhe NA, Balakrishnan L, Advani J, George B, Renuse S, Selvan LD, Patil AH, Nanjappa V, Radhakrishnan A, Prasad S, Subbannayya T, Raju R, Kumar M, Sreenivasamurthy SK, Marimuthu A, Sathe GJ, Chavan S, Datta KK, Subbannayya Y, Sahu A, Yelamanchi SD, Jayaram S, Rajagopalan P, Sharma J, Murthy KR, Syed N, Goel R, Khan AA, Ahmad S, Dey G, Mudgal K, Chatterjee A, Huang TC, Zhong J, Wu X, Shaw PG, Freed D, Zahari MS, Mukherjee KK, Shankar S, Mahadevan A, Lam H, Mitchell CJ, Shankar SK, Satishchandra P, Schroeder JT, Sirdeshmukh R, Maitra A, Leach SD, Drake CG, Halushka MK, Prasad TS, Hruban RH, Kerr CL, Bader GD, Iacobuzio-Donahue CA, Gowda H, Pandey A - Nature (2014)

Bottom Line: However, an equivalent map for the human proteome with direct measurements of proteins and peptides does not exist yet. In-depth proteomic profiling of 30 histologically normal human samples, including 17 adult tissues, 7 fetal tissues and 6 purified primary haematopoietic cells, resulted in identification of proteins encoded by 17,294 genes accounting for approximately 84% of the total annotated protein-coding genes in humans. A unique and comprehensive strategy for proteogenomic analysis enabled us to discover a number of novel protein-coding regions, which includes translated pseudogenes, non-coding RNAs and upstream open reading frames.


Affiliation: [1] McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA [2] Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.

ABSTRACT
The availability of human genome sequence has transformed biomedical research over the past decade. However, an equivalent map for the human proteome with direct measurements of proteins and peptides does not exist yet. Here we present a draft map of the human proteome using high-resolution Fourier-transform mass spectrometry. In-depth proteomic profiling of 30 histologically normal human samples, including 17 adult tissues, 7 fetal tissues and 6 purified primary haematopoietic cells, resulted in identification of proteins encoded by 17,294 genes accounting for approximately 84% of the total annotated protein-coding genes in humans. A unique and comprehensive strategy for proteogenomic analysis enabled us to discover a number of novel protein-coding regions, which includes translated pseudogenes, non-coding RNAs and upstream open reading frames. This large human proteome catalogue (available as an interactive web-based resource at http://www.humanproteomemap.org) will complement available human genome and transcriptome data to accelerate biomedical research in health and disease.


Figure 7: Tissue-wise gene expression and housekeeping proteins. a, A heat map shows a partial list of poorly characterized LOC genes. b, The bulk of protein mass is contributed by only a small number of genes. Only 2,350 ‘housekeeping genes’ account for ∼75% of proteome mass. c, The number of cell/tissue types in which each gene was observed was counted. Some genes were restricted to a few samples, while others were observed in the majority of samples analyzed. For example, 1,537 genes were detected in only one sample, and 2,350 genes were found in all samples. This latter list of genes can be defined as highly abundant ‘housekeeping proteins.’ d, Distribution of genes in the RefSeq database based on the number of protein isoforms resulting from their annotated transcripts (left). Distribution of the transcripts with two or more annotated protein isoforms based on the number of isoform-specific or shared peptides (right). e, A representative example of sequence coverage of the PSMB8 protein, along with the tissue distribution of all of its identified peptides; the MS/MS spectrum of one of the peptides is shown along with seven SRM transitions.

Mentions: We compared our dataset with two of the largest human peptide-based resources – PeptideAtlas and GPMDB. These two databases contain curated peptide information that has been collected from the entire proteomics community over the last decade. Strikingly, almost half of the peptides we identified were not deposited in either one of these resources. Also, the novel peptides in our dataset constitute 37% of the peptides in PeptideAtlas and 54% of peptides in the case of GPMDB (Extended Data Fig. 1g, h). This dramatic increase in the coverage of human proteomic data was made possible by the breadth and depth of our analysis, as most of the cells and tissues that we have analyzed have not previously been studied using similar methods. The depth of our analysis enabled us to identify protein products derived from two-thirds (2,535 out of 3,844) of proteins designated as ‘missing proteins’ (ref. 19) for lack of protein-based evidence. Several hypothetical proteins that we identified have a broad tissue distribution, indicating the inadequate sampling of the human proteome thus far (Extended Data Fig. 2a).
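
The proportions quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope check: the helper names `implied_total` and `share` are illustrative, and the implied total of annotated protein-coding genes is derived here from the rounded "approximately 84%" figure, so it is an estimate rather than a number reported in the paper.

```python
# Figures quoted in the abstract and in the text above.
GENES_IDENTIFIED = 17_294        # genes with protein-level evidence
FRACTION_OF_ANNOTATED = 0.84     # "approximately 84%" (rounded in the source)
MISSING_FOUND = 2_535            # 'missing proteins' identified
MISSING_TOTAL = 3_844            # 'missing proteins' designated


def implied_total(identified: int, fraction: float) -> int:
    """Estimate the total count implied by a part and its rounded fraction."""
    return round(identified / fraction)


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction between 0 and 1."""
    return part / whole


if __name__ == "__main__":
    # Rough size of the annotated protein-coding gene set implied by 84%.
    print(implied_total(GENES_IDENTIFIED, FRACTION_OF_ANNOTATED))  # 20588
    # Share of 'missing proteins' recovered, described as two-thirds.
    print(f"{share(MISSING_FOUND, MISSING_TOTAL):.1%}")            # 65.9%
```

The second value, about 65.9%, is consistent with the "two-thirds" wording; the first depends on how the 84% was rounded and should be read only as an order-of-magnitude check.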

