Pathway analysis and transcriptomics improve protein identification by shotgun proteomics from samples comprising small number of cells--a benchmarking study.

Sun J, Zhang GL, Li S, Ivanov AR, Fenyo D, Lisacek F, Murthy SK, Karger BL, Brusic V - BMC Genomics (2014)

Bottom Line: These proteins were mapped to gene symbols and compared with gene sets representing canonical pathways; more than 4,000 genes were extracted from the enriched gene sets. The comparison of proteomics and transcriptomics data suggests that highly expressed transcripts have a high probability of protein expression. However, approximately 10% of expressed proteins could not be matched with the expressed transcripts.


ABSTRACT

Background: Proteomics research is enabled by high-throughput technologies, but our ability to identify the expressed proteome is limited in small samples. The coverage and consistency of proteome expression are critical problems in proteomics. Here, we propose pathway analysis and a combination of microproteomics and transcriptomics analyses to improve mass-spectrometry protein identification from small samples.

Results: Multiple proteomics runs using the MCF-7 cell line detected 4,957 expressed proteins. About 80% of expressed proteins were present in the MCF-7 transcript data; highly expressed transcripts are more likely to have expressed proteins. Approximately 1,000 proteins were detected in each run of the small-sample proteomics. These proteins were mapped to gene symbols and compared with gene sets representing canonical pathways; more than 4,000 genes were extracted from the enriched gene sets. The identified canonical pathways largely overlapped between individual runs. Of the identified pathways, 182 were shared among the three individual small-sample runs.
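The abstract does not say which enrichment statistic was used to select gene sets. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for over-representation analysis, and collects the genes of every enriched set into a candidate list. The function names, the 0.05 threshold, the toy gene sets, and the universe size are all hypothetical; no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, detected_size, overlap):
    """P(X >= overlap) when detected_size genes are drawn without replacement
    from a universe that contains pathway_size pathway genes."""
    max_k = min(pathway_size, detected_size)
    total = comb(universe_size, detected_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, detected_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def enriched_gene_union(detected, gene_sets, universe_size, alpha=0.05):
    """Return (enriched set name -> p-value, union of genes in enriched sets).

    alpha is an assumed, uncorrected significance threshold.
    """
    enriched = {}
    for name, genes in gene_sets.items():
        overlap = len(detected & genes)
        p = hypergeom_enrichment_p(universe_size, len(genes), len(detected), overlap)
        if p < alpha:
            enriched[name] = p
    # Genes of all enriched sets, including those not detected directly.
    candidates = set().union(*(gene_sets[name] for name in enriched))
    return enriched, candidates


# Toy data (hypothetical gene symbols and pathway names).
toy_gene_sets = {
    "PATHWAY_A": {"G1", "G2", "G3", "G4"},
    "PATHWAY_B": {"G5", "G6", "G7", "G8"},
}
toy_detected = {"G1", "G2", "G3", "G9"}
toy_enriched, toy_candidates = enriched_gene_union(
    toy_detected, toy_gene_sets, universe_size=20
)
```

In this toy case only PATHWAY_A passes the threshold, so G4 becomes a candidate even though it was not detected directly. This mirrors how a list of about 1,000 detected proteins can expand to several thousand genes once whole gene sets are pulled in.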

Conclusions: Current technologies enable us to directly detect 10% of the expressed proteome from small samples comprising as few as 50 cells. We used knowledge-based approaches to elucidate the missing proteome, which can then be verified by targeted proteomics. This knowledge-based approach includes pathway analysis and the combination of gene expression and protein expression data for target prioritization. Genes present both in the enriched gene sets (canonical pathways collection) and in the small-sample proteomics data correspond to approximately 50% of the expressed proteome in the larger-sample proteomics data. In addition, 90% of targets from canonical pathways were estimated to be expressed. The comparison of proteomics and transcriptomics data suggests that highly expressed transcripts have a high probability of protein expression. However, approximately 10% of expressed proteins could not be matched with the expressed transcripts.
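The target prioritization described above can be illustrated with a minimal sketch, assuming that undetected pathway genes are ranked by transcript abundance and that coverage is measured against a larger-sample reference proteome. The function names, the `min_level` cutoff, and the toy values are hypothetical and are not taken from the study.

```python
def prioritize_targets(candidates, detected, transcript_level, min_level=0.0):
    """Rank candidate genes that were not detected by proteomics.

    Genes with transcript level at or below min_level (or with no transcript
    record) are dropped; the rest are sorted from highest to lowest level,
    with ties broken alphabetically.
    """
    missing = candidates - detected
    expressed = [g for g in missing if transcript_level.get(g, 0.0) > min_level]
    return sorted(expressed, key=lambda g: (-transcript_level[g], g))


def coverage(reference_proteome, predicted):
    """Fraction of the reference proteome that appears in the predicted set."""
    if not reference_proteome:
        return 0.0
    return len(reference_proteome & predicted) / len(reference_proteome)


# Toy data (hypothetical gene symbols and expression values).
toy_ranked = prioritize_targets(
    candidates={"G1", "G2", "G3", "G4"},
    detected={"G1"},
    transcript_level={"G2": 5.0, "G3": 12.0},
)
toy_coverage = coverage({"G1", "G2", "G3", "G9"}, {"G1", "G2"})
```

Ranking by transcript level follows the abstract's observation that highly expressed transcripts are more likely to have expressed proteins. The roughly 10% of proteins with no matching transcript would be missed by such a filter.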

© Copyright Policy - open-access

Figure 2: Numbers of proteins identified in individual experiments and triplicate runs. "Intersection" denotes proteins detected in all three runs, "twice" denotes proteins identified in at least two of the three runs, and "union" denotes proteins detected in at least one of the three runs.
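The three categories in the caption are plain set operations over the triplicate runs. The sketch below shows one way to compute them; the function name and the toy protein identifiers are hypothetical.

```python
from collections import Counter


def triplicate_summary(run_a, run_b, run_c):
    """Group proteins by how many of the three runs detected them."""
    counts = Counter()
    for run in (run_a, run_b, run_c):
        counts.update(set(run))  # count each protein at most once per run
    return {
        "intersection": {p for p, n in counts.items() if n == 3},
        "twice": {p for p, n in counts.items() if n >= 2},
        "union": set(counts),
    }


# Toy runs (hypothetical protein identifiers).
toy_summary = triplicate_summary({"A", "B", "C"}, {"B", "C", "D"}, {"C", "E"})
```

By construction the intersection is a subset of "twice", which in turn is a subset of the union, so the three counts can only increase in that order.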

Mentions: The detected protein names were mapped to approved gene symbols according to HGNC nomenclature (referred to as the approved gene symbols in the following text) [39], resulting in 4,957 identified and annotated proteins. The numbers of proteins identified in individual runs are shown in Figure 2. Larger samples yielded larger numbers of identified proteins, except for runs with 5,000-cell samples. This decline is an artefact of saturation of the specific PLOT column [41] that was used in this set of proteomics runs.

