Network-assisted protein identification and data interpretation in shotgun proteomics.

Li J, Zimmerman LJ, Park BH, Tabb DL, Liebler DC, Zhang B - Mol. Syst. Biol. (2009)

Bottom Line: In several data sets tested, CEA increased protein identification by 8-23% with an estimated accuracy of 85%. Rescued proteins were supported by existing literature or transcriptome profiling studies at levels similar to those of confident proteins and at a significantly higher level than abandoned ones. In addition, CEA generated a network view of the proteins and helped show the modular organization of proteins that may underpin the molecular mechanisms of the disease.

Affiliation: Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232-8340, USA.

ABSTRACT
Protein assembly and biological interpretation of the assembled protein lists are critical steps in shotgun proteomics data analysis. Although most biological functions arise from interactions among proteins, current protein assembly pipelines treat proteins as independent entities. Usually, only individual proteins with strong experimental evidence, that is, confident proteins, are reported, whereas many possible proteins of biological interest are eliminated. We have developed a clique-enrichment approach (CEA) to rescue eliminated proteins by incorporating the relationships among proteins as embedded in a protein interaction network. In several data sets tested, CEA increased protein identification by 8-23% with an estimated accuracy of 85%. Rescued proteins were supported by existing literature or transcriptome profiling studies at levels similar to those of confident proteins and at a significantly higher level than abandoned ones. When applied to a breast cancer data set, CEA rescued proteins coded by well-known breast cancer genes. In addition, CEA generated a network view of the proteins and helped show the modular organization of proteins that may underpin the molecular mechanisms of the disease.
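The abstract does not spell out how a clique is scored, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes that (a) candidate groups are the maximal cliques of the interaction network, (b) a clique counts as "enriched" when a one-sided hypergeometric test on its number of confident proteins falls below a threshold, and (c) non-confident proteins in an enriched clique are rescued. The function names, the `alpha` and `min_size` parameters, and the ten-protein network are all hypothetical.

```python
from math import comb

def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return cliques

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn from N, of which K are confident."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)

def rescue(adj, confident, candidates, alpha=0.05, min_size=3):
    """Assumed rule: rescue candidates that sit in a confident-enriched clique."""
    N = len(adj)
    K = len(confident & set(adj))
    rescued = set()
    for clique in maximal_cliques(adj):
        if len(clique) < min_size:
            continue
        k = len(clique & confident)
        if hypergeom_tail(k, len(clique), K, N) <= alpha:
            rescued |= clique & candidates
    return rescued

# Hypothetical network: A, B, C, D form a clique; E..J hang off D as a chain.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J")]
adj = build_adjacency(edges)
confident = {"A", "B", "C"}    # proteins with strong experimental evidence
candidates = {"D", "H"}        # non-confident proteins eligible for rescue
rescued = rescue(adj, confident, candidates)
print(rescued)                 # {'D'}
```

In this toy case the four-protein clique contains all three confident proteins (tail probability 7/210, about 0.033), so D is rescued, while H, which belongs only to two-protein cliques, is not.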

Figure 2: Evaluation of the rescued proteins using relevant gene expression data and publications. (A–C) For the proteins rescued by the clique-enrichment approach (CEA) in mouse brain, placenta, and lung, relevant microarray data sets (M), EST library studies (E), and publications in PubMed (P) were investigated for supporting evidence. Red, orange, yellow, and white correspond to support from three, two, one, or zero information resources, respectively. (D) Percentage of all annotated mouse proteins, confident proteins, rescued non-confident proteins, and un-rescued non-confident proteins that are supported by at least two information resources in brain, placenta, and lung, respectively.
© Copyright Policy - open-access

Mentions: To further assess the reliability of the rescued proteins, we evaluated them in different organs using relevant microarray and EST library data, as well as publications indexed in PubMed. Figures 2A–C show the percentage of rescued proteins with different levels of support from the three information resources in the brain, placenta, and lung, respectively. On average, 66% of the rescued proteins were supported by microarray data, 78% by the EST libraries, and 77% by publications on the corresponding organs. Combining the information sources, 49% of the rescued proteins were supported by all three resources, 77% by at least two, and 94% by at least one.
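The cumulative figures in this passage ("at least one", "at least two", "all three") can be turned into the exclusive categories that the figure's colour key uses (exactly three, two, one, or zero resources). The sketch below does that arithmetic on the reported averages and shows how such percentages could be tallied from per-protein evidence. The helper names and the five-protein `toy_support` table are hypothetical and are not data from the paper.

```python
def exact_from_cumulative(at_least):
    """Convert 'supported by at least k resources' percentages
    into 'supported by exactly k resources' percentages (k = 0..3)."""
    return {
        3: at_least[3],
        2: at_least[2] - at_least[3],
        1: at_least[1] - at_least[2],
        0: 100 - at_least[1],
    }

# Averages reported in the passage above (percent of rescued proteins).
reported = {1: 94, 2: 77, 3: 49}
breakdown = exact_from_cumulative(reported)
print(breakdown)  # {3: 49, 2: 28, 1: 17, 0: 6}

def at_least_percentages(support):
    """Tally cumulative support from a map protein -> set of resources,
    where M = microarray, E = EST library, P = PubMed."""
    n = len(support)
    counts = [len(resources) for resources in support.values()]
    return {k: 100 * sum(c >= k for c in counts) / n for k in (1, 2, 3)}

# Hypothetical evidence table for five rescued proteins.
toy_support = {
    "prot1": {"M", "E", "P"},
    "prot2": {"E", "P"},
    "prot3": {"M"},
    "prot4": set(),
    "prot5": {"M", "E", "P"},
}
toy_cumulative = at_least_percentages(toy_support)
print(toy_cumulative)  # {1: 80.0, 2: 60.0, 3: 40.0}
```

Read this way, the reported averages imply that roughly 28% of rescued proteins had support from exactly two resources, 17% from exactly one, and 6% from none.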

