P2P proteomics -- data sharing for enhanced protein identification.

Schorlemmer M, Abián J, Sierra C, de la Cruz D, Bernacchioni L, Jaén E, Perreau de Pinninck A, Atencia M - Autom Exp (2012)

Bottom Line: A detailed analysis of the results indicated the presence of a protein that was supported by other NCBI matches and by high-scoring matches in several proteomics labs. This is evident from the information that could be derived from the proposed P2P proteomics system; however, it is not straightforward to reach the same conclusion by conventional means, as it is difficult to rule out organic contamination of samples. The actual presence of this contaminant was stated only after the ABRF study of all the identifications reported by the laboratories.


Affiliation: Artificial Intelligence Research Institute, IIIA-CSIC, Spain. marco@iiia.csic.es.

ABSTRACT

Background: In order to tackle the important and challenging problem in proteomics of identifying known and new protein sequences using high-throughput methods, we propose a data-sharing platform that uses fully distributed P2P technologies to share specifications of peer-interaction protocols and service components. By using such a platform, information to be searched is no longer centralised in a few repositories but gathered from experiments in peer proteomics laboratories, which can subsequently be searched by fellow researchers.

Methods: The system runs, in a distributed fashion, a data-sharing protocol specified in the Lightweight Communication Calculus that underlies the system, and researchers interact through it via message passing. To do so, researchers use particular components that link to database querying systems based on BLAST and/or OMSSA and to GUI-based visualisation environments. We have tested the proposed platform with data drawn from preexisting MS/MS data reservoirs from the 2006 ABRF (Association of Biomolecular Resource Facilities) test sample, which was extensively tested during the ABRF Proteomics Standards Research Group 2006 worldwide survey. In particular, we have taken the data available from a subset of proteomics laboratories of Spain's National Institute for Proteomics, ProteoRed, a network for the coordination, integration and development of the Spanish proteomics facilities.

Results and discussion: We performed queries against nine databases: seven ProteoRed proteomics laboratories, the NCBI Swiss-Prot database and the local database of the CSIC/UAB Proteomics Laboratory. A detailed analysis of the results indicated the presence of a protein that was supported by other NCBI matches and by high-scoring matches in several proteomics labs. The analysis clearly indicated that the protein was a contaminant, at relatively high concentration, that could be present in the ABRF sample. This is evident from the information that could be derived from the proposed P2P proteomics system; however, it is not straightforward to reach the same conclusion by conventional means, as it is difficult to rule out organic contamination of samples. The actual presence of this contaminant was stated only after the ABRF study of all the identifications reported by the laboratories.




Figure 15: Query window and BLAST search parameters used for this study. The sequences shown in the image correspond to the first group of queries.

Mentions: Each group was queried with the same parameters (Figure 15), and the results were analysed in the researcher's OKC prospector window (Figure 16). As expected, the search in the researcher's own database (column labelled uab) always produced full matches. In contrast, the other proteomics labs and the NCBI Swiss-Prot database (labelled ncbi) produced more diverse results. Most of the queries produced high percentage-identity values in the ncbi search. These hits give direct information about the identity of the peptide and the source protein ('id' and 'des' text windows in Figure 17). One of the queries in Figure 17 (Query 10) produced a 100% match in the NCBI Swiss-Prot database. The expectation values for this match indicated that it was not due to chance. The protein that had been tentatively identified, P20160 (azurocidin precursor), was, however, not included in the list of components of the standard ABRF sample.
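
The reasoning in this paragraph (collect hits from several peers, keep the confident ones, and flag any protein that is not in the expected sample list) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OKC prospector's actual logic: the `Hit` record, the function name, the thresholds, the peer labels `lab_a` and `lab_b`, the placeholder accessions `EXPECTED_1` and `OTHER_1`, and all numeric values are hypothetical. Only the labels `uab` and `ncbi` and the accession P20160 come from the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One search result returned by a peer database (hypothetical record layout)."""
    query: str       # query label, e.g. "Query 10"
    peer: str        # database that answered, e.g. "ncbi"
    accession: str   # protein accession reported for the match
    identity: float  # percentage identity of the alignment
    evalue: float    # expectation value of the match


def unexpected_proteins(hits, expected, own_peer="uab",
                        min_identity=90.0, max_evalue=1e-3, min_peers=2):
    """Return proteins absent from `expected` but confidently matched by several peers.

    Hits from the researcher's own database are skipped because they always
    match fully and therefore add no independent evidence. The thresholds are
    illustrative assumptions, not values taken from the study.
    """
    support = defaultdict(set)
    for hit in hits:
        if hit.peer == own_peer:
            continue
        if hit.identity >= min_identity and hit.evalue <= max_evalue:
            support[hit.accession].add(hit.peer)
    return {
        accession: sorted(peers)
        for accession, peers in support.items()
        if accession not in expected and len(peers) >= min_peers
    }


# Made-up example data; the numbers do not come from the study.
hits = [
    Hit("Query 10", "uab", "P20160", 100.0, 1e-9),
    Hit("Query 10", "ncbi", "P20160", 100.0, 1e-9),
    Hit("Query 10", "lab_a", "P20160", 100.0, 1e-7),
    Hit("Query 3", "ncbi", "EXPECTED_1", 98.0, 1e-12),
    Hit("Query 3", "lab_b", "EXPECTED_1", 97.0, 1e-10),
    Hit("Query 7", "lab_a", "OTHER_1", 60.0, 0.5),
]
expected = {"EXPECTED_1"}

result = unexpected_proteins(hits, expected)
print(result)
```

With this made-up input, only P20160 is flagged: it passes both thresholds in two independent peers and is missing from the expected list, whereas `OTHER_1` fails the thresholds and `EXPECTED_1` is a known sample component. Requiring support from more than one peer is one simple way to separate a plausible contaminant from a spurious single match.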

