GRIP: A web-based system for constructing Gold Standard datasets for protein-protein interaction prediction.

Browne F, Wang H, Zheng H, Azuaje F - Source Code Biol Med (2009)

Bottom Line: Extraction of these data can be both complex and time consuming. A search facility is provided where a user can search for protein complex information in Saccharomyces cerevisiae. Manual construction of reference datasets can be a time-consuming process requiring programming knowledge.


Affiliation: School of Computing and Mathematics, University of Ulster at Jordanstown, Northern Ireland, UK. hy.wang@ulster.ac.uk.

ABSTRACT

Background: Information about protein interaction networks is fundamental to understanding protein function and cellular processes. Interaction patterns among proteins can suggest new drug targets and aid in the design of new therapeutic interventions. Efforts have been made to map interactions on a proteome-wide scale using both experimental and computational techniques. Reference datasets that contain known interacting proteins (positive cases) and non-interacting proteins (negative cases) are essential to support computational prediction and validation of protein-protein interactions. Information on known interacting and non-interacting proteins is usually stored within databases. Extraction of these data can be both complex and time consuming. Although the automatic construction of reference datasets for classification would be a useful resource for researchers, no public resource currently exists to perform this task.

Results: GRIP (Gold Reference dataset constructor from Information on Protein complexes) is a web-based system that provides researchers with the functionality to create reference datasets for protein-protein interaction prediction in Saccharomyces cerevisiae. Both positive and negative cases for a reference dataset can be extracted, organised and downloaded by the user. GRIP also provides an upload facility whereby users can submit proteins to determine protein complex membership. A search facility is provided where a user can search for protein complex information in Saccharomyces cerevisiae.

Conclusion: GRIP was developed to retrieve information on protein complexes, cellular localisation, and physical and genetic interactions in Saccharomyces cerevisiae. Manual construction of reference datasets can be a time-consuming process requiring programming knowledge. GRIP simplifies and speeds up this process by allowing users to automatically construct reference datasets. GRIP is free to access at http://rosalind.infj.ulst.ac.uk/GRIP/.

Figure 1: Graphical overview of the steps taken by GRIP to produce a reference dataset.

Mentions: GRIP constructs both positive and negative reference datasets from two sources: the MIPS Comprehensive Yeast Genome Database (CYGD) Complex Catalogue and Localisation Catalogue [6], and the BioGRID repository [8]. On the biological assumption that proteins found in the same protein complex are more likely to interact, positive cases are extracted from complex data in the MIPS complex catalogue. Positive cases are also obtained from validated genetic and physical interactions in the BioGRID repository [8]. Defining a 'negative' case is not trivial, and researchers have applied different criteria, such as selecting proteins at random [9] or selecting proteins whose sub-cellular locations differ [2,3]. Proteins from different sub-cellular locations are considered less likely to interact, so this criterion is suggested to yield high-quality non-interactions [2,3]. Other studies use a simpler scheme, selecting proteins at random from a set of proteins [9]. Both criteria are implemented for the construction of negative reference datasets from the MIPS complex and localisation data. With BioGRID as the data source, the negative reference dataset is constructed from a random selection of protein pairs. The MIPS complex catalogue was selected as a source for Gold Standard construction because it contains lists of known protein complexes based on data collected from validated, small-scale studies reported in the biomedical literature. Complex data labelled "Complexes by Systematic Analysis" were excluded from this study as they have not been manually verified.
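The BioGRID branch described above (validated pairs as positive cases, randomly drawn pairs as negative cases) could be sketched roughly as follows. This is an illustration only, not GRIP's implementation: the function name `biogrid_cases`, its parameters and the protein identifiers are hypothetical, and the step that skips pairs already present in the positive set is an assumption, since the text only says that pairs are selected at random.

```python
import random
from itertools import combinations


def biogrid_cases(interactions, n_negative, seed=0):
    """Build positive and negative protein pairs from a list of interactions.

    interactions: iterable of (protein_a, protein_b) tuples standing in for
                  validated physical or genetic interactions.
    n_negative:   number of negative pairs requested by the user.
    """
    # Positive cases: validated pairs, stored without regard to order.
    # Self-interactions are dropped here purely to keep the sketch simple.
    positive = {frozenset(pair) for pair in interactions if pair[0] != pair[1]}
    proteins = sorted({p for pair in positive for p in pair})

    # Candidate negatives: every pair of known proteins that is NOT a
    # validated interaction (assumption, see the lead-in). Enumerating all
    # pairs is quadratic, which is fine for a toy example only.
    candidates = [
        frozenset(pair)
        for pair in combinations(proteins, 2)
        if frozenset(pair) not in positive
    ]
    if n_negative > len(candidates):
        raise ValueError("not enough non-interacting pairs available")

    rng = random.Random(seed)
    negative = rng.sample(candidates, n_negative)

    def as_sorted(pairs):
        return sorted(tuple(sorted(p)) for p in pairs)

    return as_sorted(positive), as_sorted(negative)


if __name__ == "__main__":
    # Placeholder identifiers, not real interaction data.
    toy = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
    pos, neg = biogrid_cases(toy, n_negative=2)
    print(pos)
    print(neg)
```

Fixing the seed makes a generated dataset reproducible; a real tool might instead draw a fresh sample on every request.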
Locations labelled "other sub-cellular localisation", "ambiguous" and "integral membrane/endomembranes" were excluded, either because the data were generated by high-throughput analysis or because the labels are not specific enough for inclusion in GRIP. In addition, the locations "extracellular", "cell wall", "cell periphery" and "plasma membrane" are grouped together into one location. The BioGRID repository was selected as a source because its data are obtained from validated genetic and physical interactions extracted from the literature [8]. A user can select either MIPS or BioGRID as the data source for constructing a reference dataset. The reference dataset offered to the user consists of a number of cases. With MIPS as the data source, each 'positive' case contains the protein complex name and a list of proteins belonging to that complex (Figure 1). GRIP defines a positive case by considering only proteins in the same subclass (at the lowest level). With BioGRID as the data source, a 'positive' case contains a pair of proteins that have been validated as having a physical or genetic interaction. Different criteria have been suggested for defining a 'negative' case [2,3,9]. GRIP offers two generation criteria for constructing 'negative' cases from the MIPS data source. First, a negative case can be defined as a list of proteins obtained from different sub-cellular locations and complexes. Second, a negative case can be defined as a list of proteins randomly selected from sub-cellular locations and complexes. GRIP gives users the flexibility to determine the number of proteins in a given case. If a user stipulates that a case should consist of, for instance, two proteins, GRIP will retrieve two proteins that are found within the same complex (complexes considered have a minimum size of 5 proteins). For the BioGRID data source, a 'negative' case is defined by the random selection of protein pairs.
The user is free to determine the total number of cases in the reference dataset.
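The MIPS-based rules above (location filtering and grouping, positive cases drawn from one complex, and the two negative-case criteria) can be illustrated with a minimal sketch. It is not GRIP's code: all function and variable names are hypothetical, the merged location label is invented for the example, each protein is assumed to belong to exactly one complex and one location, and the greedy selection used for the "different" criterion is a simplification that may fail even when a valid case exists.

```python
import random

# Location labels the text says were excluded.
EXCLUDED_LOCATIONS = {
    "other sub-cellular localisation",
    "ambiguous",
    "integral membrane/endomembranes",
}
# Locations the text says are grouped into one.
MERGED_LOCATIONS = {"extracellular", "cell wall", "cell periphery", "plasma membrane"}
MERGED_LABEL = "periphery group"  # hypothetical name for the grouped location
MIN_COMPLEX_SIZE = 5  # minimum complex size stated in the text


def normalise_location(label):
    """Return the usable location label, or None if it is missing or excluded."""
    if label is None or label in EXCLUDED_LOCATIONS:
        return None
    return MERGED_LABEL if label in MERGED_LOCATIONS else label


def positive_case(complexes, size, rng):
    """Pick `size` proteins from a single complex with enough members."""
    eligible = sorted(
        name for name, members in complexes.items()
        if len(members) >= max(MIN_COMPLEX_SIZE, size)
    )
    name = rng.choice(eligible)
    return name, rng.sample(sorted(complexes[name]), size)


def negative_case(complexes, locations, size, rng, criterion="different"):
    """Pick `size` proteins using the 'different' or the 'random' criterion."""
    complex_of = {p: name for name, members in complexes.items() for p in members}
    location_of = {p: normalise_location(locations.get(p)) for p in complex_of}
    pool = sorted(p for p in complex_of if location_of[p] is not None)
    if criterion == "random":
        return rng.sample(pool, size)
    # 'different': no two chosen proteins share a complex or a location.
    rng.shuffle(pool)
    chosen, seen_complexes, seen_locations = [], set(), set()
    for protein in pool:
        if complex_of[protein] in seen_complexes or location_of[protein] in seen_locations:
            continue
        chosen.append(protein)
        seen_complexes.add(complex_of[protein])
        seen_locations.add(location_of[protein])
        if len(chosen) == size:
            return chosen
    raise ValueError("could not find enough proteins from distinct complexes and locations")


if __name__ == "__main__":
    # Toy data with placeholder names, not taken from MIPS.
    complexes = {"cA": {"a1", "a2", "a3", "a4", "a5"}, "cB": {"b1", "b2", "b3", "b4", "b5"}}
    locations = {p: "nucleus" for p in complexes["cA"]}
    locations.update({p: "cell wall" for p in complexes["cB"]})
    rng = random.Random(0)
    print(positive_case(complexes, 2, rng))
    print(negative_case(complexes, locations, 2, rng))
```

A full dataset would then be assembled by calling these functions once per requested case, up to the total number of cases chosen by the user.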

