PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine.

Donaldson I, Martin J, de Bruijn B, Wolting C, Lay V, Tuekam B, Zhang S, Baskin B, Bader GD, Michalickova K, Pawson T, Hogue CW - BMC Bioinformatics (2003)

Bottom Line: We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70%, thus saving 176 days.


Affiliation: Samuel Lunenfeld Research Institute, Toronto, M5G 1X5, Canada. ian.donaldson@utoronto.ca

ABSTRACT

Background: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles, where they are inaccessible to computational methods. The Biomolecular Interaction Network Database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.

Results: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information at 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70%, thus saving 176 days.
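For readers unfamiliar with the three measures reported above, the following is a minimal sketch of how precision, recall and accuracy are conventionally computed from the counts of a binary classifier pooled over cross-validation folds. It is not the authors' evaluation code. The names `ConfusionCounts` and `pool` and all the counts are hypothetical, chosen only for illustration; they are not data from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ConfusionCounts:
    """Counts for one test fold (hypothetical container, not from the paper)."""
    tp: int  # interaction abstracts correctly flagged
    fp: int  # non-interaction abstracts wrongly flagged
    tn: int  # non-interaction abstracts correctly rejected
    fn: int  # interaction abstracts that were missed


def pool(folds: Iterable[ConfusionCounts]) -> ConfusionCounts:
    """Sum the counts over all cross-validation folds."""
    total = ConfusionCounts(0, 0, 0, 0)
    for f in folds:
        total.tp += f.tp
        total.fp += f.fp
        total.tn += f.tn
        total.fn += f.fn
    return total


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged abstracts that really describe an interaction."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Fraction of true interaction abstracts that were flagged."""
    return c.tp / (c.tp + c.fn)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all abstracts that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


if __name__ == "__main__":
    # Made-up counts for two folds, for illustration only.
    folds = [ConfusionCounts(tp=4, fp=1, tn=3, fn=2),
             ConfusionCounts(tp=4, fp=1, tn=4, fn=1)]
    pooled = pool(folds)
    print(f"precision={precision(pooled):.2f}",
          f"recall={recall(pooled):.2f}",
          f"accuracy={accuracy(pooled):.2f}")
```

Pooling counts before computing the ratios is one common convention; averaging per-fold scores is another, and the abstract does not say which was used.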

Conclusions: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.


Figure 3: The PreBIND interface. Users can freely view and submit feedback about all potential interactions present in PreBIND. Potential interactions that are judged by users to be legitimate may be submitted to the BIND database for review by curators. Information gathered in this way will be used to further train the support vector machine used in the initial search and help develop natural language analysis algorithms.

Mentions: Once the user has finished reviewing the abstract and has confirmed the potential interactions mentioned in it, they can submit them to the BIND database (Fig. 2, item 13 and Fig. 3). A BIND record may be created using the PreBIND CGI and submitted to curators at BIND via the Web. The SeqHound database is consulted to ensure that molecule type, taxon and GI identifier are up-to-date (Fig. 2, item 12 and [25]). A subsequent second review by BIND curators (Fig. 2, item 14) is required before the record is released to the public BIND database (Fig. 2, item 15) at .

