A text-mining system for extracting metabolic reactions from full-text articles.

Czarnecki J, Nobeli I, Smith AM, Shepherd AJ - BMC Bioinformatics (2012)

Bottom Line: Increasingly biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task. We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective.


Affiliation: Department of Biological Sciences and Institute of Molecular and Structural Biology, Birkbeck, University of London, London, UK.

ABSTRACT

Background: Increasingly biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway - metabolic pathways - has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein-protein interactions.

Results: When evaluated on a set of manually-curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task.

Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
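The scoring idea described in the Background can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword stems, the direction words, the weights and the suffix-stripping routine (`crude_stem`) are all hypothetical stand-ins chosen for illustration, and entity positions are assumed to have been assigned already by an upstream recogniser.

```python
from itertools import permutations

# Hypothetical keyword stems and direction words; the real lists are not given in this abstract.
KEYWORD_STEMS = {"convert", "catalys", "catalyz", "produc", "phosphorylat"}
DIRECTION_WORDS = {"to", "into"}


def crude_stem(word):
    """Rough suffix stripping, standing in for a proper stemmer."""
    word = word.lower().strip(".,;()")
    for suffix in ("ation", "ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def score(tokens, enzyme, substrate, product):
    """Score one role assignment; the weights (2 and 1) are arbitrary assumptions."""
    lo = min(enzyme, substrate, product)
    hi = max(enzyme, substrate, product)
    total = 0
    # Presence: a stemmed keyword occurs within the span covered by the entities.
    if any(crude_stem(t) in KEYWORD_STEMS for t in tokens[lo : hi + 1]):
        total += 2
    # Location: a direction word lies between the substrate and a later product.
    between = tokens[substrate + 1 : product] if substrate < product else []
    if any(t.lower() in DIRECTION_WORDS for t in between):
        total += 1
    return total


def best_reaction(tokens, enzymes, metabolites):
    """Try every enzyme with every ordered (substrate, product) pair; keep the top score."""
    candidates = [
        (score(tokens, e, s, p), tokens[e], tokens[s], tokens[p])
        for e in enzymes
        for s, p in permutations(metabolites, 2)
    ]
    return max(candidates, key=lambda c: c[0]) if candidates else None


tokens = "Hexokinase converts glucose to glucose-6-phosphate".split()
result = best_reaction(tokens, enzymes=[0], metabolites=[2, 4])
print(result)
```

In this toy sentence the assignment with glucose as substrate outscores the reversed permutation, because only that ordering places the direction word between substrate and product.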

Figure 2: Graphs showing the performance of OSCAR3 at a range of confidence thresholds. Performance is shown under the following conditions: a) when applied to the SCAI chemical corpus; b) when applied to the GENIA corpus without acronym detection; and c) when applied to the GENIA corpus with acronym detection. The y-axis gives the recall(C), precision and F-score values in the range 0 to 1.

Mentions: The results for OSCAR3 are more interesting and are presented in Figure 2. Two features stand out from these results: the best performance of OSCAR3 on both corpora is worse than we had expected from results presented elsewhere [19], with peak F-scores of 62% and 48% on the Fraunhofer SCAI corpus (Figure 2a) and the GENIA corpus (Figure 2b) respectively; and the performance on the GENIA corpus is significantly worse than that on Fraunhofer SCAI.
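To make the threshold curves in Figure 2 concrete, the sketch below shows how precision, recall and F-score can be computed at a range of confidence thresholds. It assumes the balanced F-score (the harmonic mean of precision and recall) and exact matching of mention strings; the mentions, confidence values and gold set are invented toy data, and nothing here calls OSCAR3 itself.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F-score) for two sets of mentions."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def sweep_thresholds(scored_mentions, gold, thresholds):
    """Keep only mentions at or above each threshold and score what remains."""
    rows = []
    for threshold in thresholds:
        predicted = {m for m, conf in scored_mentions.items() if conf >= threshold}
        rows.append((threshold, *precision_recall_f(predicted, gold)))
    return rows


# Toy data (assumed): recogniser output with confidences, and a gold annotation set.
scored_mentions = {"glucose": 0.9, "ATP": 0.8, "pyruvate": 0.5, "the": 0.3}
gold = {"glucose", "ATP", "pyruvate", "NADH"}

rows = sweep_thresholds(scored_mentions, gold, [0.2, 0.6])
for threshold, precision, recall, f_score in rows:
    print(f"{threshold:.1f}  P={precision:.2f}  R={recall:.2f}  F={f_score:.2f}")
```

Raising the threshold in this toy example trades recall for precision, which is the behaviour such curves are meant to expose; the peak F-score is simply the maximum of the F column over all thresholds tried.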

