A text-mining system for extracting metabolic reactions from full-text articles.

Czarnecki J, Nobeli I, Smith AM, Shepherd AJ - BMC Bioinformatics (2012)

Bottom Line: Increasingly, biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task. We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective.


Affiliation: Department of Biological Sciences and Institute of Molecular and Structural Biology, Birkbeck, University of London, London, UK.

ABSTRACT

Background: Increasingly, biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway - metabolic pathways - has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein-protein interactions.
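The permutation-scoring idea described in the Background can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the keyword set, the toy suffix-stripping stemmer `crude_stem`, the scoring weights in `score`, and the single-token entity positions are hypothetical and are not taken from the paper.

```python
from itertools import permutations

# Hypothetical stemmed keywords; the paper's actual keyword list is not reproduced here.
KEYWORDS = {"convert", "catalys", "produc"}


def crude_stem(token):
    """Toy stemmer (assumption): lowercase, trim punctuation, strip one common suffix."""
    token = token.lower().strip(".,;()")
    for suffix in ("es", "ed", "ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def score(tokens, enzyme, substrate, product):
    """Illustrative scoring rule (assumption): reward keyword presence and location.

    enzyme, substrate and product are token indices of the tagged entities.
    """
    keyword_positions = [i for i, t in enumerate(tokens) if crude_stem(t) in KEYWORDS]
    total = 0
    for k in keyword_positions:
        if enzyme < k:       # enzyme mentioned before the reaction keyword
            total += 1
        if k < product:      # keyword mentioned before the candidate product
            total += 1
    if substrate < product:  # candidate substrate mentioned before the product
        total += 1
    return total


def best_assignment(tokens, enzymes, metabolites):
    """Score every (enzyme, substrate, product) permutation and return the best one."""
    candidates = [
        (score(tokens, e, s, p), e, s, p)
        for e in enzymes
        for s, p in permutations(metabolites, 2)
    ]
    return max(candidates) if candidates else None


tokens = "PanK converts pantothenate to phosphopantothenate".split()
best = best_assignment(tokens, enzymes=[0], metabolites=[2, 4])
```

In this toy sentence the two metabolites can be ordered in two ways; the ordering in which the first-mentioned metabolite is the substrate scores higher under these assumed rules, so it is selected.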

Results: When evaluated on a set of manually-curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task.
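For readers unfamiliar with the standard criteria mentioned above, precision and recall over a set of extracted reactions can be computed as in the sketch below. The representation of a reaction as a `(substrate, enzyme, product)` tuple and the example sets are assumptions for illustration; they are not the paper's data or results.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted items against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical placeholder reactions, written as (substrate, enzyme, product).
gold = {("A", "E1", "B"), ("B", "E2", "C"), ("C", "E4", "D"), ("D", "E5", "E")}
predicted = {("A", "E1", "B"), ("B", "E2", "C"), ("X", "E3", "Y")}

precision, recall = precision_recall(predicted, gold)
```

Here two of the three predictions are in the gold set, and two of the four gold reactions are recovered.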

Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.


Figure 3: A network showing the reactions predicted from the eight source papers for the pantothenate and coenzyme A biosynthesis pathway. Squares are small molecules, circles are enzymes, and a pair of arrows is used to denote a single reaction (the first for the interaction substrate-enzyme, and the second for the interaction enzyme-product). Items labeled green are correct; items labeled red are incorrect. The number next to a reaction indicates the number of times that reaction was extracted from the set of source texts. The reactions on the right-hand side of the figure (lying outside the blue rectangle) are reactions extracted by our algorithm that are not part of the manually-annotated pantothenate and coenzyme A biosynthesis pathway from EcoCyc given in Figure 1.

Mentions: Our metabolic reaction extraction results (with and without the correct assignment of enzymes taken into account) for all three evaluation pathways are shown in Table 3. The same results broken down into binary interactions (substrate–product, substrate–enzyme and product–enzyme), along with the results for the Reactome dataset, are shown in Table 4. Note that the number of binary pairs is larger than the number of reactions, because some reactions comprise multiple substrates and/or products. A visual summary of the complete set of results for the smallest of the three pathways (the pantothenate and coenzyme A biosynthesis pathway) is given in Figure 3. Equivalent figures for the tetrahydrofolate biosynthesis and the aerobic fatty acid β-oxidation I pathways are given in Additional file 2, together with a set of example sentences annotated with the putative entities and relationships extracted by our system.
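The remark that binary pairs outnumber reactions can be made concrete with a small sketch. The function `binary_pairs` and the exact way pairs are enumerated are assumptions for illustration; the paper's own counting procedure may differ. The example uses the pantothenate kinase reaction (pantothenate + ATP to 4'-phosphopantothenate + ADP).

```python
from itertools import product


def binary_pairs(substrates, enzyme, products):
    """Decompose one reaction into labeled binary interactions (assumed scheme)."""
    pairs = [("substrate-product", s, p) for s, p in product(substrates, products)]
    pairs += [("substrate-enzyme", s, enzyme) for s in substrates]
    pairs += [("product-enzyme", p, enzyme) for p in products]
    return pairs


pairs = binary_pairs(
    substrates=["pantothenate", "ATP"],
    enzyme="pantothenate kinase",
    products=["4'-phosphopantothenate", "ADP"],
)
```

Under this scheme a single reaction with two substrates and two products yields four substrate-product pairs, two substrate-enzyme pairs and two product-enzyme pairs, which is why the binary counts exceed the reaction counts.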

