KneeTex: an ontology-driven system for information extraction from MRI reports.

Spasić I, Zhao B, Jones CB, Button K - J Biomed Semantics (2015)

Bottom Line: Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. Information extraction results were evaluated on a test set of 100 MRI reports. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.


Affiliation: School of Computer Science & Informatics, Cardiff University, Cardiff, CF24 3AA UK.

ABSTRACT

Background: In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain.

Methods: As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process.

Results: We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. A gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00 %, recall of 97.63 % and F-measure of 97.81 %, the values of which are in line with human-like performance.
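
The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not part of KneeTex.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced (F1) variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Results, in percent.
f1 = f_measure(98.00, 97.63)
print(f"{f1:.2f}")  # prints 97.81
```

Under that assumption the result rounds to the 97.81 % quoted above.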

Conclusions: KneeTex is an open-source, stand-alone application for information extraction from narrative reports that describe an MRI scan of the knee. Given an MRI report as input, the system outputs the corresponding clinical findings in the form of JavaScript Object Notation objects. The extracted information is mapped onto TRAK, an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.
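
To make the output format more concrete, the following is a minimal sketch of how one filled template record with the six slots listed in the Results might be serialised as a JSON object. The field names, the `TemplateRecord` class and the example values are hypothetical illustrations; the actual KneeTex output schema is not given in this abstract.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TemplateRecord:
    """Hypothetical record mirroring the six template slots named in the abstract."""
    finding: str
    finding_qualifier: Optional[str]
    negation: bool
    certainty: Optional[str]
    anatomy: Optional[str]
    anatomy_qualifier: Optional[str]


# Invented example values, for illustration only (not taken from the paper's data).
record = TemplateRecord(
    finding="tear",
    finding_qualifier="horizontal",
    negation=False,
    certainty=None,
    anatomy="medial meniscus",
    anatomy_qualifier="posterior horn",
)

# Serialise the record as a JSON object, one key per slot.
encoded = json.dumps(asdict(record), indent=2)
print(encoded)
```

Records of this shape can be filtered by slot, for example selecting all non-negated findings for a given anatomical entity, which is the kind of structured search the conclusions refer to.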


Fig. 4: Distribution of UMLS concept mentions. MetaMap was used to automatically identify concept mentions in the training set.

Mentions: The vast majority of existing TRAK concepts (more precisely, 875 out of 1,292) were originally cross-referenced to the UMLS, a terminological resource that integrates over 150 biomedical vocabularies [38], in an attempt to standardise TRAK terminology and facilitate its integration with other terminological sources. During the initial development of TRAK, the UMLS was searched collaboratively by a physiotherapist (who was both practitioner and researcher) and an informatician to obtain concept identifiers, synonyms and definitions, where such information was available. Given the availability of MRI reports, we were now able to automate the process of finding other relevant concepts in the UMLS. For this purpose, we used MetaMap, a software tool for recognising UMLS concepts in biomedical text [39]. We applied MetaMap to a training corpus of 1,368 MRI reports to recognise UMLS concepts and obtain their unique concept identifiers and preferred names in the UMLS. Given that the majority of TRAK concepts (approximately 68 %) were already cross-referenced to the UMLS, we used these identifiers to automatically remove known UMLS concepts from further consideration. The remaining MetaMap output formed a list of 1,121 UMLS concepts to be considered for inclusion in TRAK. To facilitate the manual curation process, the list was ordered by the frequency of occurrence of each concept within the training dataset. The frequency graph shown in Fig. 4 depicts a power law distribution [40] of UMLS concept mentions. Using the Pareto principle (or 80:20 rule) as a guideline [41], we focused on approximately 20 % of the most frequently mentioned concepts by considering only those that occurred at least 100 times in the training dataset. A total of 215 frequently mentioned UMLS concepts were manually curated and considered for inclusion in TRAK. Some examples of the highest-ranked relevant concepts include intact, rupture, laceration, etc.
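
The candidate selection described above (drop concepts already cross-referenced in TRAK, rank the remainder by mention frequency, keep those mentioned at least 100 times) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes MetaMap output has already been reduced to a flat list of concept identifiers, the function name `candidate_concepts` is hypothetical, and the identifiers in the toy data are placeholders rather than real UMLS concepts.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def candidate_concepts(
    mentions: Iterable[str],
    known_cuis: Set[str],
    min_count: int = 100,
) -> List[Tuple[str, int]]:
    """Rank concept identifiers not yet in the ontology by mention frequency.

    mentions   -- one concept identifier per recognised mention (assumed input shape)
    known_cuis -- identifiers already cross-referenced in the ontology
    min_count  -- frequency threshold; 100 is the cut-off stated in the text
    """
    # Count only mentions of concepts that are not already known.
    counts = Counter(cui for cui in mentions if cui not in known_cuis)
    # most_common() yields (identifier, count) pairs in descending order of count.
    return [(cui, n) for cui, n in counts.most_common() if n >= min_count]


# Toy data with placeholder identifiers.
mentions = ["C0000001"] * 150 + ["C0000002"] * 120 + ["C0000003"] * 5
known = {"C0000002"}  # already cross-referenced, so excluded
ranked = candidate_concepts(mentions, known)
print(ranked)  # prints [('C0000001', 150)]
```

In the paper, the threshold of 100 mentions left 215 of the 1,121 remaining concepts, about 19 %, which matches the 80:20 guideline cited.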

