Representing annotation compositionality and provenance for the Semantic Web.

Livingston KM, Bada M, Hunter LE, Verspoor K - J Biomed Semantics (2013)

Bottom Line: Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations. With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.


Affiliation: Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

ABSTRACT

Background: Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts to single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations.

Results: We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease associations in genome sequences.

Conclusions: With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.



Figure 1: Example syntactic annotations. This figure depicts five syntactic annotations as bold ovals with underlined labels: four RdfResourceAnnotation instances, each with the prefix “ra”, and one RdfGraphAnnotation instance, prefixed with “ga”. Rectangles represent classes, while instances have rounded corners. Double-lined arrows depict basedOn assertions. Thin gray arrows are used to provide reference to the text, although their representation is elided in this paper. The statements inside brackets are contained within the corresponding RDF graph.
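As a rough illustration of the structure the caption describes, the sketch below models one graph annotation that contains a small RDF graph and records, through basedOn links, which resource annotations it was derived from. It is a plain-Python sketch under stated assumptions, not the authors' OWL implementation: the identifier `ga1`, the choice of `ra1` and `ra3` as its sources, the statement inside the graph, the `hasPartOfSpeech` predicate, and the `trace_based_on` helper are all hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# A statement is a (subject, predicate, object) triple of plain strings.
Statement = Tuple[str, str, str]


@dataclass
class Annotation:
    name: str
    kind: str  # "RdfResourceAnnotation" or "RdfGraphAnnotation"
    based_on: List[str] = field(default_factory=list)  # basedOn links
    graph: List[Statement] = field(default_factory=list)  # contained statements


def trace_based_on(name: str, annotations: Dict[str, Annotation]) -> Set[str]:
    """Collect every annotation reachable from `name` via basedOn links."""
    seen: Set[str] = set()
    stack = list(annotations[name].based_on)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(annotations[current].based_on)
    return seen


# Hypothetical data: only the "ra"/"ga" naming and basedOn come from the caption.
annotations = {
    "ra1": Annotation("ra1", "RdfResourceAnnotation"),
    "ra3": Annotation("ra3", "RdfResourceAnnotation"),
    "ga1": Annotation(
        "ga1",
        "RdfGraphAnnotation",
        based_on=["ra1", "ra3"],
        graph=[("token1", "hasPartOfSpeech", "NNS")],
    ),
}

print(sorted(trace_based_on("ga1", annotations)))
```

This sketch records provenance only at the level of whole annotations. The element-level provenance mentioned in the abstract would need a link for each individual triple element, which is not shown here.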

Mentions: Common tasks at the beginning of text mining pipelines include tokenization and part-of-speech tagging [14]. Figure 1 depicts four resource annotations: ra1, ra2, ra3, and ra4. The concepts in the object positions of the denotes assertions are part of the domain model used by the annotator and are not part of the proposed annotation model itself. ra1 and ra2 denote specific instances of tokens (represented here as instances of the class Token), while ra3 and ra4 denote plural nouns and singular present-tense verbs, respectively (represented here by their Penn Treebank part-of-speech tags [15]). In this example, the annotator made the domain-specific representational choice to model the tokens as instances so that they can be specifically referred to later by subsequent annotations, as will be shown in the next section. Abstract relations connecting the resource annotations to text spans are shown in Figure 1 as gray arrows, with gray brackets representing the text spans. Existing models for linking annotations to the object being annotated can be used with our model; for example, the relations oa:hasTarget [3] or ao:context [4] could be used to model these gray arrows. Because our model is neutral with respect to these representational decisions, this aspect of modeling the example annotations is elided from this document for simplicity and clarity.
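The four resource annotations described above can be sketched as plain subject-predicate-object triples. This is a minimal illustration in Python, not the authors' OWL implementation. The `ex:` and `ptb:` prefixes, the token identifiers `ex:token1` and `ex:token2`, and the helpers `denoted_by` and `is_instance_of` are hypothetical names. The tags `NNS` and `VBZ` are assumed to be the Penn Treebank tags intended for plural nouns and singular present-tense verbs.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Only the annotation names ra1..ra4, the class names, and the "denotes"
# relation come from the text; every other identifier is hypothetical.
TRIPLES = [
    Triple("ex:ra1", "rdf:type", "ex:RdfResourceAnnotation"),
    Triple("ex:ra2", "rdf:type", "ex:RdfResourceAnnotation"),
    Triple("ex:ra3", "rdf:type", "ex:RdfResourceAnnotation"),
    Triple("ex:ra4", "rdf:type", "ex:RdfResourceAnnotation"),
    # ra1 and ra2 denote token instances from the annotator's domain model.
    Triple("ex:ra1", "ex:denotes", "ex:token1"),
    Triple("ex:ra2", "ex:denotes", "ex:token2"),
    Triple("ex:token1", "rdf:type", "ex:Token"),
    Triple("ex:token2", "rdf:type", "ex:Token"),
    # ra3 and ra4 denote part-of-speech concepts (assumed tags).
    Triple("ex:ra3", "ex:denotes", "ptb:NNS"),
    Triple("ex:ra4", "ex:denotes", "ptb:VBZ"),
]


def denoted_by(annotation: str, triples: List[Triple]) -> List[str]:
    """Return every resource that the given annotation denotes."""
    return [
        t.obj
        for t in triples
        if t.subject == annotation and t.predicate == "ex:denotes"
    ]


def is_instance_of(resource: str, cls: str, triples: List[Triple]) -> bool:
    """Check whether an explicit rdf:type assertion links resource to cls."""
    return Triple(resource, "rdf:type", cls) in triples


for name in ("ex:ra1", "ex:ra2", "ex:ra3", "ex:ra4"):
    print(name, "denotes", denoted_by(name, TRIPLES))
```

Because the tokens are modeled as instances with their own identifiers, a later annotation can point at `ex:token1` directly, which is the representational choice the paragraph above attributes to the annotator.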

