Representing annotation compositionality and provenance for the Semantic Web.

Livingston KM, Bada M, Hunter LE, Verspoor K - J Biomed Semantics (2013)

Bottom Line: Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations. With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.


Affiliation: Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

ABSTRACT

Background: Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts to single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations.

Results: We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease-associations in genome sequences.

Conclusions: With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.




Figure 2: Example biomedical semantic annotations. This figure depicts five semantic annotations as bold ovals with underlined labels: three RdfResourceAnnotation instances and two RdfGraphAnnotation instances. (See the caption of Figure 1 for explanation of shapes and arrows).

Mentions: Semantic annotations of text fragments, such as those in the CRAFT Corpus [16], are another primary use case for the model presented here. In Figure 2, the example sentence fragment from Figure 1 has been annotated with semantic classes in the manner of CRAFT annotation. (In some cases, we have used ontologies and classes not used in CRAFT in order to simplify the biology and therefore the example.) The biomedical classes and properties used to model the examples in this paper are not part of the proposed annotation model. In Figure 2, the three example resource annotations ra5, ra6, and ra7 denote relevant biological concepts: ra5 denotes interferons, a group of proteins represented here by Interferon (IPR000471) in the InterPro database of protein sequence signatures and families [17]; ra6 denotes the upregulation of biological processes, represented here by positive regulation of biological process (GO:0048518) in the Gene Ontology [18]; and ra7 denotes STAT6 proteins, represented here by STAT6 (PR:000001933) in the Protein Ontology [19]. The following are RDF triples for two of these annotations, specifically asserting that ra6 and ra7 are resource annotations that denote positive regulation of biological processes (represented here as GO:0048518) and STAT6 proteins (represented here as PR:000001933), respectively.
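
The triples themselves are not reproduced in this excerpt. As a rough illustration only, the Python sketch below builds triples of the kind described for ra6 and ra7 and prints them in an N-Triples-like form. The namespace http://example.org/annotation# and the property name denotes are assumptions made for this sketch and are not taken from the paper; only the class name RdfResourceAnnotation and the identifiers GO:0048518 and PR:000001933 come from the text. The denoted classes are written with the usual OBO PURL pattern.

```python
# Hypothetical namespace for the annotation model; the real ontology IRI
# is not given in this excerpt, so this value is an assumption.
ANN = "http://example.org/annotation#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OBO = "http://purl.obolibrary.org/obo/"


def resource_annotation(name, denoted_curie):
    """Return two triples: the annotation's type and what it denotes."""
    prefix, local_id = denoted_curie.split(":")
    subject = ANN + name
    denoted = OBO + prefix + "_" + local_id
    return [
        (subject, RDF_TYPE, ANN + "RdfResourceAnnotation"),
        # "denotes" is a placeholder property name chosen for this sketch.
        (subject, ANN + "denotes", denoted),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples, one per line."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)


triples = (
    resource_annotation("ra6", "GO:0048518")
    + resource_annotation("ra7", "PR:000001933")
)
print(to_ntriples(triples))
```

Each annotation contributes one typing triple and one triple linking it to the ontology class it denotes, which is the pattern the paragraph above describes for resource annotations.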

