Semantics-based composition of EMBOSS services.

Lamprecht AL, Naujokat S, Margaria T, Steffen B - J Biomed Semantics (2011)

Bottom Line: Our experiments demonstrate that these domain models, in combination with our synthesis methodology, greatly simplify working with the large, heterogeneous, and hence manually intractable EMBOSS collection. However, they also show that with the information that can be derived from the (current) ACD files and EDAM ontology alone, some essential connections between services cannot be recognized. Our results show that adequate domain modeling requires incorporating as much domain knowledge as possible, far beyond the mere technical aspects of the different types and services.


Affiliation: Chair for Programming Systems, Technical University Dortmund, Dortmund, D-44227, Germany. anna-lena.lamprecht@cs.tu-dortmund.de.

ABSTRACT

Background: More than in other domains, the heterogeneous services world in bioinformatics demands a methodology to classify and relate resources in a manner accessible to both humans and machines. The Semantic Web, which is meant to address exactly this challenge, is currently one of the most ambitious projects in computer science. Collective efforts within the community have already led to a basis of standards for semantic service descriptions and meta-information. In combination with process synthesis and planning methods, such knowledge about types and services can facilitate the automatic composition of workflows for particular research questions.

Results: In this study we apply the synthesis methodology that is available in the Bio-jETI workflow management framework for the semantics-based composition of EMBOSS services. EMBOSS (European Molecular Biology Open Software Suite) is a collection of 350 tools (March 2010) for various sequence analysis tasks, and thus a rich source of services and types that imply comprehensive domain models for planning and synthesis approaches. We use and compare two different setups of our EMBOSS synthesis domain: 1) a manually defined domain setup where an intuitive, high-level, semantically meaningful nomenclature is applied to describe the input/output behavior of the individual EMBOSS tools and their classifications, and 2) a domain setup where this information has been automatically derived from the EMBOSS Ajax Command Definition (ACD) files and the EMBRACE Data and Methods ontology (EDAM). Our experiments demonstrate that these domain models, in combination with our synthesis methodology, greatly simplify working with the large, heterogeneous, and hence manually intractable EMBOSS collection. However, they also show that with the information that can be derived from the (current) ACD files and EDAM ontology alone, some essential connections between services cannot be recognized.

Conclusions: Our results show that adequate domain modeling requires incorporating as much domain knowledge as possible, far beyond the mere technical aspects of the different types and services. Finding or defining semantically appropriate service and type descriptions is a difficult task, but the bioinformatics community appears to be on the right track towards a Life Science Semantic Web, which will eventually allow automatic service composition methods to unfold their full potential.


Figure 8: Synthesis example 3. In this workflow, which does not (yet) contain any EMBOSS services (left), part of the loop body is loosely specified and has to be concretized by an appropriate sequence of services. The workflow in the center contains one of the service sequences that were proposed by the synthesis algorithm for the manually created domain and the constraints "Enforce the use of module Protein2dStructure" and "Use Display as last service in solution". The constraints "Use showreport as last service in solution" and "Enforce the use of module protein secondary structure prediction", used together with the automatically created domain, lead to the results on the right side of the figure.

Mentions: As a third and final example in this paper, we discuss the process shown in Figure 8 (left), which does not (yet) contain any EMBOSS services. A (nucleotide) sequence is fetched from the DNA Data Bank of Japan (DDBJ) and used for a BLAST search against a protein database. The Uniprot IDs are extracted from the BLAST result and then processed in a loop that fetches the Uniprot entry for each ID. The remainder of the loop body is a loosely specified branch, to be concretized by an appropriate sequence of services. The synthesis plugin has access to both the EMBOSS and the DDBJ domain models and can transparently combine services from both sources. For this example, we used not only the HMMER subset but the complete EMBOSS domains to find an appropriate sequence of services that does something with the protein sequence that is retrieved within the loop.
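
To make the idea of concretizing a loosely specified branch more tangible, the following is a minimal Python sketch. It is not the Bio-jETI synthesis algorithm: it simply enumerates type-compatible service chains breadth-first and keeps those that satisfy two constraints of the kind quoted in the Figure 8 caption (enforce the use of a module, fix the last service). The service catalog, all service names, and all type names are hypothetical placeholders chosen for illustration, not actual EMBOSS or DDBJ descriptions.

```python
from collections import deque
from typing import NamedTuple


class Service(NamedTuple):
    name: str
    module: str       # classification of the service (hypothetical labels)
    input_type: str
    output_type: str


# Hypothetical toy catalog; these are NOT real EMBOSS service descriptions.
CATALOG = [
    Service("extract_sequence", "SequenceEditing", "UniprotEntry", "ProteinSequence"),
    Service("predict_2d_structure", "Protein2dStructure", "ProteinSequence", "Report"),
    Service("compute_statistics", "ProteinStatistics", "ProteinSequence", "Report"),
    Service("display", "Display", "Report", "Nothing"),
]


def synthesize(start_type, catalog, max_length, must_use_module, last_service):
    """Enumerate type-correct service chains, shortest first, and keep
    those that use the required module and end with the required service."""
    solutions = []
    queue = deque([(start_type, [])])
    while queue:
        current_type, chain = queue.popleft()
        ends_correctly = bool(chain) and chain[-1].name == last_service
        uses_module = any(s.module == must_use_module for s in chain)
        if ends_correctly and uses_module:
            solutions.append([s.name for s in chain])
        if len(chain) == max_length:
            continue  # bound the search depth
        for service in catalog:
            # A service can follow only if it accepts the current data type.
            if service.input_type == current_type:
                queue.append((service.output_type, chain + [service]))
    return solutions


solutions = synthesize(
    start_type="UniprotEntry",
    catalog=CATALOG,
    max_length=4,
    must_use_module="Protein2dStructure",
    last_service="display",
)
for chain in solutions:
    print(" -> ".join(chain))
```

In this toy setting the constraints prune the chain that goes through the statistics service, which mirrors how constraints narrow the set of proposed service sequences in the example. A real domain model would draw on a much richer type and service taxonomy than the flat string matching assumed here.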

