Linking the Resource Description Framework to cheminformatics and proteochemometrics.

Willighagen EL, Alvarsson J, Andersson A, Eklund M, Lampa S, Lapins M, Spjuth O, Wikberg JE - J Biomed Semantics (2011)

Bottom Line: Ontologies and semantic markup have already been used for more than a decade in molecular sciences, but have not found widespread use yet. Being able to create and share workflows that integrate data aggregation and analysis (visual and statistical) is beneficial to interoperability and reproducibility. The current work shows that RDF approaches are sufficiently powerful to support molecular chemometrics workflows.


Affiliation: Uppsala University, Department of Pharmaceutical Biosciences, Box 591, SE-751 24 Uppsala, Sweden. egon.willighagen@farmbio.uu.se.

ABSTRACT

Background: Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have already been used for more than a decade in molecular sciences, but have not found widespread use yet. The semantic web technology Resource Description Framework (RDF) and related methods are shown to be sufficiently versatile to change that situation.

Results: The work presented here focuses on linking RDF approaches to existing molecular chemometrics fields, including cheminformatics, QSAR modeling and proteochemometrics. Applications are presented that link RDF technologies to methods from statistics and cheminformatics, including data aggregation, visualization, chemical identification, and property prediction. They demonstrate how this can be done using various existing RDF standards and cheminformatics libraries. For example, we show how IC50 and Ki values are modeled for a number of biological targets using data from the ChEMBL database.

Conclusions: We have shown that existing RDF standards can suitably be integrated into existing molecular chemometrics methods. Platforms that unite these technologies, like Bioclipse, make this even simpler and more transparent. Being able to create and share workflows that integrate data aggregation and analysis (visual and statistical) is beneficial to interoperability and reproducibility. The current work shows that RDF approaches are sufficiently powerful to support molecular chemometrics workflows.

Figure 13: Notation3 serialization of the CDK data model for protonated methanol. Methanol is defined as two atoms and one bond in one molecule. A link out to http://rdf.openmolecules.net/ is made using the InChI. Available from additional file 6.

Mentions: The following example shows protonated methanol as RDF, serialized as Notation3 using the OWL-based CDK data model. It defines a molecule with two atoms, one of which is positively charged. Hydrogens are defined implicitly, as is commonly done in SMILES too. The bond links to the atoms and has a defined bond order. The resources in the RDF representation match the Java objects in the CDK library. Java objects are not identified by URIs, which is why the RDF uses example.com-based URIs in the example in Figure 13. Alternatively, anonymous resources can be used to reduce the number of URIs, though that puts hierarchical restrictions on how the data is serialized. The current source code that generates the RDF allows us to use any arbitrary domain, and we anticipate that URIs for all objects in the CDK will become available when the RDF representation becomes more popular. The Dublin Core namespace is reused for the name of the molecule, and an owl:sameAs predicate is used to link to the aforementioned http://rdf.openmolecules.net/ website. The OWL-based CDK data model ontology resembles the actual CDK data model. Compared to a basic chemical graph model, the CDK model is more complex, which provides the flexibility needed to cover input from various chemical file formats.
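Since the figure itself is not reproduced here, the structure described above can be approximated with a minimal Python sketch that builds the triples by hand and writes them out as simple N3-compatible statements. Several things in it are assumptions rather than the authors' actual output: the `http://example.com/cdk-sketch#` namespace and all term names in it (`Molecule`, `hasAtom`, `hasFormalCharge`, `bindsAtom`, `hasOrder`, and so on) are hypothetical stand-ins for the real CDK ontology, `dc:title` is one plausible Dublin Core property for the name, and both the InChI string and the query-style link-out URL are illustrative and not copied from Figure 13.

```python
"""Sketch: protonated methanol as RDF triples, written as N3-compatible lines."""

# Real, well-known namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
DC = "http://purl.org/dc/elements/1.1/"

# ASSUMPTION: placeholder namespaces; the term names are NOT the real CDK ontology.
CDK = "http://example.com/cdk-sketch#"
EX = "http://example.com/"

# ASSUMPTION: illustrative InChI and link-out pattern, not taken from Figure 13.
INCHI = "InChI=1S/CH4O/c1-2/h2H,1H3/p+1"
LINK_BASE = "http://rdf.openmolecules.net/?"


def iri(namespace, local_name):
    """Wrap a namespace plus local name in angle brackets."""
    return f"<{namespace}{local_name}>"


def literal(value):
    """Format an integer as a bare number and a string as a quoted literal."""
    if isinstance(value, int):
        return str(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def protonated_methanol(inchi=INCHI):
    """Return (subject, predicate, object) triples for one molecule, two atoms, one bond."""
    mol, carbon, oxygen, bond = (
        iri(EX, name) for name in ("molecule1", "atom1", "atom2", "bond1")
    )
    a = iri(RDF, "type")
    return [
        (mol, a, iri(CDK, "Molecule")),
        (mol, iri(DC, "title"), literal("protonated methanol")),
        (mol, iri(OWL, "sameAs"), f"<{LINK_BASE}{inchi}>"),
        (mol, iri(CDK, "hasAtom"), carbon),
        (mol, iri(CDK, "hasAtom"), oxygen),
        (mol, iri(CDK, "hasBond"), bond),
        (carbon, a, iri(CDK, "Atom")),
        (carbon, iri(CDK, "symbol"), literal("C")),
        (oxygen, a, iri(CDK, "Atom")),
        (oxygen, iri(CDK, "symbol"), literal("O")),
        # Hydrogens stay implicit; only the positive charge is stated.
        (oxygen, iri(CDK, "hasFormalCharge"), literal(1)),
        (bond, a, iri(CDK, "Bond")),
        (bond, iri(CDK, "bindsAtom"), carbon),
        (bond, iri(CDK, "bindsAtom"), oxygen),
        (bond, iri(CDK, "hasOrder"), iri(CDK, "SingleBond")),
    ]


def to_n3(triples):
    """Serialize triples one statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_n3(protonated_methanol()))
```

Every resource here carries its own example.com URI, mirroring the choice described above. Replacing the atoms and the bond with anonymous resources would remove those URIs, but the bond could then only refer to its atoms if the serialization nests or labels them consistently, which is the hierarchical restriction the text mentions.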

