Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets.

Clark AM, Dole K, Coulon-Spektor A, McNutt A, Grass G, Freundlich JS, Reynolds RC, Ekins S - J Chem Inf Model (2015)

Bottom Line: We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or the user's own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery.

Affiliation: †Molecular Materials Informatics, Inc., 1900 St. Jacques No. 302, Montreal H3J 2S1, Quebec, Canada.

ABSTRACT
On the order of hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models still remains a major challenge limiting drug discovery. We describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties. We show that these models possess cross-validation receiver operator curve values comparable to those generated previously in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or the user's own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery.

Figure 1: Example of a serialized file containing a very small Bayesian model. The default file extension is .bayesian, and the MIME type is chemical/x-bayesian.

Mentions: Figure 1 shows an example of a serialized file. The default file extension is .bayesian, and the MIME type is chemical/x-bayesian. The text should be encoded as UTF-8 Unicode, for which all of the content is limited to the ASCII subset, except for the freeform text notes. End of line should be encoded Unix-style, and floating point numbers can be encoded with a decimal point (e.g., 1.23, with a period symbol for the separator, invariant of localization) or scientific notation (e.g., 1.23 × 10⁻⁹). The format is case- and whitespace-sensitive. The body of the format consists of individual lines, each of which encodes a discrete property, and is of arbitrary length.
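
The low-level rules above (UTF-8 text, Unix line endings, whitespace sensitivity, locale-invariant numbers) can be illustrated with a minimal Python sketch. It is not the CDK reference implementation, and it does not attempt to interpret what each line means, because the per-line syntax is not given in this excerpt. The function names `split_lines` and `parse_number` are hypothetical. Rejecting carriage returns and writing scientific notation as `1.23e-9` are assumptions made for illustration.

```python
import re

# Period as the decimal separator, optional exponent. The "e" exponent
# spelling is an assumption; the text only says "scientific notation".
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def split_lines(raw: bytes) -> list[str]:
    """Decode a serialized model as UTF-8 and split it on Unix line endings."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid bytes
    if "\r" in text:
        # Assumption: treat non-Unix line endings as an error.
        raise ValueError("expected Unix-style (LF) line endings")
    lines = text.split("\n")
    # A trailing newline produces one empty final element; drop only that one.
    if lines and lines[-1] == "":
        lines.pop()
    # Lines are returned unstripped because the format is whitespace-sensitive.
    return lines


def parse_number(token: str) -> float:
    """Parse a number written with a period separator, regardless of locale."""
    if _NUMBER.fullmatch(token) is None:
        raise ValueError(f"not a valid number: {token!r}")
    return float(token)  # float() does not consult the process locale
```

Lines are deliberately left unstripped and numbers are matched strictly, so a token such as `1,23` or one padded with spaces is rejected rather than silently accepted.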

