Prediction of 1-octanol solubilities using data from the Open Notebook Science Challenge.

Buonaiuto MA, Lang AS - Chem Cent J (2015)

Bottom Line: The model has been deployed for general use as a Shiny application. The 1-octanol solubility model provides reasonably accurate predictions of the 1-octanol solubility of organic solutes directly from structure. The model was developed under Open Notebook Science conditions, which makes it open, reproducible, and as useful as possible.


Affiliation: Department of Computing and Mathematics, Oral Roberts University, 7777 S. Lewis Avenue, Tulsa, OK 74171 USA.

ABSTRACT

Background: 1-Octanol solubility is important in a variety of applications involving pharmacology and environmental chemistry. Current models are linear in nature and often require foreknowledge of either melting point or aqueous solubility. Here we extend the range of applicability of 1-octanol solubility models by creating a random forest model that can predict 1-octanol solubilities directly from structure.

Results: We created a random forest model using CDK descriptors that has an out-of-bag (OOB) R² value of 0.66 and an OOB mean squared error of 0.34. The model has been deployed for general use as a Shiny application.

Conclusion: The 1-octanol solubility model provides reasonably accurate predictions of the 1-octanol solubility of organic solutes directly from structure. The model was developed under Open Notebook Science conditions, which makes it open, reproducible, and as useful as possible.



Fig. 4: The chemical space of the compounds naturally separates into two distinct clusters

Principal component analysis (using the prcomp function with scale = T) and cluster analysis were performed in R on the dataset of 259 compounds with 86 CDK descriptors. The optimal number of clusters was determined to be 2 by silhouette analysis (using the pam function) on a series ranging from 2 to 20 clusters. The silhouettes had an average width of 0.74 for 2 clusters, almost double the next closest value [10]. The clusters are shown in Fig. 4, with the x and y axes corresponding to the first and second principal components, respectively. The first two principal components explain 36 % of the variance. The first cluster (red) is typified by compounds without hydrogen bond acceptors and with ALogP > 1.56 and TopoPSA < 26.48; 128 out of 157 compounds match these criteria. The blue cluster is more chemically diverse than the red cluster, but even so 75 of the 102 compounds have ALogP < 1.56, TopoPSA > 26.48, and at least one hydrogen bond acceptor.
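The silhouette criterion used above to choose the number of clusters can be illustrated with a minimal sketch. This is not the authors' code: they worked in R with prcomp and pam, whereas the example below is a plain Python stand-in. The function name average_silhouette_width, the toy 2-D points, and the hand-assigned cluster labels are all assumptions made for illustration. For each point, the silhouette width is (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The average over all points is the quantity compared across candidate cluster counts.

```python
from math import dist
from statistics import mean


def average_silhouette_width(points, labels):
    """Mean silhouette width over all points (needs at least two clusters)."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster contributes a width of 0.
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance of the point to itself is 0, so it drops out of the sum).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b))
    return mean(widths)


# Hypothetical toy data: two well-separated groups in a 2-D plane,
# standing in for scores on the first two principal components.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

two_clusters = [0, 0, 0, 1, 1, 1]    # matches the natural grouping
three_clusters = [0, 0, 2, 1, 1, 1]  # artificially splits one tight group

width_two = average_silhouette_width(points, two_clusters)
width_three = average_silhouette_width(points, three_clusters)
print(f"average width, 2 clusters: {width_two:.2f}")
print(f"average width, 3 clusters: {width_three:.2f}")
```

On this toy data the two-cluster labeling gives the larger average width, which is the same kind of comparison that led to choosing 2 clusters in the analysis above. In a real workflow the labels for each candidate cluster count would come from a clustering algorithm such as PAM rather than being assigned by hand.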

