Prediction of 1-octanol solubilities using data from the Open Notebook Science Challenge.

Buonaiuto MA, Lang AS - Chem Cent J (2015)

Bottom Line: The model has been deployed for general use as a Shiny application. The 1-octanol solubility model provides reasonably accurate predictions of the 1-octanol solubility of organic solutes directly from structure. The model was developed under Open Notebook Science conditions, which makes it open, reproducible, and as useful as possible.


Affiliation: Department of Computing and Mathematics, Oral Roberts University, 7777 S. Lewis Avenue, Tulsa, OK 74171 USA.

ABSTRACT

Background: 1-Octanol solubility is important in a variety of applications involving pharmacology and environmental chemistry. Current models are linear in nature and often require foreknowledge of either melting point or aqueous solubility. Here we extend the range of applicability of 1-octanol solubility models by creating a random forest model that can predict 1-octanol solubilities directly from structure.

Results: We created a random forest model using CDK descriptors that has an out-of-bag (OOB) R² value of 0.66 and an OOB mean squared error of 0.34. The model has been deployed for general use as a Shiny application.
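As a rough illustration of how the two reported statistics relate, the sketch below computes a mean squared error and the corresponding R² from paired observed and predicted values. It assumes the common definition R² = 1 - MSE / Var(observed); the source does not state which formula was used, and the function name `mse_and_r2` and the toy numbers are hypothetical, not data from the paper. In an out-of-bag setting, each prediction would come only from trees that did not see that sample during training.

```python
from statistics import fmean


def mse_and_r2(observed, predicted):
    """Return (MSE, R^2) for paired values, with R^2 = 1 - MSE / Var(observed).

    Assumption: population variance (divide by n), so that MSE and
    variance are on the same footing.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mean_obs = fmean(observed)
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    return mse, 1.0 - mse / variance


# Toy values for illustration only (not taken from the paper).
toy_observed = [0.0, 1.0, 2.0, 3.0]
toy_predicted = [0.5, 1.0, 2.0, 2.5]
toy_mse, toy_r2 = mse_and_r2(toy_observed, toy_predicted)
```

Under this definition, an R² of 0.66 together with an MSE of 0.34 would correspond to a response variance close to 1.0, though that reading depends on the assumed formula.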

Conclusion: The 1-octanol solubility model provides reasonably accurate predictions of the 1-octanol solubility of organic solutes directly from structure. The model was developed under Open Notebook Science conditions, which makes it open, reproducible, and as useful as possible.




Fig. 6: Random forest model variable importance

Mentions: The following descriptors were identified as important: ALogP, XLogP, TopoPSA, nAtomP, MDEC.23, khs.aaCH, and nHBAcc (see Fig. 6), which correspond to two models for LogP, the predicted topological polar surface area, the number of atoms in the longest pi chain, the MDE topological descriptor, a Kier and Hall SMARTS descriptor, and the number of hydrogen bond acceptors, respectively. It is not surprising that both ALogP and XLogP would be important in predicting 1-octanol solubility, though one would have assumed that one of these descriptors would have been removed during feature selection as being highly correlated with the other. Analyzing the correlation between these two descriptors, we see that they are correlated at 0.83, and both survived because our cutoff was 0.90. This further confirms the problems with current Open LogP descriptors implemented in the CDK [16].
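The effect of the 0.90 cutoff can be sketched with a simple greedy correlation filter: a descriptor is kept only if its absolute Pearson correlation with every descriptor already kept does not exceed the cutoff. This is a minimal illustration, not the authors' feature-selection procedure, which is not described here; the function names, column names, and toy values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def drop_correlated(columns, cutoff=0.90):
    """Greedy filter: keep a column only if |r| <= cutoff against all kept columns.

    `columns` maps descriptor name -> list of values; insertion order decides
    which member of a highly correlated pair survives.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Toy descriptor table (made-up values, hypothetical names).
toy_columns = {
    "logp_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "logp_b": [1.0, 3.0, 2.0, 5.0, 4.0],       # r = 0.8 with logp_a: below cutoff
    "logp_a_like": [2.0, 4.0, 6.0, 8.0, 10.5],  # r > 0.99 with logp_a: dropped
}
toy_kept = drop_correlated(toy_columns)
```

In this toy table the two moderately correlated columns both pass the filter, mirroring how a pair correlated at 0.83 survives a 0.90 cutoff, while the near-duplicate column is removed.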

