Prediction of 1-octanol solubilities using data from the Open Notebook Science Challenge.

Buonaiuto MA, Lang AS - Chem Cent J (2015)

Bottom Line: The model has been deployed for general use as a Shiny application. The 1-octanol solubility model provides reasonably accurate predictions of the 1-octanol solubility of organic solutes directly from structure. The model was developed under Open Notebook Science conditions, which makes it open, reproducible, and as useful as possible.


Affiliation: Department of Computing and Mathematics, Oral Roberts University, 7777 S. Lewis Avenue, Tulsa, OK 74171 USA.

ABSTRACT

Background: 1-Octanol solubility is important in a variety of applications involving pharmacology and environmental chemistry. Current models are linear in nature and often require foreknowledge of either melting point or aqueous solubility. Here we extend the range of applicability of 1-octanol solubility models by creating a random forest model that can predict 1-octanol solubilities directly from structure.

Results: We created a random forest model using CDK descriptors that has an out-of-bag (OOB) R² value of 0.66 and an OOB mean squared error of 0.34. The model has been deployed for general use as a Shiny application.

Conclusion: The 1-octanol solubility model provides reasonably accurate predictions of the 1-octanol solubility of organic solutes directly from structure. The model was developed under Open Notebook Science conditions, which makes it open, reproducible, and as useful as possible.




Fig. 5: Predicted vs. measured solubility values for the randomly selected test set, coloured by AE

Mentions: The dataset was then split randomly into training and test sets (75:25). Using the random forest model package (v 4.6-10) in R (v 3.1.2), we created a random forest model using our training set data. This model had an OOB R² value of 0.63 and an OOB MSE of 0.38. This model was then used to predict the 1-octanol solubilities of the compounds in the test set, resulting in an R² value of 0.54 and an MSE of 0.44; see Fig. 5. The performance statistics obtained when using the model to predict test-set solubilities are comparable to the OOB values. The fact that they are slightly worse (a lower R² and a higher MSE) may be an artifact of the relatively small sizes of the training and test sets and of our decision to do a single training-set/test-set split rather than use cross-validation.
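
The evaluation procedure described above (a random 75:25 split, then R² and MSE computed on held-out predictions) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R code: the random forest itself is omitted, the compound names and solubility values are hypothetical placeholders, and R² is assumed to be defined as 1 - SS_res/SS_tot, which the source does not state explicitly.

```python
import random


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mse(measured, predicted):
    """Mean squared error between measured and predicted values."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)


def r_squared(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical placeholder identifiers, not the Open Notebook Science data.
compounds = [f"compound_{i}" for i in range(100)]
train, test = train_test_split(compounds, test_fraction=0.25)

# Hypothetical measured vs. predicted solubilities for a tiny test set.
measured = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 1.0, 1.5, 3.0]

print(len(train), len(test))
print(mse(measured, predicted), r_squared(measured, predicted))
```

With a single split like this, the test-set statistics depend on which compounds happen to land in the held-out quarter, which is consistent with the authors' remark that cross-validation might give a steadier estimate.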

