On the ease of predicting the thermodynamic properties of beta-cyclodextrin inclusion complexes.

Steffen A, Apostolakis J - Chem Cent J (2007)

Bottom Line: We found that SVMR/FFS marginally outperforms PLSR and PCR in the prediction of ΔG°, with PLSR performing slightly better than PCR. Using the methods outlined in this study, we found that ΔS° appears almost unpredictable. The lower sensitivity of free energies to small structural variations of guest molecules and the lower sensitivity of ΔG° to experimental conditions are possible explanations for its greater predictability.


Affiliation: Max-Planck-Institut für Informatik, Computational Biology and Applied Algorithmics, Saarbrücken, Germany. an.steffen@web.de

ABSTRACT

Background: In this study we investigated the predictability of three thermodynamic quantities related to complex formation. As a model system we chose the host-guest complexes of beta-cyclodextrin (beta-CD) with different guest molecules. A training dataset comprising 176 beta-CD guest molecules with experimentally determined thermodynamic quantities was taken from the literature. We compared three statistical regression methods, namely principal component regression (PCR), partial least squares regression (PLSR), and support vector machine regression combined with forward feature selection (SVMR/FFS), with respect to their ability to generate predictive quantitative structure-property relationship (QSPR) models for ΔG°, ΔH° and ΔS° on the basis of computed molecular descriptors.

Results: We found that SVMR/FFS marginally outperforms PLSR and PCR in the prediction of ΔG°, with PLSR performing slightly better than PCR. PLSR and PCR proved to be more stable in a nested cross-validation protocol. Whereas ΔG° can be predicted in good agreement with experimental values, none of the methods led to comparably good predictive models for ΔH°. Using the methods outlined in this study, we found that ΔS° appears almost unpredictable. To understand the differences in the ease of predicting these quantities, we performed a detailed analysis. It shows that free energies are less sensitive than enthalpy or entropy to small structural variations of the guest molecules. This property and the lower sensitivity of ΔG° to experimental conditions are possible explanations for its greater predictability.

Conclusion: This study shows that the ease of predicting ΔG° cannot be explained by the predictability of either ΔH° or ΔS°. Our analysis suggests that the poor predictability of TΔS° and, to a lesser extent, ΔH° stems from a stronger dependence of these quantities on the structural details of the complex, and only to a lesser extent from experimental error.
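One way to see how ΔG° can vary less than its two components is through the Gibbs relation. The sketch below rests on an assumption that the abstract does not state: that ΔH° and TΔS° co-vary positively across the guest molecules (often called enthalpy-entropy compensation). Under that assumption the covariance term is positive, and the spread of ΔG° is smaller than the spreads of ΔH° and TΔS° would suggest.

```latex
\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ

\operatorname{Var}(\Delta G^\circ)
  = \operatorname{Var}(\Delta H^\circ)
  + \operatorname{Var}(T\Delta S^\circ)
  - 2\,\operatorname{Cov}(\Delta H^\circ,\, T\Delta S^\circ)
```

If the covariance were zero or negative, this argument would not apply, so it is one possible reading of the reported lower sensitivity and not a derivation of it.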


Figure 3: Dependence of the cross-validation coefficient q² (TΔS°) on the number of components/descriptors integrated into a model for the inner loop (left column) and the outer loop (right column) of the nested cross-validation for all three methods (top to bottom: PCR, PLS, and SVM).

Mentions: In the outer loop, the PCR model predicts the molecules' ΔG° values with a q² of 0.69 ± 0.03 relative to the experimentally determined values; PLSR also gives a q² of 0.69 ± 0.03, and SVMR/FFS gives 0.71 ± 0.03 (Figure 1 and Table 2). For SVMR/FFS, the outer-loop q² drops sharply compared with the inner loop: the maximum q² obtained is 0.87 in the inner loop but only 0.74 in the outer loop. PLSR and PCR behave more stably, with comparable q² values in the inner and outer loops. The correlations obtained for ΔH° and TΔS° (see Figure 2 and Table 3, and Figure 3 and Table 4, respectively) are clearly lower than those obtained for ΔG° with all regression methods. For both ΔH° and TΔS°, no regression method reached a q² above 0.5 in the outer loop. This finding highlights the risk of over-fitting the SVMR/FFS model to the data, because the ten-fold cross-validation yielded comparably good correlations even for ΔH° and TΔS°. The over-fitting is mainly due to the forward feature selection algorithm, which uses the squared correlation coefficient to choose the next descriptor in each iteration. A nested cross-validation is therefore essential for obtaining a realistic estimate of the method's predictive ability.
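The nested protocol described above can be sketched in plain Python. This is an illustration under several assumptions, not the authors' implementation: q² is taken as 1 − PRESS/SS_tot (one common definition), a k-nearest-neighbour regressor stands in for SVMR to avoid external dependencies, descriptors are chosen by inner-loop q² and not by the squared correlation coefficient, and the data are synthetic. All function names are hypothetical.

```python
import random

def q2(y_true, y_pred):
    """q^2 = 1 - PRESS / SS_tot (assumed definition, not confirmed by the source)."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - press / ss_tot

def knn_predict(train_X, train_y, x, feats, k=3):
    """Stand-in regressor (NOT SVMR): mean response of the k nearest training points."""
    dist = sorted(
        (sum((row[f] - x[f]) ** 2 for f in feats), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dist[:k]) / k

def folds(indices, n_folds):
    """Split a list of indices into n_folds interleaved test folds."""
    return [indices[i::n_folds] for i in range(n_folds)]

def cv_q2(X, y, idx, feats, n_folds=5):
    """q^2 from a cross-validation restricted to the samples in idx."""
    truth, pred = [], []
    for test in folds(idx, n_folds):
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        for i in test:
            truth.append(y[i])
            pred.append(knn_predict(tX, ty, X[i], feats))
    return q2(truth, pred)

def forward_select(X, y, idx, max_feats=3):
    """Greedy forward selection scored by inner-loop q^2; it only ever sees idx."""
    chosen, best = [], float("-inf")
    while len(chosen) < max_feats:
        scores = {f: cv_q2(X, y, idx, chosen + [f])
                  for f in range(len(X[0])) if f not in chosen}
        f, s = max(scores.items(), key=lambda kv: kv[1])
        if s <= best:  # stop when no remaining descriptor improves the score
            break
        chosen.append(f)
        best = s
    return chosen, best

def nested_cv(X, y, n_outer=5, seed=0):
    """Outer loop scores models whose descriptors were chosen without the test fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    truth, pred, inner_scores = [], [], []
    for test in folds(idx, n_outer):
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        feats, inner = forward_select(X, y, train)
        inner_scores.append(inner)
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        for i in test:
            truth.append(y[i])
            pred.append(knn_predict(tX, ty, X[i], feats))
    return sum(inner_scores) / len(inner_scores), q2(truth, pred)

def make_data(n=60, p=8, seed=1):
    """Synthetic data: only descriptors 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y

if __name__ == "__main__":
    X, y = make_data()
    inner, outer = nested_cv(X, y)
    print(f"mean inner q2 = {inner:.2f}, outer q2 = {outer:.2f}")
```

The inner score is optimistic by construction because the same samples both pick the descriptors and score them; the outer score is computed on samples that played no part in the selection, which is why a gap between the two signals over-fitting.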

