Scaling prediction errors to reward variability benefits error-driven learning in humans.
In addition, participants who scaled prediction errors relative to standard deviation also showed more similar performance across different standard deviations, indicating that increases in standard deviation did not substantially decrease "adapters'" accuracy in predicting the means of reward distributions. However, exaggerated scaling beyond the standard deviation resulted in impaired performance. Thus efficient adaptation makes learning more robust to changing variability.
Affiliation: Department of Physiology, Development, and Neuroscience, University of Cambridge, Cambridge, United Kingdom email@example.com.
Figure 2: A: simulated overall performance error (|performance − EV| averaged over all trials) for the Pearce-Hall model (see text for details on the simulation). Each line represents performance error across different learning rates for a specific decay in learning rate (y-axis). Grayscale lines represent different gradual decays in learning rate (η; 0–1, in steps of 0.1). Lighter grays indicate increases in learning rate decay. When the decay in learning rate is 0, the Pearce-Hall (PH) model is equivalent to the Rescorla-Wagner (RW) model. Performance error depends on both the initial learning rate (x-axis) and the gradual decay in learning rate. For most (initial) learning rates performance error is lower when combined with a decaying rather than a constant learning rate. B: optimal initial learning rates for SDs of 1, 5, 10, 15, and 20 and a decay of 0.1. Optimal initial learning rate was quantified as the initial learning rate for which best overall performance could be achieved. The optimal initial learning rate decreases when SD increases. Each black line indicates performance error across different learning rates and represents a specific SD. Red dots indicate the optimal learning rate for each SD. C: optimal learning rates (gray dots and line) for different SDs (SD 1–SD 20) and multiple decays in learning rate (0, 0.1, 0.4, and 0.9). Optimal learning rates decrease when SD increases for each level of decay. Red dots correspond with the red dots in B, i.e., the optimal initial learning rates for SD 5, 10, 15, and 20 with a decay of 0.1. D: simulated overall performance error for the adaptive Pearce-Hall model where prediction errors are scaled relative to the logarithm of SD (Eq. 4; ν = 0.5). Grayscale lines represent different gradual decays in learning rate (η; 0–1, in steps of 0.1). Lighter grays indicate increases in learning rate decay.
Although the minimum performance error is lower in the adaptive compared with the nonadaptive Pearce-Hall model (compare red dots in A and D), performance also critically depends on the initial learning rate and the gradual decrease in learning rate (compare blue dots in A and D). Thus performance may, but does not necessarily, improve with adaptation. E: simulated predictions with the nonadaptive (top) and adaptive Pearce-Hall model (bottom) for distributions with an SD of 5, 10, and 15, an initial learning rate of 0.5, and a gradual decay in learning rate of 0.1. Lines represent average of 200 simulated sessions. Shaded areas indicate SE. Adaptation facilitates faster learning and more similar performance error across SD conditions. F: relation between the degree of adaptation (Eq. 4; prediction errors scaled with the logarithm of SD) and performance error. Whereas scaling of prediction errors relative to but smaller than (log)SD facilitates decreases in performance error, scaling with a magnitude close to the (log)SD may limit the power of the learning rate to update predictions, resulting in increases in performance error. Thus performance may, but does not necessarily, improve with adaptation.
Thus far, it is unclear whether scaling of prediction errors relative to the variability of reward distributions results in improved performance, as predicted by learning models (Preuschoff and Bossaerts 2007). Increases in computational demands during prediction error scaling may, for instance, impede optimal deceleration of learning rates, resulting in suboptimal performance. In addition, although scaling of prediction errors relative to the variability in reward benefits performance, scaling with the standard deviation (SD) limits the power of the learning rate to update predictions. For instance, when a prediction error of 15 is divided by an SD of 15, the prediction can be adjusted by only 1 point (see Fig. 2F).
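The arithmetic in this example can be sketched in a few lines of Python. This is only a minimal illustration, assuming a simple Rescorla-Wagner-style update in which the prediction error is divided directly by the SD; it is not the study's Eq. 4, which according to the figure caption scales by the logarithm of SD. The function name `update_prediction` and its parameters are hypothetical.

```python
def update_prediction(prediction, reward, learning_rate, sd=None):
    """Apply one error-driven update to a reward prediction.

    If `sd` is given, the prediction error is divided by it before the
    learning rate is applied (an assumed, simplified form of scaling).
    """
    prediction_error = reward - prediction
    if sd is not None:
        prediction_error = prediction_error / sd
    return prediction + learning_rate * prediction_error


# A prediction error of 15 (reward 15, prediction 0) with an SD of 15.
# A learning rate of 1.0 gives the largest update the rule allows.
unscaled = update_prediction(prediction=0.0, reward=15.0, learning_rate=1.0)
scaled = update_prediction(prediction=0.0, reward=15.0, learning_rate=1.0, sd=15.0)

print(unscaled)  # 15.0: the prediction jumps all the way to the reward
print(scaled)    # 1.0: the scaled error caps the adjustment at 1 point
```

With any learning rate below 1, the scaled update is smaller still, which is the sense in which dividing by the full SD can limit how far a single outcome moves the prediction.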