Scaling prediction errors to reward variability benefits error-driven learning in humans.

Diederen KM, Schultz W - J. Neurophysiol. (2015)

Bottom Line: In addition, participants who scaled prediction errors relative to standard deviation also showed more similar performance across different standard deviations, indicating that increases in standard deviation did not substantially decrease "adapters'" accuracy in predicting the means of reward distributions. However, exaggerated scaling beyond the standard deviation resulted in impaired performance. Thus efficient adaptation makes learning more robust to changing variability.
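The scaling idea in the summary above can be illustrated with a delta-rule learner. The following is a minimal Python sketch, not the authors' fitted model. It assumes that each prediction error is multiplied by (SIGMA_REF / SD)^k before the update, with a hypothetical reference SD of 10 (the medium cue), a hypothetical learning rate `alpha`, and an exponent `k` where 0 means no scaling, 1 means scaling in proportion to the cued SD, and values above 1 mimic "exaggerated" scaling. The names `run_learner`, `performance_error`, and `SIGMA_REF`, and the starting prediction of 50 (midpoint of the 0–100 scale), are illustrative assumptions.

```python
"""Hypothetical sketch of an SD-scaled delta rule (not the authors' model)."""
import random
import statistics

SIGMA_REF = 10.0  # assumption: medium-cue SD used as the reference scale


def run_learner(rewards, sigma, alpha=0.3, k=1.0, v0=50.0):
    """Return the prediction made before each reward.

    The effective learning rate is alpha * (SIGMA_REF / sigma) ** k:
    k = 0 -> prediction errors are not scaled,
    k = 1 -> scaled in proportion to the cued SD,
    k > 1 -> scaled more strongly than the SD (hypothetical "exaggerated" case).
    """
    v = v0
    predictions = []
    for r in rewards:
        predictions.append(v)
        delta = r - v  # delta = reward received - reward predicted
        v += alpha * delta * (SIGMA_REF / sigma) ** k
    return predictions


def performance_error(predictions, ev, last_n=20):
    """Mean absolute distance between the last predictions and the true EV."""
    return statistics.mean(abs(p - ev) for p in predictions[-last_n:])


if __name__ == "__main__":
    rng = random.Random(0)
    for sd in (5, 10, 15):
        # Reuse one reward sequence per SD so only k differs between runs.
        rewards = [rng.gauss(65, sd) for _ in range(40)]
        for k in (0.0, 1.0, 2.0):
            preds = run_learner(rewards, sd, k=k)
            print(f"SD {sd:>2}  k={k}:  error {performance_error(preds, 65):.2f}")
```

Reusing the same reward sequence for every `k` keeps differences between runs attributable to the scaling exponent alone; the printed errors depend on the random seed and are not results from the study.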


Affiliation: Department of Physiology, Development, and Neuroscience, University of Cambridge, Cambridge, United Kingdom k.diederen@gmail.com.



Figure 1: A: example trial of the main task. After presentation of a fixation cross, a small, medium, or large green bar cue signaled the (relative) fluctuation in reward value of the current distribution. After cue presentation, participants indicated their prediction of the upcoming reward, after which the actual reward on that trial was shown. RT, reaction time; RPE, reward prediction error. B: reward distributions and cues indicating the degree of reward variability (cue: small, medium, or large green bar). Numbers listed under each distribution indicate its range: top, expected value (EV) 35; bottom, EV 65. C: 3 example sessions of the main task for a typical participant. In each of the 3 sessions, participants alternately predicted rewards drawn from 2 different distributions in small blocks of 5–8 trials, as indicated by the bar cues. All participants experienced all 6 reward distributions. The order and combination of reward distributions were counterbalanced across participants. The 2 distributions in a session always differed in both standard deviation (SD) and EV. D: number (N) of reward prediction errors aggregated over participants and trials. The magnitude of reward prediction errors increased with SD, indicating that the experimental manipulation was successful. E: average (±SE) prediction errors (left) and performance errors (right) decreased for reward distributions with a higher EV, suggesting that participants perceived the drawn numbers as actual rewards.
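The session structure in panel C can be sketched as follows. This is a hypothetical Python illustration: the number of blocks per session (`n_blocks=10`) is an assumption not given in the caption, and random shuffling stands in for the study's counterbalancing scheme. It enforces only the constraints stated in the caption: the 3 sessions jointly cover all 6 distributions, the 2 distributions in a session differ in both EV and SD, and they alternate in blocks of 5–8 trials. The function names are illustrative.

```python
import random

SDS = (5, 10, 15)
EVS = (35, 65)


def make_session(dist_a, dist_b, rng, n_blocks=10):
    """Trial list of (EV, SD) pairs alternating between two distributions
    in blocks of 5-8 trials. n_blocks is a hypothetical parameter."""
    if dist_a[0] == dist_b[0] or dist_a[1] == dist_b[1]:
        raise ValueError("distributions in a session must differ in EV and SD")
    trials = []
    for block in range(n_blocks):
        dist = dist_a if block % 2 == 0 else dist_b
        trials.extend([dist] * rng.randint(5, 8))  # randint is inclusive
    return trials


def make_sessions(rng):
    """Three sessions that together cover all six (EV, SD) distributions."""
    # Pair each EV 35 distribution with an EV 65 distribution of a different
    # SD: reshuffle until no SD stays in its original position (a derangement).
    high_sds = list(SDS)
    while any(lo == hi for lo, hi in zip(SDS, high_sds)):
        rng.shuffle(high_sds)
    pairs = [((EVS[0], lo), (EVS[1], hi)) for lo, hi in zip(SDS, high_sds)]
    rng.shuffle(pairs)  # random session order stands in for counterbalancing
    sessions = []
    for low, high in pairs:
        first, second = (low, high) if rng.random() < 0.5 else (high, low)
        sessions.append(make_session(first, second, rng))
    return sessions


def block_lengths(trials):
    """Lengths of consecutive runs of the same distribution."""
    lengths = []
    for i, dist in enumerate(trials):
        if i > 0 and dist == trials[i - 1]:
            lengths[-1] += 1
        else:
            lengths.append(1)
    return lengths


if __name__ == "__main__":
    for n, session in enumerate(make_sessions(random.Random(0)), start=1):
        print(f"session {n}: {sorted(set(session))} blocks {block_lengths(session)}")
```

Because consecutive blocks always switch distribution, the number of runs returned by `block_lengths` equals `n_blocks`, which makes the alternation easy to check.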

Mentions: The experimental task required participants to predict the magnitude of upcoming rewards as closely as possible from the past reward history. Rewards were points (i.e., numbers) drawn from six different pseudo-Gaussian distributions (SD 5, 10, or 15; EV 35 or 65). Each trial started with a fixation cross presented on a computer monitor in front of the participant (Fig. 1A). After 500 ms of fixation, a small, medium, or large green bar cue (500 ms) signaled the SD (5, 10, or 15) of the reward distribution from which the upcoming reward would be drawn. Bar height was proportional to SD but did not correspond to the actual SD or to the range of the distributions. The bar cue thus told participants whether rewards were drawn from a distribution with small (SD 5), medium (SD 10), or large (SD 15) variability without revealing the actual SD or range. These explicit cues therefore facilitated rapid adaptation to reward variability. Importantly, the cues carried no information about the EV of the distributions. After cue presentation, participants used a trackball mouse to move a horizontal bar, with its numeric value displayed on both sides, along a vertical scale (0–100), and indicated their prediction with a mouse click (within 3,500 ms). After a short delay (300 ms), the display showed the magnitude of the drawn reward as a green line and numbers on the same scale, together with the reward prediction error on that trial (a yellow bar spanning the distance between the predicted and the received reward). Reward prediction error was defined conventionally as δ = reward received − reward predicted. Failure to make a timely prediction resulted in omission of the reward.
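A single trial of the task described above can be simulated as below. This minimal Python sketch makes several assumptions not stated in the text: the "pseudo-Gaussian" distributions are approximated by rounded normal draws clipped to the 0–100 scale, a timely prediction earns the drawn number as points, and a late or missing prediction earns 0 points and produces no prediction error. `draw_reward`, `run_trial`, and `TrialOutcome` are hypothetical names; the timing constants are the values reported in the text.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Trial timing reported in the task description (ms), listed for reference
FIXATION_MS = 500
CUE_MS = 500
RESPONSE_WINDOW_MS = 3500
FEEDBACK_DELAY_MS = 300


@dataclass
class TrialOutcome:
    reward: int               # number drawn on this trial (0-100 scale)
    delta: Optional[float]    # reward prediction error; None if no timely prediction
    points_earned: int        # assumption: 0 when the prediction was too late


def draw_reward(ev, sd, rng):
    """Assumption: approximate the pseudo-Gaussian draw by a rounded
    normal sample clipped to the 0-100 response scale."""
    return min(100, max(0, round(rng.gauss(ev, sd))))


def run_trial(ev, sd, prediction, rt_ms, rng):
    """Simulate one trial given a prediction and its reaction time."""
    reward = draw_reward(ev, sd, rng)
    if prediction is None or rt_ms > RESPONSE_WINDOW_MS:
        # Failure to predict in time: reward omitted (no points, no delta)
        return TrialOutcome(reward=reward, delta=None, points_earned=0)
    delta = reward - prediction  # delta = reward received - reward predicted
    return TrialOutcome(reward=reward, delta=delta, points_earned=reward)


if __name__ == "__main__":
    rng = random.Random(1)
    print(run_trial(ev=65, sd=10, prediction=60.0, rt_ms=1200, rng=rng))
    print(run_trial(ev=65, sd=10, prediction=60.0, rt_ms=4000, rng=rng))
```

Separating `draw_reward` from `run_trial` makes it straightforward to swap in a different approximation of the pseudo-Gaussian distributions without touching the prediction-error logic.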

