Human neural learning depends on reward prediction errors in the blocking paradigm.

Tobler PN, O'Doherty JP, Dolan RJ, Schultz W - J. Neurophysiol. (2005)

Bottom Line: Here, a novel stimulus is blocked from learning when it is associated with a fully predicted outcome, presumably because the occurrence of the outcome fails to produce a prediction error. The medial orbitofrontal cortex and the ventral putamen showed significantly lower responses to blocked, compared with nonblocked, reward-predicting stimuli. These data suggest that learning in primary reward structures in the human brain correlates with prediction errors in a manner that complies with principles of formal learning theory.


Affiliation: Department of Anatomy, University of Cambridge, Cambridge CB2 3DY, UK. pnt21@cam.ac.uk

ABSTRACT
Learning occurs when an outcome deviates from expectation (prediction error). According to formal learning theory, the defining paradigm demonstrating the role of prediction errors in learning is the blocking test. Here, a novel stimulus is blocked from learning when it is associated with a fully predicted outcome, presumably because the occurrence of the outcome fails to produce a prediction error. We investigated the role of prediction errors in human reward-directed learning using a blocking paradigm and measured brain activation with functional magnetic resonance imaging. Participants showed blocking of behavioral learning with juice rewards as predicted by learning theory. The medial orbitofrontal cortex and the ventral putamen showed significantly lower responses to blocked, compared with nonblocked, reward-predicting stimuli. In reward-predicting control situations, deactivation in orbitofrontal cortex and ventral putamen occurred at the time of unpredicted reward omissions. Responses in discrete parts of orbitofrontal cortex correlated with the degree of behavioral learning during, and after, the learning phase. These data suggest that learning in primary reward structures in the human brain correlates with prediction errors in a manner that complies with principles of formal learning theory.
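The blocking logic described in the abstract can be illustrated with a minimal Rescorla-Wagner simulation. This is a sketch under stated assumptions, not the authors' analysis: the learning rate `alpha`, the asymptote `lam`, the trial counts, and the compound phase (AX and BY both rewarded) are illustrative choices based on the standard blocking design, and the function name `rescorla_wagner` is hypothetical.

```python
def rescorla_wagner(trials, alpha=0.2, lam=1.0, weights=None):
    """Update associative weights with a shared prediction error.

    trials  : list of (stimuli, rewarded) pairs, e.g. (("A", "X"), True)
    alpha   : learning rate (assumed value, not from the paper)
    lam     : asymptotic value supported by the reward (assumed value)
    weights : optional starting weights carried over from an earlier phase
    """
    w = dict(weights or {})
    for stimuli, rewarded in trials:
        # The prediction is the summed weight of all stimuli present.
        prediction = sum(w.get(s, 0.0) for s in stimuli)
        # Prediction error: obtained outcome minus predicted outcome.
        delta = (lam if rewarded else 0.0) - prediction
        # Every stimulus present is updated by the same error term.
        for s in stimuli:
            w[s] = w.get(s, 0.0) + alpha * delta
    return w


# Phase 1 (pretraining): A is rewarded, B is not.
pretraining = [(("A",), True), (("B",), False)] * 100
w = rescorla_wagner(pretraining)

# Phase 2 (compound learning, assumed design): AX and BY are both rewarded.
compound = [(("A", "X"), True), (("B", "Y"), True)] * 100
w = rescorla_wagner(compound, weights=w)

print({stimulus: round(value, 3) for stimulus, value in sorted(w.items())})
```

In this sketch, A already predicts the reward when X is introduced, so the prediction error on AX trials is close to zero and X acquires almost no weight. B predicts nothing, so the reward on BY trials is surprising and Y acquires a substantial weight. This matches the blocked versus nonblocked contrast described above.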


Differential reward-prediction-related activity at the time of stimuli Y− and X− and of omitted reward in the ventral putamen during the blocking experiment. A: region of interest analysis. Left: 10-mm sphere centered on a striatal voxel shown previously to report prediction errors (O’Doherty et al. 2004). Right: bar plots show contrast estimates (dimensionless) corresponding to the average fit for X− and Y− when modeled with an activation at the time of conditioned stimuli and a deactivation at the usual time of reward within region of interest shown left. B: event-related time courses to X− and Y−, averaged within the region of interest shown in A and across subjects. Stimuli occurred at time = 0, were displayed for 3 s, and not followed by reward. Intertrial intervals varied between 3 and 11 s according to a Poisson distribution with a mean of 6 s. All 6 trial types alternated randomly, and scan onsets occurred randomly with regard to stimulus onsets. In this and all following time course plots, the hemodynamic response is plotted as percentage signal change and occurs with the usual lag of ∼5 s. Error bars correspond to SE. To characterize the shape of the responses in time course plots, regressors have been convolved with finite impulse responses rather than hemodynamic response functions. The bin size corresponded to the scan repetition time (3.6 s). C: conjunction of reward-predicting stimulus A+ vs. neutral stimulus B− and reward-predicting stimulus Y− vs. blocked stimulus X−. This analysis was restricted to subjects showing blocking behaviorally.

Mentions: We tested differential blocking of learning by modeling neural responses to control stimulus Y− and blocked stimulus X− with a phasic positive response at the time of the stimulus and a negative prediction-error response at the time of the omitted reward. We performed a region of interest (ROI) analysis in 15 subjects showing behavioral blocking by measuring the activation in a 10-mm sphere centered on two previously reported peaks of reward-prediction-error responses in the ventral putamen (O’Doherty et al. 2003, 2004). Activations were stronger for Y− compared with X− (paired t-test, both P < 0.05, small volume correction; Fig. 2A) and failed to correlate with movement parameters (|r| for all parameters <0.53 and P for all >0.12). In a conjunction analysis, we found that the right ventral putamen (27/3/−6; z = 3.61) was more activated by control stimulus Y− than by blocked stimulus X− and likewise more by reward-predicting stimulus A+ than by neutral stimulus B− (Fig. 2C; Table 1, top, for additional activations). These data suggest that activation in the ventral putamen was blocked together with behavioral learning in the absence of a reward-prediction error.
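The spherical region of interest mentioned above can be sketched as a boolean voxel mask. This is only an illustration of the general technique: the grid shape, the 3-mm isotropic voxel size, the center voxel, and the reading of "10-mm" as a radius are all assumptions, and `sphere_mask` is a hypothetical helper rather than a function from any imaging package.

```python
import numpy as np


def sphere_mask(shape, voxel_size_mm, center_vox, radius_mm):
    """Return a boolean mask of voxels within radius_mm of center_vox.

    shape         : volume dimensions in voxels (hypothetical grid)
    voxel_size_mm : voxel edge lengths in mm along each axis (assumed)
    center_vox    : sphere center in voxel indices (assumed)
    radius_mm     : sphere radius in mm ("10-mm" is read as a radius here)
    """
    grid = np.indices(shape).astype(float)  # array of shape (3, X, Y, Z)
    center = np.asarray(center_vox, dtype=float).reshape(3, 1, 1, 1)
    size = np.asarray(voxel_size_mm, dtype=float).reshape(3, 1, 1, 1)
    # Euclidean distance of every voxel from the center, in millimetres.
    dist_mm = np.sqrt((((grid - center) * size) ** 2).sum(axis=0))
    return dist_mm <= radius_mm


# Hypothetical 40 x 40 x 40 volume with 3-mm isotropic voxels.
mask = sphere_mask((40, 40, 40), (3.0, 3.0, 3.0), (20, 20, 20), 10.0)
print(int(mask.sum()), "voxels fall inside the sphere")
```

Averaging a per-subject contrast image within such a mask would give one value per subject and condition, which is the kind of input a paired comparison of Y− and X− needs.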

