Dopamine-mediated learning and switching in cortico-striatal circuit explain behavioral changes in reinforcement learning.

Hong S, Hikosaka O - Front Behav Neurosci (2011)

Bottom Line: This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task where the animal makes shorter latency saccades to reward locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism which activates the cortico-striatal circuit selectively.


Affiliation: Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD, USA.

ABSTRACT
The basal ganglia are thought to play a crucial role in reinforcement learning. Central to the learning mechanism are dopamine (DA) D1 and D2 receptors located in the cortico-striatal synapses. However, it is still unclear how this DA-mediated synaptic plasticity is deployed and coordinated during reward-contingent behavioral changes. Here we propose a computational model of reinforcement learning that uses different thresholds for D1- and D2-mediated synaptic plasticity, both of which are antagonized by DA-independent synaptic plasticity. A phasic increase in DA release caused by a larger-than-expected reward induces long-term potentiation (LTP) in the direct pathway, whereas a phasic decrease in DA release caused by a smaller-than-expected reward induces a cessation of long-term depression (LTD), leading to LTP in the indirect pathway. This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task where the animal makes shorter latency saccades to reward locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism which activates the cortico-striatal circuit selectively. Our model also shows how D1- or D2-receptor blocking experiments selectively affect either reward or no-reward trials. The proposed mechanisms also explain the behavioral changes in Parkinson's disease.
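The threshold idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold values, gains, DA levels, and the names `Thresholds` and `weight_changes` are all hypothetical, and the DA-independent LTD in the direct pathway is an assumption inferred from the statement that DA-independent plasticity antagonizes the DA-dependent plasticity.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical values in arbitrary units; the abstract gives no numbers.
    d1: float = 1.5  # DA level above which D1-mediated LTP is engaged
    d2: float = 0.5  # DA level above which D2-mediated LTD is engaged


def weight_changes(da, th, da_gain=1.0, indep_gain=0.5):
    """Return (direct_dw, indirect_dw) for one trial at DA level `da`.

    Direct pathway: D1-mediated LTP (when da > th.d1) opposed by an
    assumed DA-independent LTD.
    Indirect pathway: D2-mediated LTD (when da > th.d2) opposed by a
    DA-independent LTP.
    """
    d1_ltp = da_gain if da > th.d1 else 0.0
    direct_dw = d1_ltp - indep_gain

    d2_ltd = da_gain if da > th.d2 else 0.0
    indirect_dw = indep_gain - d2_ltd
    return direct_dw, indirect_dw


th = Thresholds()
big = weight_changes(2.0, th)    # phasic DA increase (larger-than-expected reward)
small = weight_changes(0.2, th)  # phasic DA decrease (smaller-than-expected reward)
print("big reward:  ", big)
print("small reward:", small)
```

Under these assumed numbers, the DA increase yields a net potentiation of the direct pathway, while the DA decrease removes the D2-mediated LTD so that the indirect pathway is potentiated, which is the qualitative pattern the abstract describes.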



Figure 6: Influence of D2 antagonist on saccadic latency. (A) Trial-by-trial changes in the latency of contralateral saccades, before (black) and after (blue) injection of a D2 antagonist into the CD. Data are from Nakamura and Hikosaka (2006). (B) Simulated trial-by-trial changes in saccade latency. (C) After D2 antagonist injection, average saccade latency increased in small-reward trials, but not in big-reward trials. The experimental data were replicated by computer simulation. (D) Hypothesized mechanism of the effect of D2 antagonist in big-reward trials. Although the D2 antagonist elevates the D2 threshold, the phasic increase in the DA level still exceeds both the D1 and D2 thresholds, and therefore the saccade latency remains largely unchanged. (E) Hypothesized mechanism of the effect of D2 antagonist in small-reward trials. The elevated D2 threshold eliminates the DA-dependent LTD in the indirect pathway MSNs and therefore potentiates the SNr-induced inhibition of the SC, leading to a longer latency saccade.

Mentions: In contrast, after the D2 antagonist injection in the CD, the saccadic latency increased selectively in no-reward trials (Nakamura and Hikosaka, 2006). Our model explains this change as a consequence of the increased threshold for D2 receptor activation (Figure 6E). Note that in the normal condition, the threshold of the D2 receptors is assumed to be below the level of DA concentration in the striatum (Figure 2D). After the injection of the D2 antagonist, the D2 threshold increases significantly while the D1 threshold remains unchanged, which leads to selective changes in the indirect pathway. This change will not grossly affect saccades in reward trials because the DA concentration is assumed to exceed both the D1 and D2 thresholds (Figure 6D). In no-reward trials, however, the change in the D2 threshold affects processes in the indirect pathway (compare Figure 6E with Figure 2D), because the removal of DA-dependent LTD enhances the activity of indirect pathway MSNs. The increased output of the indirect pathway increases the SNr-induced inhibition of SC saccadic neurons, leading to longer saccade latencies, as shown in the simulated results in Figure 6C, right. These results are similar to the experimental data (Figure 6C, left). The simulation also replicates the trial-by-trial changes in saccade latencies and their alteration by the D2 antagonist (compare Figure 6A with Figure 6B).
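The reasoning in this passage can be made concrete with a small sketch under stated assumptions. It is not the published simulation: the function names, DA levels, thresholds, gains, and learning rate are hypothetical. The no-reward DA level is placed between the normal and the drug-elevated D2 threshold, and the big-reward level above both, following the assumptions stated in the text. Saccade latency itself is not modeled; only the indirect-pathway synaptic weight is tracked.

```python
def indirect_dw(da, d2_threshold, ltd_gain=1.0, indep_ltp_gain=0.5):
    """Net indirect-pathway weight change for one trial.

    D2-mediated LTD is engaged only when DA exceeds the D2 threshold;
    a DA-independent LTP term is always present (hypothetical gains).
    """
    ltd = ltd_gain if da > d2_threshold else 0.0
    return indep_ltp_gain - ltd


def run_trials(da, d2_threshold, n_trials=10, lr=0.1, w0=1.0):
    """Accumulate the indirect-pathway weight over repeated trials."""
    w = w0
    for _ in range(n_trials):
        w = max(0.0, w + lr * indirect_dw(da, d2_threshold))
    return w


# Hypothetical values in arbitrary units.
NORMAL_D2, BLOCKED_D2 = 0.5, 1.2   # D2 threshold before / after antagonist
BIG_DA, NO_REWARD_DA = 2.0, 0.8    # assumed DA levels per trial type

for label, da in (("big reward", BIG_DA), ("no reward", NO_REWARD_DA)):
    before = run_trials(da, NORMAL_D2)
    after = run_trials(da, BLOCKED_D2)
    print(f"{label}: weight before={before:.2f}, after antagonist={after:.2f}")
```

With these assumed values the big-reward weight is identical in both conditions, because DA exceeds the D2 threshold either way, whereas the no-reward weight grows once the elevated threshold removes the DA-dependent LTD. In the model described above, that stronger indirect-pathway output corresponds to longer saccade latencies.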

