Dopamine-mediated learning and switching in cortico-striatal circuit explain behavioral changes in reinforcement learning.

Hong S, Hikosaka O - Front Behav Neurosci (2011)

Bottom Line: This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task where the animal makes shorter latency saccades to reward locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism which activates the cortico-striatal circuit selectively.


Affiliation: Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD, USA.

ABSTRACT
The basal ganglia are thought to play a crucial role in reinforcement learning. Central to the learning mechanism are dopamine (DA) D1 and D2 receptors located in the cortico-striatal synapses. However, it is still unclear how this DA-mediated synaptic plasticity is deployed and coordinated during reward-contingent behavioral changes. Here we propose a computational model of reinforcement learning that uses different thresholds of D1- and D2-mediated synaptic plasticity which are antagonized by DA-independent synaptic plasticity. A phasic increase in DA release caused by a larger-than-expected reward induces long-term potentiation (LTP) in the direct pathway, whereas a phasic decrease in DA release caused by a smaller-than-expected reward induces a cessation of long-term depression, leading to LTP in the indirect pathway. This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task where the animal makes shorter latency saccades to reward locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism which activates the cortico-striatal circuit selectively. Our model also shows how D1- or D2-receptor blocking experiments affect selectively either reward or no-reward trials. The proposed mechanisms also explain the behavioral changes in Parkinson's disease.
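The dual-threshold plasticity rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' code; the function name, thresholds, and learning-rate values are all illustrative assumptions. `da` stands for the phasic dopamine signal relative to baseline, and `theta_d1`/`theta_d2` play the role of the distinct D1- and D2-mediated plasticity thresholds, antagonized by a small DA-independent decay.

```python
def update_weights(w_direct, w_indirect, da,
                   theta_d1=0.5, theta_d2=-0.5,
                   lr=0.1, decay=0.02):
    """One plasticity step after a reward outcome (illustrative sketch).

    da > theta_d1 : larger-than-expected reward -> D1-mediated LTP
                    in the direct pathway.
    da < theta_d2 : smaller-than-expected reward -> long-term depression
                    ceases in the indirect pathway, so the net effect is LTP.
    A small DA-independent decay antagonizes both pathways.
    """
    if da > theta_d1:                       # phasic DA increase
        w_direct += lr * (da - theta_d1)    # direct-pathway LTP
    elif da < theta_d2:                     # phasic DA decrease
        w_indirect += lr * (theta_d2 - da)  # indirect-pathway LTP
    # DA-independent plasticity pulls both weights back toward baseline
    w_direct -= decay * w_direct
    w_indirect -= decay * w_indirect
    return w_direct, w_indirect
```

Under this sketch, a better-than-expected outcome strengthens only the direct pathway and a worse-than-expected outcome strengthens only the indirect pathway, which is the asymmetry the model uses to explain the latency results below.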





Figure 3: Experience-dependent emergence of a switching mechanism that allows rapid changes in saccade latency in response to a change in reward location: before (A–C) and after (D–F) sufficient experience of the 1DR task. We hypothesize the presence of “reward-category neurons” (RWD), a key driver of the switching, that make excitatory connections to FEF neurons and to direct-pathway MSNs in the CD in the same hemisphere. They would become active before target onset selectively when a reward is expected on the contralateral side (see Figure 4), an assumption based on experimental observations of neuronal activity in the FEF, CD, SNr, and SC. Before sufficient experience of the 1DR task (A–C), the saccade latency changes gradually in both the small-to-big-reward transition [red in (B,C)] and the big-to-small-reward transition [blue in (B,C)], similarly in experimental observation (B) and computer simulation (C). The saccade latency data in (B) are from monkeys C, D, and T. After sufficient experience of the 1DR task (D–F), the saccade latency changes quickly, as shown in experiments (E) and computer simulation (F). This is due mainly to the additional excitatory input from the reward-category neurons. Note, however, that the decrease in saccade latency in the small-to-big-reward transition [red in (E,F)] is quicker than the increase in saccade latency in the big-to-small-reward transition [blue in (E,F)]. This asymmetry is due to the asymmetric learning algorithm operated by the two parallel circuits in the basal ganglia illustrated in Figure 2. Panel (E) is from Matsumoto and Hikosaka (2007).

Mentions: While the monkey was performing the 1DR task, the latency was consistently shorter for saccades to the reward target than for saccades to the no-reward target. This bias evolved gradually, becoming more apparent as trials progressed (Figures 1B and 3B,E). The slow change in saccade latency was particularly evident initially (Figure 3B). After extensive experience with the 1DR task, the monkey became able to switch the bias rapidly (Figure 3E; Takikawa et al., 2004).
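The role the hypothesized RWD input plays in this fast switch can be sketched in a few lines. Everything here is an illustrative assumption, not the paper's model: `drive` stands for the slowly learned cortico-striatal drive, `rwd_active` for the binary reward-category signal that becomes available only after extensive training, and the numeric constants are arbitrary.

```python
def saccade_latency(drive, rwd_active, base=300.0, gain=60.0, rwd_gain=1.0):
    """Latency (ms) falls with total excitatory input to the saccade circuit.

    drive      : slowly learned cortico-striatal drive (updated over trials)
    rwd_active : whether reward-category neurons fire before target onset
    """
    total_input = drive + (rwd_gain if rwd_active else 0.0)
    return base - gain * total_input

# Early in training, only `drive` is available, so latency changes
# gradually across trials after the reward location switches.
# After extensive training, the RWD signal flips on in the very first
# rewarded trial, producing the rapid switch seen in Figure 3E.
```

The sketch captures the qualitative point: a gating signal that switches in one trial can change latency far faster than a weight that must be relearned over many trials.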

