Shifting responsibly: the importance of striatal modularity to reinforcement learning in uncertain environments.

Amemori K, Gibb LG, Graybiel AM - Front Hum Neurosci (2011)

Bottom Line: We then constructed a network model of basal ganglia circuitry that includes these modules and the direct and indirect pathways. Based on simple assumptions, this model suggests that while the direct pathway may promote actions based on striatal action values, the indirect pathway may act as a gating network that facilitates or suppresses behavioral modules on the basis of striatal responsibility signals. Our modeling functionally unites the modular compartmental organization of the striatum with the direct-indirect pathway divisions of the basal ganglia, a step that we suggest will have important clinical implications.


Affiliation: McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.

ABSTRACT
We propose here that the modular organization of the striatum reflects a context-sensitive modular learning architecture in which clustered striosome-matrisome domains participate in modular reinforcement learning (RL). Based on anatomical and physiological evidence, it has been suggested that the modular organization of the striatum could represent a learning architecture. There is not, however, a coherent view of how such a learning architecture could relate to the organization of striatal outputs into the direct and indirect pathways of the basal ganglia, nor a clear formulation of how such a modular architecture relates to the RL functions attributed to the striatum. Here, we hypothesize that striosome-matrisome modules not only learn to bias behavior toward specific actions, as in standard RL, but also learn to assess their own relevance to the environmental context and modulate their own learning and activity on this basis. We further hypothesize that the contextual relevance or "responsibility" of modules is determined by errors in predictions of environmental features and that such responsibility is assigned by striosomes and conveyed to matrisomes via local circuit interneurons. To examine these hypotheses and to identify the general requirements for realizing this architecture in the nervous system, we developed a simple modular RL model. We then constructed a network model of basal ganglia circuitry that includes these modules and the direct and indirect pathways. Based on simple assumptions, this model suggests that while the direct pathway may promote actions based on striatal action values, the indirect pathway may act as a gating network that facilitates or suppresses behavioral modules on the basis of striatal responsibility signals. Our modeling functionally unites the modular compartmental organization of the striatum with the direct-indirect pathway divisions of the basal ganglia, a step that we suggest will have important clinical implications.
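To make the responsibility idea concrete, the following is a minimal, hypothetical sketch of responsibility-weighted modular RL in the spirit of the abstract: each module keeps its own action values and its own prediction of an environmental feature, responsibility is soft-assigned from the modules' prediction errors, and each module's learning is scaled by its responsibility. The class and function names, the Gaussian responsibility rule, and the TD update are illustrative assumptions, not the authors' exact equations.

```python
import numpy as np

class Module:
    """One striosome-matrisome-like module (illustrative sketch)."""
    def __init__(self, n_states, n_actions, gamma=0.8, lr=0.1):
        self.Q = np.zeros((n_states, n_actions))  # matrisome-like action values
        self.pred = np.zeros(n_states)            # striosome-like prediction of an environmental feature
        self.gamma = gamma
        self.lr = lr

def responsibilities(modules, s, observed, sigma=1.0):
    """Soft-assign responsibility from each module's prediction error (smaller error -> larger weight)."""
    errors = np.array([m.pred[s] - observed for m in modules])
    w = np.exp(-errors**2 / (2.0 * sigma**2))
    return w / w.sum()

def update(modules, s, a, r, s_next, lam):
    """Responsibility-weighted TD updates; module i learns in proportion to lam[i]."""
    for m, li in zip(modules, lam):
        td = r + m.gamma * m.Q[s_next].max() - m.Q[s, a]
        m.Q[s, a] += li * m.lr * td                # action-value learning gated by responsibility
        m.pred[s] += li * m.lr * (r - m.pred[s])   # prediction later used to compute responsibility
```

In this sketch the responsibility vector plays the role the abstract assigns to the indirect pathway: it gates which module's values drive learning (and, in a full model, action selection), while the action values themselves correspond to the direct-pathway contribution.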




Figure 2: Dynamically changing grid world explored by modular RL model. The agent (circle) can move either left or right, and when the agent receives reward, it is returned to the center (s = 7). The reward is placed at either s = 1 or s = 14, and this location alternates every 2500 time steps. When the agent reaches the unrewarded terminal, it is moved one position back (from s = 14 to s = 13 in Env. B and from s = 1 to s = 2 in Env. A). Each of the model's two modules becomes specialized by modular RL to maximize the agent's accumulated reward in one of the two versions of the environment (Env. A and Env. B) defined by reward location.

Mentions: To demonstrate how the model works, we constructed a toy example of RL (Figure 2). The agent exists in a one-dimensional grid world, in which the only action choices are “go left” or “go right.” The environment has a starting position and several positions to its left and right. The agent can inhabit either of two versions of the environment: one has a reward at the end position on one side, the other at the end position on the other side. The reward location is switched every 2500 time steps. When the agent reaches an end position (s = 1 or s = 14) not containing a reward, it is bounced one position back (to s = 2 or s = 13). If the agent obtains reward, it is returned to the center position (s = 7). For simplicity, we fixed the standard RL parameters to standard values: β = 1, γ = 0.8, and φ = κ = 0.1 (cf. Sutton and Barto, 1998). We set the newly introduced modular RL parameters to α = 20, σ = 1, τ = 10, and η = 0.05.
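The grid world of Figure 2 is simple enough to reproduce directly. Below is a minimal sketch of the environment, assuming states indexed 1–14 as in the text, with the 2500-step switching rule, the bounce-back at the unrewarded terminal, and the return to s = 7 after reward. The ε-greedy Q-learner in the usage example is only an illustrative stand-in for the authors' modular model (it uses γ = 0.8 and a learning rate of 0.1 from the text, but not the softmax β or the modular parameters α, σ, τ, η).

```python
import numpy as np

class SwitchingGridWorld:
    """Figure 2 toy environment: 14 positions, reward alternating between s = 1 and s = 14."""
    def __init__(self, switch_every=2500):
        self.switch_every = switch_every
        self.t = 0
        self.s = 7                   # start at the center position
        self.reward_state = 14       # Env. A: reward at s = 14; alternates with s = 1 (Env. B)

    def step(self, action):          # action: 0 = "go left", 1 = "go right"
        self.t += 1
        if self.t % self.switch_every == 0:
            self.reward_state = 15 - self.reward_state   # switch 1 <-> 14
        self.s += 1 if action == 1 else -1
        if self.s == self.reward_state:                  # rewarded terminal: return to center
            self.s = 7
            return self.s, 1.0
        if self.s == 1:                                  # unrewarded terminal: bounce one back
            self.s = 2
        elif self.s == 14:
            self.s = 13
        return self.s, 0.0

# Usage: a plain epsilon-greedy Q-learner as a stand-in agent
env = SwitchingGridWorld()
Q = np.zeros((15, 2))                # index states 1..14 directly
s, rng = env.s, np.random.default_rng(0)
for _ in range(10000):
    a = int(rng.integers(2)) if rng.random() < 0.1 else int(Q[s].argmax())
    s_next, r = env.step(a)
    Q[s, a] += 0.1 * (r + 0.8 * Q[s_next].max() - Q[s, a])
    s = s_next
```

A single monolithic learner like this one must relearn its values after every switch; the point of the paper's two-module architecture is that each module specializes in one reward configuration, so switching reduces to reassigning responsibility rather than relearning.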
