Shifting responsibly: the importance of striatal modularity to reinforcement learning in uncertain environments.

Amemori K, Gibb LG, Graybiel AM - Front Hum Neurosci (2011)

Bottom Line: We then constructed a network model of basal ganglia circuitry that includes these modules and the direct and indirect pathways. Based on simple assumptions, this model suggests that while the direct pathway may promote actions based on striatal action values, the indirect pathway may act as a gating network that facilitates or suppresses behavioral modules on the basis of striatal responsibility signals. Our modeling functionally unites the modular compartmental organization of the striatum with the direct-indirect pathway divisions of the basal ganglia, a step that we suggest will have important clinical implications.


Affiliation: McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.

ABSTRACT
We propose here that the modular organization of the striatum reflects a context-sensitive modular learning architecture in which clustered striosome-matrisome domains participate in modular reinforcement learning (RL). Based on anatomical and physiological evidence, it has been suggested that the modular organization of the striatum could represent a learning architecture. There is not, however, a coherent view of how such a learning architecture could relate to the organization of striatal outputs into the direct and indirect pathways of the basal ganglia, nor a clear formulation of how such a modular architecture relates to the RL functions attributed to the striatum. Here, we hypothesize that striosome-matrisome modules not only learn to bias behavior toward specific actions, as in standard RL, but also learn to assess their own relevance to the environmental context and modulate their own learning and activity on this basis. We further hypothesize that the contextual relevance or "responsibility" of modules is determined by errors in predictions of environmental features and that such responsibility is assigned by striosomes and conveyed to matrisomes via local circuit interneurons. To examine these hypotheses and to identify the general requirements for realizing this architecture in the nervous system, we developed a simple modular RL model. We then constructed a network model of basal ganglia circuitry that includes these modules and the direct and indirect pathways. Based on simple assumptions, this model suggests that while the direct pathway may promote actions based on striatal action values, the indirect pathway may act as a gating network that facilitates or suppresses behavioral modules on the basis of striatal responsibility signals. Our modeling functionally unites the modular compartmental organization of the striatum with the direct-indirect pathway divisions of the basal ganglia, a step that we suggest will have important clinical implications.




Figure 1: Schematic diagram of modular reinforcement learning (RL) model. Each module m produces a responsibility λm and a policy πm. Responsibility λm is calculated based on the accumulated squared prediction error Γm, which in turn is based on a comparison of a prediction pm of a feature of the environment with the actual feature (in this case the reward Rt). The module with greater λm is selected based on the softmax selection rule ρ, which can be seen as a description of a gating network. The modular policy πm assigns each module's probability of choosing each candidate action based on the modular action-value function Qm. The policy πm of the selected module determines the actual action at. The learning or updating of pm and Qm is performed only within the selected module using the global reward signal Rt.
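To make the gating step in Figure 1 concrete, the following minimal Python sketch shows one way a responsibility λm could be computed from an accumulated squared prediction error Γm and used to select a module through a softmax rule. This is not the authors' implementation: the decay factor and the scaling parameter kappa are illustrative assumptions, as is the exact softmax form of the responsibility.

```python
import numpy as np

def accumulate_error(gamma_m, prediction, feature, decay=0.9):
    """Update the accumulated squared prediction error Gamma_m of one module.

    prediction : the module's prediction p_m of an environmental feature
    feature    : the observed feature (in Figure 1, the reward R_t)
    decay      : leak factor so that old errors fade (an illustrative assumption)
    """
    return decay * gamma_m + (prediction - feature) ** 2

def responsibilities(gammas, kappa=1.0):
    """Responsibility lambda_m as a softmax over negative accumulated errors:
    modules whose predictions have been more accurate receive larger lambda_m."""
    logits = -kappa * np.asarray(gammas, dtype=float)
    expd = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return expd / expd.sum()

def select_module(gammas, rng=None):
    """Gating rule rho: sample one module in proportion to its responsibility."""
    if rng is None:
        rng = np.random.default_rng()
    lam = responsibilities(gammas)
    return int(rng.choice(len(lam), p=lam)), lam
```

Only the selected module would then go on to update its prediction pm and its action values Qm, as described in the caption.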

Mentions: We begin by introducing a simple RL module that learns actions appropriate to a given context (Figure 1). Each module contains a set of action selection policies updated by an RL architecture (Sutton and Barto, 1998; Doya et al., 2002). The model uses a Markov decision process in which the state is s (e.g., the location of the agent, s = 1–14 in our simulation), the action is a (e.g., moving right or left in our simulation), and the immediate reward is Rt at time t. Each RL module m consists of an action-value function Qm(s, a), which represents the value of an action taken from a specific state st, and a prediction model, which generates a prediction of a certain feature of the environment based on the state st of the agent (e.g., whether or not there is reward at state st in our simulation). For each module, we also introduce a state value function Vm(s), which represents the value of each state st. There are two “decisions” that the model must make. First, it must decide which module to select, based on its knowledge of the environment. Second, the chosen module must choose which action to take, based on the value of each action. This structure allows the model to specialize its modules for different environments or contexts, so that the agent can rapidly switch strategies in a changing world. For the action selection policy, we adopted the softmax rule, which assigns the probability of choosing each candidate action as πm(s, a) = exp(βQm(s, a)) / Σa′ exp(βQm(s, a′)), where β > 0 is a parameter controlling the randomness of the choices.
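As a concrete illustration of this softmax rule, the following minimal Python sketch applies the formula above to a tabular Qm with 14 states and two actions, matching the example simulation. The β value, the state index, and the action values shown are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def softmax_policy(Q_m, state, beta=2.0, rng=None):
    """Softmax policy pi_m(s, a) = exp(beta*Q_m(s, a)) / sum_a' exp(beta*Q_m(s, a')).

    Q_m   : array of shape (n_states, n_actions) of modular action values
    state : current state s_t (an integer index)
    beta  : inverse temperature; larger beta makes choices less random
    """
    if rng is None:
        rng = np.random.default_rng()
    q = beta * Q_m[state]
    probs = np.exp(q - q.max())        # subtract the max for numerical stability
    probs /= probs.sum()
    action = int(rng.choice(len(probs), p=probs))
    return action, probs

# Toy usage: 14 states and 2 actions (left/right), as in the example simulation
Q_m = np.zeros((14, 2))
Q_m[5] = [0.1, 0.6]                    # hypothetical learned action values at one state
action, probs = softmax_policy(Q_m, state=5)
```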

