Parallel Representation of Value-Based and Finite State-Based Strategies in the Ventral and Dorsal Striatum.

Ito M, Doya K - PLoS Comput. Biol. (2015)

Bottom Line: Analyses of neural activity recorded from the dorsolateral striatum (DLS), the dorsomedial striatum (DMS), and the ventral striatum (VS) identified significant fractions of neurons in all three subareas whose activities were correlated with individual states of the finite state-based strategy. Signals of internal states at the time of choice were found in the DMS, and signals of clusters of states were found in the VS. These results suggest that both the value-based strategy and the finite state-based strategy are implemented in the striatum.


Affiliation: Okinawa Institute of Science and Technology Graduate University, Onna-son, Okinawa, Japan.

ABSTRACT
Previous theoretical studies of animal and human behavioral learning have focused on the dichotomy of the value-based strategy, which uses action value functions to predict rewards, and the model-based strategy, which uses internal models to predict environmental states. However, animals and humans often adopt simple procedural behaviors, such as the "win-stay, lose-switch" strategy, without explicit prediction of rewards or states. Here we consider another strategy, the finite state-based strategy, in which a subject selects an action depending on its discrete internal state and updates the state depending on the action chosen and the reward outcome. By analyzing the choice behavior of rats in a free-choice task, we found that the finite state-based strategy fitted their behavioral choices more accurately than the value-based and model-based strategies did. When the fitted models were run autonomously on the same task, only the finite state-based strategy could reproduce the key feature of the choice sequences. Analyses of neural activity recorded from the dorsolateral striatum (DLS), the dorsomedial striatum (DMS), and the ventral striatum (VS) identified significant fractions of neurons in all three subareas whose activities were correlated with individual states of the finite state-based strategy. Signals of internal states at the time of choice were found in the DMS, and signals of clusters of states were found in the VS. In addition, action values and state values of the value-based strategy were encoded in the DMS and VS, respectively. These results suggest that both the value-based strategy and the finite state-based strategy are implemented in the striatum.
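To make the finite state-based strategy concrete, the sketch below simulates such an agent in a two-choice task. It is a minimal illustration under generic assumptions: the number of states, the action probabilities, and the transition probabilities are hypothetical placeholders, not the values estimated in the paper (the fitted values appear in Fig 4).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite state agent (FSA) with n_states discrete internal states.
# All numbers below are illustrative placeholders, not estimates from the paper.
n_states = 4
p_left = np.array([0.9, 0.7, 0.3, 0.1])   # P(choose left | state)

# T[action, reward, s, :] = distribution over the next state, given that
# `action` (0 = left, 1 = right) was chosen in state s and `reward` (0/1)
# was the outcome.  Random placeholder values for illustration only.
T = rng.dirichlet(np.ones(n_states), size=(2, 2, n_states))

def fsa_trial(state, p_reward=(0.5, 0.5)):
    """One trial: act from the current state, observe the reward outcome,
    then transition to the next state according to (action, reward)."""
    action = 0 if rng.random() < p_left[state] else 1
    reward = int(rng.random() < p_reward[action])
    next_state = rng.choice(n_states, p=T[action, reward, state])
    return action, reward, next_state

state = 0
history = []
for _ in range(20):
    action, reward, state = fsa_trial(state)
    history.append((action, reward, state))
```

On each trial the agent chooses left or right from its current state's action probabilities, observes whether a reward is delivered, and then moves to a new state according to transition probabilities indexed by the chosen action and the reward outcome; the fitted FSA models in Fig 4 share this general structure, with 4, 6, or 8 states.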




pcbi.1004540.g004: Estimated parameters of finite state agent (FSA) models. Each state is represented by a blue or red circle; the number in each circle indicates the state index and the action probabilities (%) for left and right. States for which the probability of choosing left (right) is larger than that of choosing right (left) are shown in blue (red). Each arrow with a number indicates the transition probability (%) after left (blue) or right (red) is chosen and a reward is obtained (solid) or not obtained (dashed). For simplicity, only transition probabilities greater than 5% are shown. These parameters were estimated under symmetric constraints. States form clusters that represent different sub-strategies (cluster left, cluster right, and win-stay, lose-switch). See Materials and Methods for the mathematical definition of the clusters. (A) FSA model with 4 states, (B) 6 states, and (C) 8 states.

Mentions: (A) An example of behavioral performance and the predictions made by the models. Vertical black lines indicate the rat's choice behavior. Left and right choices are represented by upper and lower bars, respectively. Rewarded and non-rewarded outcomes are represented by long and short bars, respectively. Model fits, representing the predicted probability that the rat selects left at trial t, were estimated from the choices and reward outcomes in trials 1 to t-1 using the FQ-learning model or the FSA model with 8 states; these are shown by red and green lines, respectively. (B) Estimated action values and time-varying parameters of the FQ-learning model, with the standard deviations of their posterior probabilities. QL and QR, action values for left and right; α, the learning rate for the selected action (= forgetting rate for the action not chosen); κ1, the strength of reinforcement by reward; and κ2, the strength of the aversion resulting from the no-reward outcome. (C) Posterior probabilities of the internal states (upper panel) and clusters (lower panel) of the FSA model with 8 states, shown as stacked graphs. The indices of states and clusters correspond to those in Fig 4C.
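For orientation, here is a hedged sketch of the kind of FQ-learning update implied by the parameter descriptions above: after each trial the value of the chosen action moves at rate α toward κ1 (reward) or -κ2 (no reward), while the value of the unchosen action is forgotten at the same rate; a sigmoid of the value difference is assumed here for the choice probability. This is an illustrative reconstruction, not the authors' code; the exact equations are in the paper's Materials and Methods.

```python
import numpy as np

def fq_update(q, action, reward, alpha, kappa1, kappa2):
    """One FQ-learning update (hypothetical sketch based on the legend).

    q       : array of action values [Q_L, Q_R]
    action  : 0 for left, 1 for right
    reward  : 1 if rewarded, 0 otherwise
    alpha   : learning rate for the chosen action = forgetting rate for the other
    kappa1  : strength of reinforcement by reward
    kappa2  : strength of aversion from the no-reward outcome
    """
    q = q.copy()
    target = kappa1 if reward else -kappa2
    q[action] += alpha * (target - q[action])   # chosen action moves toward its target
    other = 1 - action
    q[other] += alpha * (0.0 - q[other])        # unchosen action is forgotten toward 0
    return q

def p_left(q, beta=1.0):
    """Assumed sigmoid choice rule on the value difference Q_L - Q_R."""
    return 1.0 / (1.0 + np.exp(-beta * (q[0] - q[1])))

# Example: one rewarded left choice starting from zero values.
q = np.zeros(2)
q = fq_update(q, action=0, reward=1, alpha=0.3, kappa1=1.0, kappa2=0.5)
```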

