Processing speed enhances model-based over model-free reinforcement learning in the presence of high working memory functioning.

Schad DJ, Jünger E, Sebold M, Garbusow M, Bernhardt N, Javadi AH, Zimmermann US, Smolka MN, Heinz A, Rapp MA, Huys QJ - Front Psychol (2014)

Bottom Line: Though both have been shown to control choices, the cognitive abilities associated with these systems are under ongoing investigation. Here we examine the link to cognitive abilities, and find that individual differences in processing speed covary with a shift from model-free to model-based choice control in the presence of above-average working memory function. Furthermore, it provides a rationale for individual differences in the tendency to deploy valuation systems, which may be important for understanding the manifold neuropsychiatric diseases associated with malfunctions of valuation.

View Article: PubMed Central - PubMed

Affiliation: Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin, Berlin, Germany.

ABSTRACT
Theories of decision-making and its neural substrates have long assumed the existence of two distinct and competing valuation systems, variously described as goal-directed vs. habitual, or, more recently and based on statistical arguments, as model-free vs. model-based reinforcement-learning. Though both have been shown to control choices, the cognitive abilities associated with these systems are under ongoing investigation. Here we examine the link to cognitive abilities, and find that individual differences in processing speed covary with a shift from model-free to model-based choice control in the presence of above-average working memory function. This suggests shared cognitive and neural processes; provides a bridge between literatures on intelligence and valuation; and may guide the development of process models of different valuation components. Furthermore, it provides a rationale for individual differences in the tendency to deploy valuation systems, which may be important for understanding the manifold neuropsychiatric diseases associated with malfunctions of valuation.

No MeSH data available.


Figure 1: (A) Trial structure: Step 1 consisted of a choice between two abstract gray stimuli. The unchosen stimulus faded away while the chosen stimulus was highlighted with a red frame and moved to the top of the screen, where it remained visible for 1.5 s. In Step 2, a second, colored stimulus pair appeared. Step 2 choices resulted in either a win of 20 cents or no win. (B) Transition structure: Each first-stage stimulus led to one fixed second-stage pair in 70% of trials (common transition) and to the other second-stage stimulus pair in 30% of trials (rare transition). Reinforcement probabilities for each second-stage stimulus changed slowly and independently between 25% and 75% according to Gaussian random walks with reflecting boundaries (Daw et al., 2011). Win probabilities, P(reward), are displayed as a function of trial number. (C) Model predictions: Predictions from the computational model (Daw et al., 2011) based on the model-free (left panel) vs. model-based (right panel) system for the probability of repeating the previous trial's choice, as a function of reward (rew., rewarded; unrew., unrewarded) and transition type on the previous trial. Model-free choice predicts a main effect of reward and no effect of transition. Model-based choice predicts a transition × reward interaction. Figure partly adapted from Sebold et al. (2014).
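As a concrete illustration of the drifting reinforcement probabilities in panel (B), the short Python sketch below simulates independent Gaussian random walks with reflecting boundaries between 0.25 and 0.75, as described in the caption. The step size (sd), trial count, and seed are illustrative assumptions of ours, not parameters reported in the paper, and this is not the authors' stimulus code.

import numpy as np

def reward_probability_walks(n_trials=201, n_stimuli=4, sd=0.025,
                             lower=0.25, upper=0.75, seed=0):
    # Independent Gaussian random walks with reflecting boundaries,
    # one per second-stage stimulus (cf. Daw et al., 2011).
    # sd, n_trials, and seed are illustrative assumptions, not values from the paper.
    rng = np.random.default_rng(seed)
    p = np.empty((n_trials, n_stimuli))
    p[0] = rng.uniform(lower, upper, n_stimuli)   # random starting probabilities
    for t in range(1, n_trials):
        new = p[t - 1] + rng.normal(0.0, sd, n_stimuli)
        # Reflect values that cross a boundary back into [lower, upper]
        new = np.where(new > upper, 2 * upper - new, new)
        new = np.where(new < lower, 2 * lower - new, new)
        p[t] = new
    return p

probs = reward_probability_walks()
print(probs.shape)   # (201, 4): one P(reward) trajectory per second-stage stimulus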

Mentions: The two-step decision task (Daw et al., 2011; see Figure 1) was re-programmed in MATLAB, using the Psychophysics Toolbox extensions and a different set of colored stimuli. Importantly, the same sequence of outcome probabilities as in the original publication was used. The task required subjects to choose one of two stimuli (step 1), immediately followed by another stimulus pair at step 2 (see Figure 1A). Participants were instructed to maximize their rewards. Crucially, the probability of reward at step 2 changed over time according to an independent random walk for each of the four step 2 stimuli (Figure 1B). The probabilities of being presented with a given set of stimuli at step 2 were determined by the choice at step 1 and did not change over time; there was a common (70%) and a rare (30%) transition. To enhance participants' motivation, one third of all rewards (with a fixed minimum of 3 and a maximum of 10 Euros) was additionally paid out at the end of the experiment. Participants were given very detailed information about the structure of the task: they were informed about the varying outcome probabilities at step 2 (including being shown sample random walks) and about the constant transition probabilities between steps 1 and 2. Subjects completed 50 practice trials prior to performing the task proper.
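To make the trial logic described above concrete, here is a minimal Python sketch of the task flow: a step-1 choice leads to one of the two second-stage pairs via the fixed 70%/30% transition structure, and the step-2 choice is rewarded with 20 cents according to the drifting win probabilities. It is only an illustration of the structure (the choices here are random placeholders), not the authors' MATLAB/Psychophysics Toolbox implementation.

import numpy as np

COMMON_P = 0.7   # probability of the common transition (rare: 0.3)
WIN = 0.20       # step-2 win in Euros (20 cents)

# Stand-in for the drifting win probabilities; in the sketch above this
# would be reward_probability_walks(). Columns index the four step-2 stimuli.
probs = np.full((201, 4), 0.5)

def run_trial(t, probs, rng):
    choice1 = rng.integers(2)                   # step-1 choice (placeholder: random)
    common = rng.random() < COMMON_P            # common or rare transition?
    pair = choice1 if common else 1 - choice1   # step-1 stimulus 0 leads to pair 0 on common trials, etc.
    choice2 = rng.integers(2)                   # step-2 choice (placeholder: random)
    stimulus = 2 * pair + choice2               # index into the four step-2 stimuli
    rewarded = rng.random() < probs[t, stimulus]
    return rewarded

rng = np.random.default_rng(1)
total = sum(WIN if run_trial(t, probs, rng) else 0.0 for t in range(probs.shape[0]))
print(f"Total winnings over {probs.shape[0]} trials: {total:.2f} EUR")

A model-based agent would additionally use the 70%/30% transition structure to evaluate the step-1 options, whereas a model-free agent would not; this difference is what produces the contrasting stay-probability patterns in Figure 1C.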

