Model-based learning protects against forming habits.

Gillan CM, Otto AR, Phelps EA, Daw ND - Cogn Affect Behav Neurosci (2015)

Bottom Line: Studies in humans and rodents have suggested that behavior can at times be "goal-directed" (that is, planned and purposeful) and at times "habitual" (that is, inflexible and automatically evoked by stimuli). In two experiments, participants completed a sequential decision task that dissociated model-based from model-free learning; we then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants' subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals.


Affiliation: Department of Psychology, New York University, 6 Washington Place, New York, NY, 10003, USA, claire.gillan@gmail.com.

ABSTRACT
Studies in humans and rodents have suggested that behavior can at times be "goal-directed" (that is, planned and purposeful) and at times "habitual" (that is, inflexible and automatically evoked by stimuli). This distinction is central to conceptions of pathological compulsion, as in drug abuse and obsessive-compulsive disorder. Evidence for the distinction has primarily come from outcome devaluation studies, in which the sensitivity of a previously learned behavior to motivational change is used to assay the dominance of habits versus goal-directed actions. However, little is known about how habits and goal-directed control arise. Specifically, in the present study we sought to reveal the trial-by-trial dynamics of instrumental learning that would promote, and protect against, developing habits. In two complementary experiments with independent samples, participants completed a sequential decision task that dissociated two computational learning mechanisms, model-based and model-free. We then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants' subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals.
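The sequential decision task referred to here is a two-step task in which model-free and model-based learning make distinct predictions about first-stage choice. As a rough illustration of that distinction only, the sketch below updates both kinds of value estimate after a single trial; the learning rate, transition matrix, and update rule are assumptions for illustration, not the model fitted in the paper.

```python
import numpy as np

# Illustrative hybrid-learner sketch for a two-step task; the learning rate,
# transition matrix, and update rule are assumed for illustration and are
# not taken from the paper.

alpha = 0.3                              # learning rate (assumed)
q_mf = np.zeros(2)                       # model-free values of the two first-stage choices
q_stage2 = np.zeros(2)                   # learned values of the two second-stage states
transitions = np.array([[0.7, 0.3],      # P(second-stage state | first-stage choice):
                        [0.3, 0.7]])     # common = .7, rare = .3

def update(choice, state2, reward):
    """Update value estimates after one trial: first-stage choice -> second-stage state -> reward."""
    # Model-free: temporal-difference updates that ignore the transition structure
    q_stage2[state2] += alpha * (reward - q_stage2[state2])
    q_mf[choice] += alpha * (q_stage2[state2] - q_mf[choice])
    # Model-based: plan through the known transition probabilities
    q_mb = transitions @ q_stage2
    return q_mb

# A hybrid agent mixes the two estimates, e.g. q = w * q_mb + (1 - w) * q_mf,
# where the weight w indexes the degree of model-based control.
```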


Fig. 5: Experiment 2: Reinforcement-learning task. (A) Participants entered the same starting state on each trial and had 2.5 s to make a choice between two fractal stimuli that always appeared in this state. One fractal commonly (70 %) led to one of the second-stage states and rarely (30 %) led to the other. In contrast to Experiment 1, each second-stage state was uniquely associated with a certain type of coin (gold or silver). (B) For the first 150 trials, reward probabilities (the chance of winning a coin in a given second-stage state) drifted slowly over time according to Gaussian random walks. For the next 50 trials, the reward probabilities stabilized at .9 and .1 for the second-stage states associated with the to-be-devalued and to-remain-valued outcomes, respectively. This served to systematically bias all participants toward the action that would later be devalued. Devaluation was randomized across coin colors and reward drifts.
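As a concrete illustration of the reward schedule described in panel B, the sketch below generates drifting and then stabilized reward probabilities; the step size, bounds, and starting values of the random walks are assumed, since the caption does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward_schedule(n_drift=150, n_stable=50, step_sd=0.025, lo=0.25, hi=0.75):
    """Sketch of the Experiment 2 reward probabilities for the two second-stage
    states: Gaussian random walks for the first 150 trials, then fixed at .9
    (to-be-devalued) and .1 (to-remain-valued) for the final 50 trials.
    Step size, clipping bounds, and starting values are assumed for illustration."""
    p = rng.uniform(lo, hi, size=2)                 # starting probabilities (assumed)
    drift = np.empty((n_drift, 2))
    for t in range(n_drift):
        p = np.clip(p + rng.normal(0.0, step_sd, size=2), lo, hi)
        drift[t] = p
    stable = np.tile([0.9, 0.1], (n_stable, 1))     # [to-be-devalued, to-remain-valued]
    return np.vstack([drift, stable])

schedule = reward_schedule()
# schedule[t, s] = P(win a coin | trial t, second-stage state s)
```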

Mentions: In all, 39 males and 56 females were included in Experiment 2, and their ages ranged from 19 to 70 (M = 34.33 years, SD = 12.03). In this version of the task, we used a single MDP (instead of the two used in Exp. 1). The two second-stage states were associated with gold and silver coins, respectively (Fig. 5). There was no option to withhold responses, and responding carried no cost. The outcomes were devalued in exactly the same way as in Experiment 1, but prior to the outcome devaluation, we stabilized the reward probabilities at .9 and .1 for the second-stage states associated with the to-be-devalued and to-remain-valued coin colors, respectively. The second-stage state that produced the to-be-devalued coin type was always stabilized at .9, which served to bias participants toward this choice prior to devaluation, so that we could examine subsequent habitual responses toward the devalued outcome against a high pre-devaluation baseline. Devaluation was randomized across coin colors and reward drifts. The main MDP task comprised 150 trials, which were followed by 50 trials of stable reward probabilities (only the former were used to assess trial-by-trial learning mechanisms). As in Experiment 1, we included four trials with no feedback prior to devaluing one of the outcomes, in order to decorrelate these occurrences. The habit test comprised ten trials with no feedback after participants had been alerted that one of their coin containers was full. To test for sensitivity to devaluation, we measured the proportion of valued responses selected in the test stage (that is, trials on which participants did not choose the devalued outcome). Devaluation sensitivity was then entered as a fixed effect in a mixed-effects logistic regression in which reward, transition, and the intercept were taken as random effects (as per the main model in Exp. 1). Ninety-nine participants were tested in Experiment 2, of whom three were excluded because they missed more than 10 % of the total trials during the main MDP task, and one was excluded for having an implausibly fast reaction time (more than 2 SDs below the mean).
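The paper's trial-level mixed-effects logistic regression is not reproduced here. As a simplified per-participant stand-in, assuming a trial-level data frame with hypothetical columns (subject, stay, rewarded, common, valued_choice), one could take the model-based index to be the reward-by-transition interaction in stay behavior and relate it to devaluation sensitivity roughly as follows.

```python
import pandas as pd
from scipy.stats import pearsonr

# Simplified stand-in for the paper's mixed-effects logistic regression.
# Column names are assumptions: subject, stay (repeat of the previous
# first-stage choice), rewarded (0/1), common (common vs. rare transition,
# 0/1), valued_choice (test-stage trial on which the devalued outcome was
# NOT chosen, 0/1).

def model_based_index(df: pd.DataFrame) -> float:
    """Reward x transition interaction in stay probabilities: larger values
    indicate more model-based first-stage choice."""
    p = df.groupby(["rewarded", "common"])["stay"].mean()
    return (p[(1, 1)] - p[(1, 0)]) - (p[(0, 1)] - p[(0, 0)])

def analyze(learning_trials: pd.DataFrame, test_trials: pd.DataFrame):
    """Relate per-subject model-based learning to devaluation sensitivity."""
    mb = learning_trials.groupby("subject").apply(model_based_index)
    devaluation_sensitivity = test_trials.groupby("subject")["valued_choice"].mean()
    # The paper's key question: does model-based learning predict
    # subsequent sensitivity to outcome devaluation?
    r, p_value = pearsonr(mb, devaluation_sensitivity.loc[mb.index])
    return r, p_value
```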

