Model-based learning protects against forming habits.

Gillan CM, Otto AR, Phelps EA, Daw ND - Cogn Affect Behav Neurosci (2015)

Bottom Line: Studies in humans and rodents have suggested that behavior can at times be "goal-directed" (that is, planned and purposeful) and at times "habitual" (that is, inflexible and automatically evoked by stimuli). We then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants' subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals.


Affiliation: Department of Psychology, New York University, 6 Washington Place, New York, NY, 10003, USA, claire.gillan@gmail.com.

ABSTRACT
Studies in humans and rodents have suggested that behavior can at times be "goal-directed" (that is, planned and purposeful) and at times "habitual" (that is, inflexible and automatically evoked by stimuli). This distinction is central to conceptions of pathological compulsion, as in drug abuse and obsessive-compulsive disorder. Evidence for the distinction has primarily come from outcome devaluation studies, in which the sensitivity of a previously learned behavior to motivational change is used to assay the dominance of habits versus goal-directed actions. However, little is known about how habits and goal-directed control arise. Specifically, in the present study we sought to reveal the trial-by-trial dynamics of instrumental learning that would promote, and protect against, developing habits. In two complementary experiments with independent samples, participants completed a sequential decision task that dissociated two computational learning mechanisms, model-based and model-free. We then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants' subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals.




Fig. 1 Experiment 1: Reinforcement-learning task. (A) Participants entered one of two start states on each trial, which were associated with the receipt of gold and silver coins, each worth 25¢. Participants had 2.5 s to make a choice, costing 1¢, which would commonly (70%) lead them to a certain second state and rarely (30%) lead them to the alternative second state. No choices were made at the second state; each second state had a unique probability of reward that slowly changed over the course of the experiment. (B) Graph depicting a purely model-free learner, whose behavior is solely predicted by reinforcement history. (C) A purely model-based learner's behavior, in contrast, is predicted by an interaction between reward and transition, such that behavior would mirror the model-free learner only when the transition from the initial choice to the outcome was common. Following rare transitions, a purely model-based learner would show the reverse pattern.
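As an illustrative sketch (not code from the paper), the task structure described in the caption can be simulated as follows. The 70%/30% transition probabilities and the .25–.75 reward-probability bounds come from the caption; the drift step size `sd` is an assumed value, since the caption states only that reward probabilities changed slowly.

```python
import random

def step(choice, reward_probs):
    """One trial of the two-step task: a first-stage choice (0 or 1)
    leads to its 'common' second state 70% of the time, otherwise to
    the alternative state; the reached second state then pays a
    25-cent coin with its own probability."""
    common = random.random() < 0.7
    state = choice if common else 1 - choice
    rewarded = random.random() < reward_probs[state]
    return state, common, rewarded

def drift(reward_probs, sd=0.025, lo=0.25, hi=0.75):
    """Gaussian random walk on the reward probabilities, clipped to
    the .25-.75 bounds used in the experiment (sd is assumed)."""
    return [min(hi, max(lo, p + random.gauss(0, sd))) for p in reward_probs]

random.seed(0)
probs = [0.5, 0.5]
state, common, rewarded = step(choice=0, reward_probs=probs)
probs = drift(probs)  # probabilities change slowly between trials
```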
© Copyright Policy - OpenAccess




Mentions: On each trial, participants were presented with a choice between two fractals, each of which commonly (70%; see Fig. 1A, white arrows) led to a particular second state displaying another fractal. These second-state fractals were termed "coin-boxes," since they each had some probability (between .25 and .75) of being rewarded with a coin worth 25¢. On 30% of trials ("rare" transition trials; Fig. 1A, gray arrows), choices uncharacteristically led to the alternative second state. A purely model-free learner would make choices irrespective of the transition structure of the task (i.e., whether a transition was rare or common), and would show sensitivity only to whether or not the last action had been rewarded (Fig. 1B). A model-based strategy, in contrast, is characterized by sensitivity to both prior reward and the transition structure of the task. For example, take the case in which a choice is followed by a rare transition to a second state, and that second state is rewarded. In this situation, a model-based learner would tend to switch choices on the next trial, because the alternative choice would be more likely to return the learner to that second state (Fig. 1C). A model-free learner would make no such adjustment based on the transition type.
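The stay/switch logic described above can be sketched as two noiseless caricatures (the actual analyses fit graded, probabilistic models; these deterministic functions only illustrate the reward × transition interaction):

```python
def model_free_stay(prev_choice, rewarded):
    """A purely model-free learner: repeat the last first-stage choice
    if it was rewarded, otherwise switch. The transition type
    (common vs. rare) is ignored entirely."""
    return prev_choice if rewarded else 1 - prev_choice

def model_based_stay(prev_choice, rewarded, common):
    """A purely model-based learner: pick the first-stage choice whose
    COMMON transition leads to the second state that just paid off.
    After a rewarded rare transition, that means switching, because the
    other choice commonly reaches the rewarded state; after an
    unrewarded rare transition, it means staying."""
    if rewarded == common:       # rewarded+common, or unrewarded+rare
        return prev_choice       # stay
    return 1 - prev_choice       # switch
```

Note how `model_based_stay` reverses its decision when the transition was rare, reproducing the crossover pattern of Fig. 1C, while `model_free_stay` depends on reward alone, as in Fig. 1B.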

