The Computational Development of Reinforcement Learning during Adolescence.

Palminteri S, Kilford EJ, Coricelli G, Blakemore SJ - PLoS Comput. Biol. (2016)

Bottom Line: Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.


Affiliation: Institute of Cognitive Neuroscience, University College London, London, United Kingdom.

ABSTRACT
Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
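The three computational modules named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no equations, so the delta-rule updates below, the function name `update`, the parameter names, and the choice to estimate the context value from the mean of the displayed outcomes are all assumptions made for illustration, not the authors' implementation. Setting `alpha_counterfactual` and `alpha_context` to zero leaves a basic reinforcement learning rule, which is what makes the three variants nested.

```python
def update(q, v, chosen, r_chosen, r_unchosen=None,
           alpha_factual=0.3, alpha_counterfactual=0.0, alpha_context=0.0):
    """One learning step for a two-option choice context (illustrative only).

    q          -- list of two option values
    v          -- context value (stays at its initial value if alpha_context == 0)
    chosen     -- index (0 or 1) of the chosen option
    r_unchosen -- outcome of the unchosen option, or None under partial feedback
    Returns the updated (q, v).
    """
    q = list(q)
    unchosen = 1 - chosen

    # Value contextualisation (assumed form): track the average outcome
    # of the context from whatever outcomes were displayed on this trial.
    shown = [r_chosen] if r_unchosen is None else [r_chosen, r_unchosen]
    context_outcome = sum(shown) / len(shown)
    v = v + alpha_context * (context_outcome - v)

    # Factual learning: delta rule on the chosen option, relative to v.
    q[chosen] += alpha_factual * (r_chosen - v - q[chosen])

    # Counterfactual learning: only possible under complete feedback.
    if r_unchosen is not None:
        q[unchosen] += alpha_counterfactual * (r_unchosen - v - q[unchosen])

    return q, v


# Basic rule only: the unchosen option is never updated.
basic = update([0.0, 0.0], 0.0, 0, 1.0, -1.0, alpha_factual=0.5)
# -> ([0.5, 0.0], 0.0)

# Adding the counterfactual module: the forgone outcome is also learned.
counterfactual = update([0.0, 0.0], 0.0, 0, 1.0, -1.0,
                        alpha_factual=0.5, alpha_counterfactual=0.5)
# -> ([0.5, -0.5], 0.0)

# Punishment context with all modules: the best outcome is 0 (loss avoided).
contextual = update([0.0, 0.0], 0.0, 0, 0.0, -1.0,
                    alpha_factual=0.5, alpha_counterfactual=0.5,
                    alpha_context=0.5)
# -> ([0.125, -0.375], -0.25)
```

In the punishment example, the avoided loss (outcome 0) acquires a positive relative value once the context value turns negative. This is one way a contextualisation step could make punishment learning resemble reward learning, under the assumptions stated above.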



Fig 3 (pcbi.1004953.g003): Baseline model fitting and model comparison. (A) Choice inverse temperature (β) of each model for adults (dark grey) and adolescents (light grey). (B) Posterior probability (PP) of each model for adults (dark grey) and adolescents (light grey). The dotted line indicates chance level (0.33). #P<0.05; 2-sided, one-sample t-tests; *P<0.001; 2-sided, independent-samples t-tests. Error bars represent s.e.m.
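The chance level of 0.33 in the caption follows from comparing three models: with no evidence favouring any of them, each has a posterior probability of 1/3. The sketch below shows that relationship for a single participant using Bayes' rule with a uniform prior over models. How the authors approximated model evidence and computed PP is not stated in this excerpt, so the function `model_posteriors` and its log-evidence inputs are hypothetical.

```python
import math


def model_posteriors(log_evidences):
    """Posterior probability of each model under a uniform prior.

    log_evidences -- one (approximate) log model evidence per model.
    Subtracting the maximum before exponentiating avoids overflow.
    """
    highest = max(log_evidences)
    weights = [math.exp(le - highest) for le in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]


# Three equally supported models: each sits at the chance level of 1/3.
chance = model_posteriors([-60.0, -60.0, -60.0])

# Made-up log evidences in which the third model is best supported.
favoured = model_posteriors([-62.0, -61.0, -58.0])
```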

Mentions: We fitted the three models to individual histories of choices and outcomes to obtain, for each participant and each model, the parameters that minimised the negative log-likelihood of the participant’s choices during the learning task (see S1 Table). To assess whether baseline model fitting differed between adolescents and adults, we submitted the negative log-likelihood and the inverse temperature parameter (β) to a mixed-design ANOVA with group (Adolescents vs. Adults) as the between-subjects factor and model as the within-subjects factor. For negative log-likelihood (a measure of quality of fit, with lower values indicating a better fit), there was no main effect of group (F(1,36) = 1.3, P>0.2) and the group x model interaction did not reach significance (F(2,72) = 2.7, P<0.08). Note that the main effect of model cannot be tested: because the models are nested, adding parameters can only decrease the negative log-likelihood.

Analysis of the inverse temperature (β) parameter supported these results. This parameter can be taken as a measure of how well choices are predicted by the model, and it correlates strongly with the model likelihood (for all models: R>0.93; P<0.001). There was no main effect of group (F(1,36) = 2.3, P>0.1), but there was a significant group x model interaction (F(2,72) = 5.0, P<0.01) (Fig 3A). Post-hoc comparisons showed that this interaction was driven by adults, whose inverse temperature increased from Model 1 to Model 2 (T(19) = 3.2, P<0.01) and from Model 2 to Model 3 (T(19) = 2.2, P<0.05). Baseline (Model 1) inverse temperature did not differ between adults and adolescents (T(36) = 0.4, P>0.70). The absence of main effects of group indicates that baseline quality of fit did not differ between age groups, which permits further model comparison analyses.
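The fitting step can be sketched for the simplest case: a basic reinforcement learning rule with a softmax choice function, fitted by minimising the negative log-likelihood of the observed choices (equivalently, maximising their likelihood). This is a minimal illustration under stated assumptions. The update rule, the grid search (standing in for whichever optimiser the authors used), the function names, and the toy data are not taken from the paper.

```python
import math


def negative_log_likelihood(choices, outcomes, alpha, beta):
    """Negative log-likelihood of two-option choices under a delta-rule learner.

    choices  -- chosen option per trial (0 or 1)
    outcomes -- outcome received for the chosen option on each trial
    alpha    -- learning rate; beta -- softmax inverse temperature
    """
    q = [0.0, 0.0]
    nll = 0.0
    for c, r in zip(choices, outcomes):
        # Softmax probability of the choice that was actually made.
        p = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        nll -= math.log(p)
        # Delta-rule update of the chosen option only.
        q[c] += alpha * (r - q[c])
    return nll


def fit_grid(choices, outcomes, alphas, betas):
    """Return (nll, alpha, beta) for the grid point with the lowest NLL."""
    return min((negative_log_likelihood(choices, outcomes, a, b), a, b)
               for a in alphas for b in betas)


# Toy data for one hypothetical participant (not from the study).
choices = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
outcomes = [1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]

alphas = [i / 10 for i in range(1, 10)]   # 0.1 ... 0.9
betas = [i / 2 for i in range(0, 21)]     # 0.0 ... 10.0

best_nll, best_alpha, best_beta = fit_grid(choices, outcomes, alphas, betas)

# With beta = 0 every choice has probability 0.5, so the NLL is n * log(2).
chance_nll = negative_log_likelihood(choices, outcomes, 0.3, 0.0)
```

Because the grid includes β = 0, the fitted negative log-likelihood can never exceed the chance value `chance_nll`. The same logic explains why a nested model with extra parameters cannot fit worse than the simpler model it contains. It also shows why β tracks quality of fit: a larger fitted β means the learned values predict the choices more sharply.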

