Neural correlates of temporal credit assignment in the parietal lobe.

Gersch TM, Foley NC, Eisenberg I, Gottlieb J - PLoS ONE (2014)

Bottom Line: In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting.

View Article: PubMed Central - PubMed

Affiliation: Department of Neuroscience, Columbia University, New York, New York, United States of America.

ABSTRACT
Empirical studies of decision making have typically assumed that value learning is governed by time, such that a reward prediction error arising at a specific time triggers temporally-discounted learning for all preceding actions. However, in natural behavior, goals must be acquired through multiple actions, and each action can have different significance for the final outcome. As is recognized in computational research, carrying out multi-step actions requires the use of credit assignment mechanisms that focus learning on specific steps, but little is known about the neural correlates of these mechanisms. To investigate this question we recorded neurons in the monkey lateral intraparietal area (LIP) during a serial decision task where two consecutive eye movement decisions led to a final reward. The underlying decision trees were structured such that the two decisions had different relationships with the final reward, and the optimal strategy was to learn based on the final reward at one of the steps (the "F" step) but ignore changes in this reward at the remaining step (the "I" step). In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting.
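
The distinction drawn in the abstract can be made concrete with a toy update rule. The sketch below is an illustration only, not the authors' model: the function names, learning rate (alpha), and discount factor (gamma) are assumptions, chosen merely to contrast a time-based, temporally discounted update applied to all preceding steps with a credit-assignment update that targets only the F step.

```python
# Toy illustration (not the authors' model): two ways an outcome-driven
# reward prediction error (RPE) could update the values of a two-step sequence.
# All parameter values below are arbitrary assumptions.

def discounted_update(values, rpe, alpha=0.1, gamma=0.9):
    """Time-based rule: the RPE at the outcome updates every preceding step,
    scaled by how far back in time that step occurred (temporal discounting)."""
    n = len(values)
    return [v + alpha * (gamma ** (n - 1 - i)) * rpe for i, v in enumerate(values)]

def credit_assigned_update(values, rpe, f_step, alpha=0.1):
    """Credit-assignment rule: learning is focused on the F step regardless of
    its serial position; the I step's value is left unchanged."""
    return [v + alpha * rpe if i == f_step else v for i, v in enumerate(values)]

# Two-step sequence, final reward after the second step, RPE = 1.
print(discounted_update([0.0, 0.0], rpe=1.0))                 # both steps learn
print(credit_assigned_update([0.0, 0.0], rpe=1.0, f_step=0))  # only the F step learns
```

In the task studied here, the optimal strategy corresponds to the second rule: the step at which learning should occur (F) is defined by the structure of the decision tree, not by its temporal distance from the reward.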

pone-0088725-g002: Behavior. The filled symbols show the fraction of choices of the optimal target at the first and second steps for the main task, and at the second step for the control task. Each point represents the average across sessions for a given trial number in a block.

Mentions: The monkeys modulated their exploration in a context-specific fashion, modifying their choices selectively at the F step but maintaining a stereotyped preference at the I step (Fig. 2). At the F step in both tasks, both monkeys started out a trial block with a bias toward the non-optimal target (which received the larger immediate reward) and slowly reversed this bias, reaching an asymptote of 75–80% optimal choices geared toward the large final reward. Learning was slow, consistent with a trial-and-error mechanism. A behavioral learn-point (defined as the last trial of 7 consecutive optimal choices) was reached, on average, after 41 trials in the main task and 50 trials in the control task (two-tailed t-test, p > 0.05 between tasks; monkey 2: learn points 25 (±5) for main vs. 44 (±3) for control, p < 0.05; monkey 1: learn points 55 (±5) for main vs. 54 (±4) for control, p > 0.05). This slow learning was not due to spurious factors such as incomplete training on the task (linear regression, p > 0.1 for effect of session number for each monkey and task) or an idiosyncratic bias for a spatial location (p > 0.05 for each monkey and task). Moreover, gradual learning was seen in each individual session with no evidence of a step-like switch to the optimal path, showing that it was not an artifact of averaging across sessions (Fig. S1). Therefore, even though the monkeys experienced only two alternative paths in each context, they seemed to re-discover the optimal one de novo by trial and error, consistent with previous reports [16], [17].
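
For concreteness, the learn-point criterion quoted above ("the last trial of 7 consecutive optimal choices") can be computed as in the minimal sketch below. This is an illustration, not the authors' analysis code; the function name is hypothetical, and it assumes the criterion is applied to the first such run within a block.

```python
# Illustrative sketch of the behavioral learn-point criterion (not the authors'
# analysis code). Assumes the learn-point is the last trial of the FIRST run of
# `run_length` consecutive optimal choices in a block.

def learn_point(optimal_choices, run_length=7):
    """optimal_choices: iterable of booleans, True if the optimal target was chosen.
    Returns the 1-based trial index that completes the first run of `run_length`
    consecutive optimal choices, or None if the criterion is never met."""
    streak = 0
    for trial, chose_optimal in enumerate(optimal_choices, start=1):
        streak = streak + 1 if chose_optimal else 0
        if streak == run_length:
            return trial
    return None

# Example block: the first seven-trial run of optimal choices ends on trial 10.
print(learn_point([True, False, False] + [True] * 7))  # -> 10
```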
