Neural correlates of temporal credit assignment in the parietal lobe.

Gersch TM, Foley NC, Eisenberg I, Gottlieb J - PLoS ONE (2014)

Bottom Line: In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting.


Affiliation: Department of Neuroscience, Columbia University, New York, New York, United States of America.

ABSTRACT
Empirical studies of decision making have typically assumed that value learning is governed by time, such that a reward prediction error arising at a specific time triggers temporally-discounted learning for all preceding actions. However, in natural behavior, goals must be acquired through multiple actions, and each action can have different significance for the final outcome. As is recognized in computational research, carrying out multi-step actions requires the use of credit assignment mechanisms that focus learning on specific steps, but little is known about the neural correlates of these mechanisms. To investigate this question we recorded neurons in the monkey lateral intraparietal area (LIP) during a serial decision task where two consecutive eye movement decisions led to a final reward. The underlying decision trees were structured such that the two decisions had different relationships with the final reward, and the optimal strategy was to learn based on the final reward at one of the steps (the "F" step) but ignore changes in this reward at the remaining step (the "I" step). In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting.
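The contrast drawn in the abstract, between temporally discounted learning and step-specific credit assignment, can be illustrated with a toy update rule. This is a minimal sketch and not the authors' model: the function names, the learning rate `alpha`, the discount factor `gamma`, and the definition of the prediction error are all assumptions chosen for illustration.

```python
def discounted_update(values, reward, alpha=0.1, gamma=0.9):
    """Temporally discounted learning (illustrative assumption).

    Every step in the sequence learns from the final prediction error,
    scaled by gamma raised to the number of steps separating it from the reward.
    """
    n = len(values)
    delta = reward - values[-1]  # prediction error at the final outcome
    return [v + alpha * (gamma ** (n - 1 - i)) * delta
            for i, v in enumerate(values)]


def targeted_update(values, reward, f_step, alpha=0.1):
    """Step-specific credit assignment (illustrative assumption).

    Only the "F" step learns from the final reward; the "I" step is left
    unchanged, whatever its position in the sequence.
    """
    delta = reward - values[f_step]
    return [v + alpha * delta if i == f_step else v
            for i, v in enumerate(values)]


# Two-step sequence with equal starting values and a reward of 1.0.
start = [0.5, 0.5]
discounted = discounted_update(start, reward=1.0)
f_first = targeted_update(start, reward=1.0, f_step=0)
f_second = targeted_update(start, reward=1.0, f_step=1)
```

Under the discounted rule the step closer to the reward always changes more, whereas the targeted rule changes only the F step, whether it comes first or second. That is the dissociation the two task contexts are designed to expose.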


Figure 3 (pone-0088725-g003): LIP neurons encode value independently of saccade direction. (a) Value and direction coefficients in the main task. The top and bottom panels show the time course of, respectively, the value and direction signals in the LIP response, aligned on target onset on the left, and saccade onset on the right. Traces show mean and SEM. The horizontal bars show paired comparisons of pre- and post-learning coefficients in the 200–300 ms after target onset (stars, p<0.05). (b) Value and direction coefficients in the control task. Same format as in (a).

Mentions: LIP neurons showed significant positive coefficients for value and saccade direction in both the main and the control tasks (Fig. 3). The value coefficient peaked earlier during the decision period and was higher in trials that followed, rather than preceded, the learning point (200–300 ms, stars, p<0.05). In contrast, the direction coefficient peaked later, during the pre-saccadic epoch, and did not show significant learning effects at any time during the decision interval. The pattern found in the full data set was replicated individually in each monkey (Fig. S3) and, in individual cells, 59/96 neurons had significant value coefficients in at least one task (criterion of at least one time bin significant at p<0.001; 36/96 cells for the main task; 38/96 cells for the control task). We found no correlation between the value coefficient and the fraction of choices on a session-by-session basis for optimal (r = 0.04) or non-optimal choices (r = 0.12). This supports the prevailing view that the cells encode an intermediate stage of learning and valuation that influences, but is not rigidly mapped onto, the final choice [3], [18].
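The value and direction coefficients described above come from a regression of firing rate on task variables over time. The exact model is not given in this excerpt, so the following is only a sketch under stated assumptions: an ordinary least-squares fit per time bin with an intercept, a value regressor, and a saccade-direction regressor. The function name `fit_value_direction`, the binary coding of the regressors, and all synthetic numbers are hypothetical.

```python
import numpy as np


def fit_value_direction(rates, value, direction):
    """Fit rate = b0 + b_value * value + b_dir * direction in each time bin.

    rates:     (n_trials, n_bins) firing rates
    value:     (n_trials,) value regressor (assumed coding: 0 = low, 1 = high)
    direction: (n_trials,) direction regressor (assumed coding: 0 = away, 1 = toward RF)
    Returns an (n_bins, 3) array of [intercept, value coef, direction coef].
    """
    X = np.column_stack([np.ones(len(value)), value, direction])
    # lstsq solves all time bins at once when the right-hand side is 2-D.
    coefs, *_ = np.linalg.lstsq(X, rates, rcond=None)
    return coefs.T


# Synthetic data only; these numbers are made up and are not from the paper.
rng = np.random.default_rng(0)
n_trials, n_bins = 200, 5
value = rng.integers(0, 2, n_trials).astype(float)
direction = rng.integers(0, 2, n_trials).astype(float)
true_value = np.linspace(2.0, 6.0, n_bins)  # assumed value effect per bin
true_dir = np.linspace(4.0, 1.0, n_bins)    # assumed direction effect per bin
rates = (10.0
         + np.outer(value, true_value)
         + np.outer(direction, true_dir)
         + rng.normal(0.0, 0.5, (n_trials, n_bins)))

coefs = fit_value_direction(rates, value, direction)
```

Because value and direction enter as separate regressors, each coefficient estimates one variable's contribution while holding the other fixed, which is the sense in which a value signal can be described as independent of saccade direction.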

