Limits...
A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning.

Franklin NT, Frank MJ - Elife (2015)

Bottom Line: We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning three Marr's levels of analysis.Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies.A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.

View Article: PubMed Central - PubMed

Affiliation: Department of Cognitive, Linguistic and Psychological Sciences, Brown Institute for Brain Science, Brown University, Providence, United States.

ABSTRACT
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning three Marr's levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.

No MeSH data available.


Related in: MedlinePlus

Network behavior as a function of TAN pause duration.(A, B): Accuracy of the neural network simulations over a range of fixed pause durations (120–280 ms) during initial acquisition (A) and following reversal (B). Networks with short TAN pauses (red) learned more quickly following reversal than networks with a long TAN pauses (blue) but did not achieve the same level of asymptotic accuracy during either acquisition or following reversal. (C) Final MSN entropy after training for both acquisition and reversal, as a function of TAN pause duration. Longer pauses elicited lower entropic MSN representations, facilitating asymptotic performance, whereas the higher entropy in short pause networks facilitated faster reversal. (D) Action selection uncertainty across MSN population across all trials for both stimuli. Entropy in networks with a long TAN pause (black) declines over time and reaches asymptotic level both prior to and following reversal. (E) Mean difference in corticostriatal Go weights (synaptic efficacy) coding for the more rewarded versus less rewarded response (defined during acquisition) are shown across training trials for a single stimulus, in an 85:15% reward environment. Greater accumulation in weight differences result in more deterministic choice but difficulty in unlearning. Data shown for networks trained on with a long (270 ms) or short (150 ms) fixed duration TAN pause. Reversal at trial 200 denoted with dotted line. MSN, medium spiny neurons; TANs, tonically active neurons.DOI:http://dx.doi.org/10.7554/eLife.12029.006
© Copyright Policy
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4764588&req=5

fig4: Network behavior as a function of TAN pause duration.(A, B): Accuracy of the neural network simulations over a range of fixed pause durations (120–280 ms) during initial acquisition (A) and following reversal (B). Networks with short TAN pauses (red) learned more quickly following reversal than networks with a long TAN pauses (blue) but did not achieve the same level of asymptotic accuracy during either acquisition or following reversal. (C) Final MSN entropy after training for both acquisition and reversal, as a function of TAN pause duration. Longer pauses elicited lower entropic MSN representations, facilitating asymptotic performance, whereas the higher entropy in short pause networks facilitated faster reversal. (D) Action selection uncertainty across MSN population across all trials for both stimuli. Entropy in networks with a long TAN pause (black) declines over time and reaches asymptotic level both prior to and following reversal. (E) Mean difference in corticostriatal Go weights (synaptic efficacy) coding for the more rewarded versus less rewarded response (defined during acquisition) are shown across training trials for a single stimulus, in an 85:15% reward environment. Greater accumulation in weight differences result in more deterministic choice but difficulty in unlearning. Data shown for networks trained on with a long (270 ms) or short (150 ms) fixed duration TAN pause. Reversal at trial 200 denoted with dotted line. MSN, medium spiny neurons; TANs, tonically active neurons.DOI:http://dx.doi.org/10.7554/eLife.12029.006

Mentions: To directly study the impact of TANs on learning, we first parametrically varied the duration of the TAN pause exogenously (below we explore how this pause may be endogenously self-regulated by the network). Varying pause duration across networks reveals a behavioral trade-off between learning speed following probabilistic reversal and asymptotic accuracy (Figure 4, top row). During acquisition, learning speed is similar across a range of TAN pause durations, but networks with long pauses exhibit higher levels of asymptotic accuracy. Long pauses facilitate a disproportionate increase in synaptic efficacy for the most frequently rewarded response despite the occasional spurious outcome (Figure 5, bottom right), driving MSNs to focus their activity to a single dominant action (as quantified by MSN entropy; Figure 4, bottom). However, this same property makes such networks slow to reverse. Conversely, networks with short TAN pause durations learn more quickly following reversal but at a cost of asymptotic accuracy. Mechanistically, this occurs as short TAN pauses elicit weaker activation of spiny units, and hence elicit less activity-dependent plasticity, leading to a smaller learned divergence in synaptic weights between alternative responses (Figure 5, bottom right). This stunted divergence allows for continued co-activation of multiple response alternatives during subsequent action selection, and hence more stochastic choice, but this same property supports the network’s ability to quickly reinforce alternative actions following change points. Thus, although in these simulations the TAN pause modulates excitability only during window of the phasic dopaminergic reinforcement signal, it influences which MSNs learn and hence the population activity and action selection entropy thereof during subsequent choices.10.7554/eLife.12029.006Figure 4.Network behavior as a function of TAN pause duration.


A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning.

Franklin NT, Frank MJ - Elife (2015)

Network behavior as a function of TAN pause duration.(A, B): Accuracy of the neural network simulations over a range of fixed pause durations (120–280 ms) during initial acquisition (A) and following reversal (B). Networks with short TAN pauses (red) learned more quickly following reversal than networks with a long TAN pauses (blue) but did not achieve the same level of asymptotic accuracy during either acquisition or following reversal. (C) Final MSN entropy after training for both acquisition and reversal, as a function of TAN pause duration. Longer pauses elicited lower entropic MSN representations, facilitating asymptotic performance, whereas the higher entropy in short pause networks facilitated faster reversal. (D) Action selection uncertainty across MSN population across all trials for both stimuli. Entropy in networks with a long TAN pause (black) declines over time and reaches asymptotic level both prior to and following reversal. (E) Mean difference in corticostriatal Go weights (synaptic efficacy) coding for the more rewarded versus less rewarded response (defined during acquisition) are shown across training trials for a single stimulus, in an 85:15% reward environment. Greater accumulation in weight differences result in more deterministic choice but difficulty in unlearning. Data shown for networks trained on with a long (270 ms) or short (150 ms) fixed duration TAN pause. Reversal at trial 200 denoted with dotted line. MSN, medium spiny neurons; TANs, tonically active neurons.DOI:http://dx.doi.org/10.7554/eLife.12029.006
© Copyright Policy
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4764588&req=5

fig4: Network behavior as a function of TAN pause duration.(A, B): Accuracy of the neural network simulations over a range of fixed pause durations (120–280 ms) during initial acquisition (A) and following reversal (B). Networks with short TAN pauses (red) learned more quickly following reversal than networks with a long TAN pauses (blue) but did not achieve the same level of asymptotic accuracy during either acquisition or following reversal. (C) Final MSN entropy after training for both acquisition and reversal, as a function of TAN pause duration. Longer pauses elicited lower entropic MSN representations, facilitating asymptotic performance, whereas the higher entropy in short pause networks facilitated faster reversal. (D) Action selection uncertainty across MSN population across all trials for both stimuli. Entropy in networks with a long TAN pause (black) declines over time and reaches asymptotic level both prior to and following reversal. (E) Mean difference in corticostriatal Go weights (synaptic efficacy) coding for the more rewarded versus less rewarded response (defined during acquisition) are shown across training trials for a single stimulus, in an 85:15% reward environment. Greater accumulation in weight differences result in more deterministic choice but difficulty in unlearning. Data shown for networks trained on with a long (270 ms) or short (150 ms) fixed duration TAN pause. Reversal at trial 200 denoted with dotted line. MSN, medium spiny neurons; TANs, tonically active neurons.DOI:http://dx.doi.org/10.7554/eLife.12029.006
Mentions: To directly study the impact of TANs on learning, we first parametrically varied the duration of the TAN pause exogenously (below we explore how this pause may be endogenously self-regulated by the network). Varying pause duration across networks reveals a behavioral trade-off between learning speed following probabilistic reversal and asymptotic accuracy (Figure 4, top row). During acquisition, learning speed is similar across a range of TAN pause durations, but networks with long pauses exhibit higher levels of asymptotic accuracy. Long pauses facilitate a disproportionate increase in synaptic efficacy for the most frequently rewarded response despite the occasional spurious outcome (Figure 5, bottom right), driving MSNs to focus their activity to a single dominant action (as quantified by MSN entropy; Figure 4, bottom). However, this same property makes such networks slow to reverse. Conversely, networks with short TAN pause durations learn more quickly following reversal but at a cost of asymptotic accuracy. Mechanistically, this occurs as short TAN pauses elicit weaker activation of spiny units, and hence elicit less activity-dependent plasticity, leading to a smaller learned divergence in synaptic weights between alternative responses (Figure 5, bottom right). This stunted divergence allows for continued co-activation of multiple response alternatives during subsequent action selection, and hence more stochastic choice, but this same property supports the network’s ability to quickly reinforce alternative actions following change points. Thus, although in these simulations the TAN pause modulates excitability only during window of the phasic dopaminergic reinforcement signal, it influences which MSNs learn and hence the population activity and action selection entropy thereof during subsequent choices.10.7554/eLife.12029.006Figure 4.Network behavior as a function of TAN pause duration.

Bottom Line: We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning three Marr's levels of analysis.Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies.A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.

View Article: PubMed Central - PubMed

Affiliation: Department of Cognitive, Linguistic and Psychological Sciences, Brown Institute for Brain Science, Brown University, Providence, United States.

ABSTRACT
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning three Marr's levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.

No MeSH data available.


Related in: MedlinePlus