Tonic dopamine modulates exploitation of reward learning.

Beeler JA, Daw N, Frazier CR, Zhuang X - Front Behav Neurosci (2010)

Bottom Line: In this "closed economy" paradigm, subjects earn all of their food by pressing either of two levers, but the relative cost for food on each lever shifts frequently. We fit the lever choice data using reinforcement learning models to assess the distinction between acquisition and expression that the models formalize. These data suggest that dopamine modulates the degree to which prior learning biases action selection and consequently alters the expression of learned, motivated behavior.


Affiliation: Department of Neurobiology, University of Chicago, Chicago, IL, USA. jabeeler@uchicago.edu

ABSTRACT
The impact of dopamine on adaptive behavior in a naturalistic environment is largely unexamined. Experimental work suggests that phasic dopamine is central to reinforcement learning, whereas tonic dopamine may modulate performance without altering learning per se; however, this idea has not been developed formally or integrated with computational models of dopamine function. We quantitatively evaluate the role of tonic dopamine in these functions by studying the behavior of hyperdopaminergic DAT knockdown mice in an instrumental task in a semi-naturalistic homecage environment. In this "closed economy" paradigm, subjects earn all of their food by pressing either of two levers, but the relative cost for food on each lever shifts frequently. Compared to wild-type mice, hyperdopaminergic mice allocate more lever presses to high-cost levers, thus working harder to earn a given amount of food and maintain their body weight. However, both groups react similarly quickly to shifts in lever cost, suggesting that the hyperdopaminergic mice are not slower at detecting changes, as would be expected with a learning deficit. We fit the lever choice data using reinforcement learning models to assess the distinction between acquisition and expression that the models formalize. In these analyses, hyperdopaminergic mice displayed normal learning from recent reward history but a diminished capacity to exploit this learning: a reduced coupling between choice and reward history. These data suggest that dopamine modulates the degree to which prior learning biases action selection and consequently alters the expression of learned, motivated behavior.
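The abstract does not give the model equations, but the acquisition/expression distinction it describes maps naturally onto the two standard parameters of a Q-learning model with softmax action selection: a learning rate (how quickly reward history is acquired) and an inverse temperature (how strongly that learning is expressed, i.e., exploited, in choice). A minimal illustrative sketch in Python follows; the function names, two-lever encoding, and parameters are assumptions for illustration, not the authors' published model.

```python
import numpy as np

def softmax_choice_probs(q, beta):
    """Softmax over lever values; beta (inverse temperature) controls how
    strongly learned values bias choice -- the 'expression' parameter."""
    z = beta * (q - q.max())          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def q_learning_nll(choices, rewards, alpha, beta):
    """Negative log-likelihood of a choice sequence under Q-learning.

    choices : array of 0/1 lever indices, one per trial
    rewards : array of rewards received on each trial
    alpha   : learning rate ('acquisition' parameter)
    beta    : inverse temperature ('expression' parameter)
    """
    q = np.zeros(2)                   # one value estimate per lever
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = softmax_choice_probs(q, beta)
        nll -= np.log(p[c] + 1e-12)   # accumulate choice log-likelihood
        q[c] += alpha * (r - q[c])    # prediction-error update
    return nll
```

Under this framing, the paper's headline result corresponds to a normal alpha but a reduced beta in hyperdopaminergic mice: value learning proceeds normally, yet choices couple less tightly to the learned values.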




Figure 2: Mean allocation of effort and runlength on the high- and low-cost levers following the switch in reward contingency (dashed line). Mean lever presses per minute 10 min before and after the reward contingency switch for (A) wild-type and (B) DATkd mice (genotype × lever × time, p < 0.0001). Mean runlength on each lever for (C) wild-type and (D) DATkd mice (genotype × lever × time, p < 0.001). Mean rate of reinforcement across all contingency switches for (E) wild-type and (F) DATkd mice on the low → high cost lever (solid line, gold shading) and the high → low cost lever (dotted line, gray shading), averaged across all episodes of contingency switches between levers (vertical dashed lines). Shading = S.E.M., N = 10.

Mentions: All events (lever presses, pellet delivery, cost changes between levers) were recorded and time-stamped using Med-PC IV software (Med Associates, St. Albans, VT, USA). The data were then imported into MATLAB for analysis. Total consumption, high-cost presses, low-cost presses, the ratio of low-cost to total presses, average cost per pellet, number of meals per day, average meal size, and meal duration were calculated directly by the program operating the experiment (i.e., Figure 1 and Table 1). The onset of a meal was defined as the procurement of one pellet, and the offset as the last pellet earned before 30 min elapsed without procuring a pellet. To calculate average lever pressing before and after episodes of cost switching between the levers, averaged across the experiment (Figure 2), all experimental days (i.e., days with a cost differential between levers) were combined into a single dataset for each mouse. The time points of all cost switches were identified, and a 10-min window (data recorded in 0.1-s bins) before and after each switch was averaged across switch episodes. The mean over all events was smoothed with a half-Gaussian filter whose kernel was normalized as a weighted average, so the smoothed data retained the original y-axis values. The resulting smoothed data were averaged across mice within each genotype. To calculate runlength averaged across switch episodes, all lever presses within a run (consecutive presses on one lever without intervening presses on the other lever) were coded as the total length of the run (e.g., for a run of three presses, each press would be coded as 3). Time bins in which no lever press occurred were coded as zero. When the mean across episodes was calculated, episodes without any pressing on either lever (e.g., mouse sleeping) were coded as not a number (NaN) and excluded from the mean. For statistical comparisons of the above analyses, the raw (i.e., not smoothed) 0.1-s bins were collapsed into twenty 1-min bins, which were used as repeated measures in two-way ANOVAs. For single statistical comparisons, t-tests were used.
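The three transformations above (causal smoothing, runlength coding, and collapsing fine bins into 1-min bins) are simple to state but easy to get subtly wrong. A short Python sketch follows; the bin sizes match the text (0.1-s raw bins, 1-min analysis bins), but the kernel width is an arbitrary placeholder, since the paper's filter parameters are not given here.

```python
import numpy as np

def half_gaussian_smooth(x, sigma_bins=50):
    """Causal half-Gaussian smoothing. The kernel is normalized to sum to 1
    (a weighted average), so smoothed values stay on the original y-axis
    scale, as described in the text. sigma_bins is a placeholder value."""
    k = np.exp(-0.5 * (np.arange(4 * sigma_bins) / sigma_bins) ** 2)
    k /= k.sum()
    return np.convolve(x, k)[: len(x)]    # causal: each point uses only past bins

def runlength_code(lever_ids):
    """Code every press with the total length of the run it belongs to,
    e.g., a run of three consecutive presses on one lever -> 3, 3, 3."""
    coded = np.zeros(len(lever_ids))
    start = 0
    for i in range(1, len(lever_ids) + 1):
        if i == len(lever_ids) or lever_ids[i] != lever_ids[start]:
            coded[start:i] = i - start    # assign run length to each press in the run
            start = i
    return coded

def collapse_to_minute_bins(raw, bin_s=0.1):
    """Collapse 0.1-s bins into 1-min bins by summation, so a 20-min
    window around each switch yields 20 repeated measures for ANOVA."""
    per_min = int(60 / bin_s)             # 600 raw bins per minute
    n_full = (len(raw) // per_min) * per_min
    return raw[:n_full].reshape(-1, per_min).sum(axis=1)
```

For example, runlength_code(np.array([0, 0, 0, 1, 0, 0])) returns [3, 3, 3, 1, 2, 2], and a 12,000-element array of 0.1-s press counts (a 20-min window) collapses to the twenty 1-min bins used in the two-way ANOVAs.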

