Theory of choice in bandit, information sampling and foraging tasks.

Averbeck BB - PLoS Comput. Biol. (2015)

Bottom Line: The tasks drive these trade-offs in unique ways, however. Correspondingly, decreasing uncertainty increases the relative value of actions that pay off immediately. For foraging tasks the time horizon plays the dominant role, as choices do not affect future uncertainty in these tasks.


Affiliation: Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America.

ABSTRACT
Decision making has been studied with a wide array of tasks. Here we examine the theoretical structure of bandit, information sampling and foraging tasks. These tasks move beyond tasks in which the choice on the current trial does not affect future expected rewards. We have modeled these tasks using Markov decision processes (MDPs). MDPs provide a general framework for modeling tasks in which decisions affect the information on which future choices will be made. Under the assumption that agents are maximizing expected rewards, MDPs provide normative solutions. We find that all three classes of tasks pose choices among actions which trade off immediate and future expected rewards. The tasks drive these trade-offs in unique ways, however. For bandit and information sampling tasks, increasing uncertainty or the time horizon shifts value to actions that pay off in the future. Correspondingly, decreasing uncertainty increases the relative value of actions that pay off immediately. For foraging tasks the time horizon plays the dominant role, as choices do not affect future uncertainty in these tasks.
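
To make the notion of a normative solution concrete, the following is a minimal sketch (assumed for illustration, not the paper's code) of backward induction in a generic finite-horizon MDP: the value of an action is its immediate expected reward plus the expected value of the state it leads to, so lengthening the horizon shifts value toward actions that pay off later. The transition matrices, rewards, and horizon in the example are made-up placeholders.

import numpy as np

def backward_induction(P, R, horizon):
    """Solve a finite-horizon MDP by backward induction.
    P[a] is an (n_states x n_states) transition matrix for action a,
    R[a] is an n_states vector of immediate expected rewards,
    horizon is the number of choices remaining.
    Returns Q, where Q[t][s, a] is the optimal value of action a in
    state s with (horizon - t) choices remaining."""
    n_states = R[0].shape[0]
    V = np.zeros(n_states)                  # value when no choices remain
    Q_by_depth = []
    for _ in range(horizon):
        # immediate expected reward plus expected value of the next state
        Qt = np.stack([R[a] + P[a] @ V for a in range(len(P))], axis=1)
        V = Qt.max(axis=1)                  # act optimally at this depth
        Q_by_depth.append(Qt)
    return Q_by_depth[::-1]                 # element 0 holds the values on the first trial

# Placeholder two-state, two-action example: action 0 pays off immediately but
# keeps the agent in state 0; action 1 pays nothing at first, but moves the
# agent to state 1, where it pays more per trial.
P = [np.array([[1.0, 0.0], [1.0, 0.0]]),    # action 0: go to / stay in state 0
     np.array([[0.0, 1.0], [0.0, 1.0]])]    # action 1: go to / stay in state 1
R = [np.array([1.0, 0.0]),                  # action 0: reward 1, but only in state 0
     np.array([0.0, 2.0])]                  # action 1: reward 2, but only in state 1
print(backward_induction(P, R, horizon=1)[0][0])   # short horizon favours action 0
print(backward_induction(P, R, horizon=5)[0][0])   # long horizon favours action 1

With this placeholder setup, the immediately rewarded action wins when only one choice remains, while the delayed action wins over a longer horizon, which is the horizon effect described above.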

pcbi.1004164.g007: Sampling foraging task. A. State-space model for the foraging task. The numbers in the circles indicate one of the offer pairs. As there were 6 available individual gambles, 15 offer pairs were possible in each foraging round. The bottom of the panel shows the gambles that would be available in a specific foraging bout. On each trial subjects are shown a pair randomly sampled from the 6 gambles. If they accept the pair, they move on to the decision stage. If they sample again, a new pair is shown, and they must again decide whether to accept it or sample once more, and so on. B. Expected value of accepting the current gamble or sampling again, for an example sequence of draws. The option pair below each trial number is the pair that was presented on that trial.
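
The accept-versus-sample recursion in panel A can be illustrated with a rough sketch like the one below. The six gamble values and the per-sample cost are placeholders (not the values used in the task), the value of accepting a pair is taken to be the value of its better gamble, and the bout is treated as stationary (infinite horizon) rather than the finite sequence shown in panel B; only the structure of 15 possible offer pairs drawn at random follows the caption.

from itertools import combinations

gamble_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # placeholder expected values of the 6 gambles
pairs = list(combinations(gamble_values, 2))    # the 15 possible offer pairs
sample_cost = 0.5                               # placeholder cost of drawing a new pair

def value_of_sampling(n_iter=200):
    """Fixed point of V_sample = -cost + E_pair[max(best gamble in pair, V_sample)],
    the value of rejecting the current pair and drawing a new one at random."""
    v = 0.0
    for _ in range(n_iter):
        v = -sample_cost + sum(max(max(p), v) for p in pairs) / len(pairs)
    return v

v_sample = value_of_sampling()
for pair in pairs:
    decision = "accept" if max(pair) >= v_sample else "sample again"
    print(pair, "->", decision)

Because the value of sampling again is defined in terms of facing another randomly drawn pair, the same states recur, which is the recursive structure discussed in the text below.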

Mentions: The final tasks we considered were foraging tasks. Much like the tasks examined above, these tasks trade off immediate and future expected values. Should one stay in the current patch whose resources are being depleted (i.e. choose IEV) or travel to a new patch (i.e. choose FEV) [19]? Or, should one sample again (i.e. choose FEV) or commit to the current gamble on offer (i.e. choose IEV) [18]? The state spaces for these tasks differ in a fundamental way from the state spaces of the bandit and information sampling tasks (Figs. 6A and 7A). The state spaces for the foraging tasks are recursive. Stated another way, the state spaces for the foraging tasks do not represent learning or information accumulation. Learning and information accumulation are not recursive because one does not return to the same state (technically, this is not completely accurate, as one can, with some probability, return to a previous state in either the non-stationary bandit or the novelty bandit). Rather, in the foraging tasks the current state is provided to the animal, and the animal does not have to estimate beliefs or distributions over states. These tasks are therefore MDPs, as opposed to POMDPs, in which the state is hidden. In the foraging tasks one observes the state directly.
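
The patch-leaving version of this trade-off can be sketched in the same spirit (an illustrative assumption, not the task of [19]): the observed depletion level of the current patch is the MDP state, and backward induction over the remaining trials gives the stay and leave values directly, with no beliefs over hidden states. The reward decay, fresh-patch reward, and travel time below are placeholder numbers.

import numpy as np

depletion_levels = 10   # observed state: how many times the current patch has been harvested
decay = 0.8             # placeholder per-harvest decay of the patch reward
base_reward = 10.0      # placeholder reward from a fresh patch
travel_time = 3         # placeholder number of unrewarded trials spent reaching a new patch

def harvest(level):
    """Placeholder reward for harvesting a patch at a given depletion level."""
    return base_reward * decay ** level

def stay_leave_values(horizon):
    """Backward induction for the fully observed patch-leaving MDP.
    Returns stay[t, level] and leave[t, level], the action values with
    t trials remaining in the session."""
    V = np.zeros((horizon + 1, depletion_levels + 1))   # V[0] = 0: session over
    stay = np.zeros_like(V)
    leave = np.zeros_like(V)
    for t in range(1, horizon + 1):
        for level in range(depletion_levels + 1):
            nxt = min(level + 1, depletion_levels)
            stay[t, level] = harvest(level) + V[t - 1, nxt]
            # leaving spends travel_time unrewarded trials, then starts a fresh patch
            leave[t, level] = V[max(t - travel_time, 0), 0]
        V[t] = np.maximum(stay[t], leave[t])
    return stay, leave

stay, leave = stay_leave_values(horizon=40)
# Depletion level at which leaving first overtakes staying, for several horizons.
for t in (2, 5, 20, 40):
    better = np.where(leave[t] > stay[t])[0]
    print(t, better[0] if len(better) else "never leave")

Under these placeholder numbers, leaving only overtakes staying when enough trials remain to recoup the travel time, and the depletion level at which it does so depends on the remaining horizon rather than on any uncertainty term, consistent with the point that the time horizon plays the dominant role in foraging tasks.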

