Reinforcement learning or active inference?

Friston KJ, Daunizeau J, Kiebel SJ - PLoS ONE (2009)

Bottom Line: This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.


Affiliation: The Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom. k.friston@fil.ion.ucl.ac.uk

ABSTRACT
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.



Figure 1 (pone-0006421-g001): An agent that thinks it is a Lorenz attractor. This figure illustrates the behaviour of an agent whose trajectories are drawn to a Lorenz attractor. However, this is no ordinary attractor; the trajectories are driven purely by action (displayed as a function of time in the right panels). Action tries to suppress prediction errors on motion through this three-dimensional state-space (blue lines in the left panels). These prediction errors are the difference between sensed motion and the motion expected under the agent's generative model (red arrows). These prior expectations are based on a Lorenz attractor. The ensuing behaviour can be regarded as a form of chaos control. Critically, this autonomous behaviour is very resistant to random forces on the agent. This can be seen by comparing the top row (with no perturbations) with the middle row, where the first state has been perturbed with a smooth exogenous force (broken line). Note that action counters this perturbation and the ensuing trajectories are essentially unaffected. The bottom row shows exactly the same simulation but with action turned off. Here, the environmental forces cause the agent to precess randomly around a fixed-point attractor. These simulations used a log-precision on the random fluctuations of 16.
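A note on the noise level (an interpretation of the caption, not stated explicitly in the excerpt): in the free-energy framework, precision is the inverse variance of the random fluctuations, so a log-precision of 16 corresponds to a precision of exp(16) ≈ 8.9 × 10^6, i.e. a fluctuation variance of roughly 1.1 × 10^-7. In other words, the simulations assume essentially noiseless random fluctuations.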

Mentions: Equation 9 embodies a nice convergence of action and perception; perception tries to suppress prediction error by adjusting expectations to furnish better predictions of signals, while action tries to fulfil these predictions by changing those signals. Figure 1 illustrates this scheme by showing the trajectory of an agent that thinks it is a strange attractor. We created this agent by making its generative model a Lorenz attractor (Equation 10).
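To make this action-perception scheme concrete, the following is a minimal numerical sketch, not the paper's implementation (which uses variational filtering in generalised coordinates of motion). The prior dynamics of the generative model are a Lorenz system with the textbook parameters sigma = 10, rho = 28, beta = 8/3; these parameters, the gains k_mu and k_a, and the noise scale are illustrative assumptions, since the paper's Equation 10 and precision settings are not reproduced in this excerpt. Perception pulls the expectations mu towards the sensed states while letting them flow along the prior Lorenz dynamics; action pushes the sensed motion towards the motion the generative model predicts, so the true states come to trace out a Lorenz-like orbit even though the environment has no such dynamics of its own.

import numpy as np


def lorenz(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Expected flow of states under the agent's generative model (prior dynamics)."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])


def simulate(T=40.0, dt=0.005, k_mu=4.0, k_a=64.0, action_on=True, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array([1.0, 1.0, 20.0])    # true environmental states
    mu = np.zeros(3)                  # expectations (internal states)
    a = np.zeros(3)                   # action: forces exerted by the agent
    traj = np.empty((int(T / dt), 3))
    for t in range(traj.shape[0]):
        # The environment has no flow of its own: sensed motion is just the
        # agent's force plus small random fluctuations.
        dx = a + rng.normal(scale=0.5, size=3)
        x = x + dt * dx

        # Perception: expectations drift along the prior (Lorenz) flow while
        # being corrected by the prediction error on the sensed states.
        mu = mu + dt * (lorenz(mu) + k_mu * (x - mu))

        # Action: suppress the prediction error on motion, i.e. push the
        # sensed motion dx towards the motion the generative model predicts.
        if action_on:
            a = a - dt * k_a * (dx - lorenz(mu))

        traj[t] = x
    return traj


if __name__ == "__main__":
    with_action = simulate(action_on=True)
    without_action = simulate(action_on=False)
    # With action, the trajectory traces out a butterfly-shaped (Lorenz-like)
    # orbit; without it, the states merely drift under the random forces.
    print(with_action[-5:])
    print(without_action[-5:])

Running the sketch with action_on=True versus action_on=False reproduces the qualitative contrast in Figure 1: with action the states follow a Lorenz-like orbit dictated by the agent's expectations, whereas without action they simply diffuse under the random forces.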

