Reinforcement learning or active inference?

Friston KJ, Daunizeau J, Kiebel SJ - PLoS ONE (2009)

Bottom Line: This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.


Affiliation: The Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom. k.friston@fil.ion.ucl.ac.uk

ABSTRACT
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
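
The mechanism summarised in the abstract, a single free-energy quantity descended by both perception and action, can be illustrated with a toy numerical example. The sketch below is not the paper's hierarchical generative model or its implementation; it assumes a minimal one-dimensional Gaussian setup in which the sensation trivially equals the action, and the same precision-weighted prediction-error functional is minimised with respect to an internal belief (perception) and with respect to the action (behaviour). All names and parameter values are illustrative, not taken from the paper.

    # Minimal active-inference sketch (illustrative toy, not the paper's scheme).
    eta, pi_x = 1.0, 1.0      # prior mean and precision on the hidden state
    pi_y = 10.0               # sensory precision
    mu, a = 0.0, 0.0          # initial belief and action
    lr = 0.05                 # gradient-descent step size

    def free_energy(y, mu):
        """Laplace-style free energy: precision-weighted squared prediction errors."""
        return 0.5 * (pi_y * (y - mu) ** 2 + pi_x * (mu - eta) ** 2)

    for _ in range(500):
        y = a                                            # sensation generated by the world (toy: y = a)
        dF_dmu = -pi_y * (y - mu) + pi_x * (mu - eta)    # perception: dF/dmu
        dF_da = pi_y * (y - mu)                          # action: dF/dy * dy/da, with dy/da = 1
        mu -= lr * dF_dmu
        a -= lr * dF_da

    print(f"prior {eta:.2f} | belief {mu:.3f} | action {a:.3f} | free energy {free_energy(a, mu):.5f}")
    # Both the belief and the action settle on the prior expectation: acting makes
    # sensations fulfil predictions, with no reward, value or utility function.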


Figure 2 (pone-0006421-g002): The mountain car problem. This is a schematic representation of the mountain-car problem. Left: the landscape or potential energy function that defines the motion of the car; this has a minimum at x = -0.5. The mountain-car is shown at its uncontrolled stable position (transparent) and at the desired parking position at the top of the hill on the right (x = 1). Right: the forces experienced by the mountain-car at different positions, due to the slope of the hill (blue). Critically, at x = 0 the force is minus one and cannot be exceeded by the car's engine, due to the squashing function applied to action.
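
For concreteness, a landscape and bounded engine of the kind described in this caption can be sketched as follows. The piecewise form of the height function is an assumption chosen only to reproduce the stated properties (a valley whose minimum sits at x = -0.5 and a slope force of exactly -1 at x = 0); it is not necessarily the exact landscape used in the paper, and tanh stands in for whatever squashing function the authors applied to action.

    import numpy as np

    def height(x):
        """Assumed potential-energy landscape H(x): quadratic valley on the left,
        a hill that flattens out on the right."""
        x = np.asarray(x, dtype=float)
        return np.where(x <= 0, x**2 + x, x / np.sqrt(1 + 5 * x**2))

    def slope_force(x):
        """Force due to the slope, -dH/dx (negative values pull the car backwards)."""
        x = np.asarray(x, dtype=float)
        return np.where(x <= 0, -(2 * x + 1), -1.0 / (1 + 5 * x**2) ** 1.5)

    def engine(a):
        """Squashing function applied to action: thrust is confined to (-1, 1)."""
        return np.tanh(a)

    print(height(-0.5))        # -0.25 : the bottom of the valley (uncontrolled stable position)
    print(slope_force(0.0))    # -1.0  : the critical opposing force at x = 0
    print(engine(10.0))        # 0.9999999958... : however hard the agent acts, thrust stays below 1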

Mentions: In the mountain-car problem, one has to park a car on the top of a mountain (Figure 2). The car can be accelerated in a forward or reverse direction. The interesting problem here is that the available acceleration cannot overcome the gravitational forces experienced during the ascent. This means that the only solution is to reverse up the opposite hill and use the momentum gained to carry the car up the first. This makes for an interesting problem when considered in the state-space of position and velocity, (x, x′): the agent has to move away from the desired location (x = 1) to attain its goal. This provides a metaphor for higher-order conditioning, in which an agent must access goals vicariously, through sub-goals.
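
This qualitative point can be checked numerically. The sketch below does not use the paper's continuous formulation; it uses the standard discrete-time mountain-car benchmark dynamics (the Moore / Sutton-and-Barto version found in common reinforcement-learning toolkits), with the start state, goal threshold and policy names chosen purely for illustration. Flooring the accelerator from the valley floor never reaches the goal, whereas pushing in the direction of current motion, which first drives the car backwards up the opposite slope, does.

    import math

    def simulate(policy, steps=2000):
        """Standard mountain-car benchmark dynamics (illustrative, not the paper's).
        Returns the step at which the goal (x >= 0.5) is reached, or None."""
        x, v = -0.5, 0.0                       # start at rest near the valley floor
        for t in range(steps):
            a = policy(x, v)                   # action in {-1, 0, +1}
            v += 0.001 * a - 0.0025 * math.cos(3 * x)
            v = max(-0.07, min(0.07, v))       # bounded velocity
            x += v
            if x < -1.2:                       # left wall: the car stops against it
                x, v = -1.2, 0.0
            if x >= 0.5:                       # parked at the top of the right hill
                return t
        return None

    full_throttle = lambda x, v: +1                     # always accelerate toward the goal
    swing_back    = lambda x, v: +1 if v >= 0 else -1   # accelerate with the current motion

    print("full throttle:", simulate(full_throttle))    # None: the car can never climb directly
    print("swing back:   ", simulate(swing_back))       # reaches the goal within a few hundred steps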

