Projective simulation for artificial intelligence.

Briegel HJ, De las Cuevas G - Sci Rep (2012)

Bottom Line: During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.

View Article: PubMed Central - PubMed

Affiliation: Institut für Theoretische Physik, Universität Innsbruck, Technikerstrasse 25, A-6020 Innsbruck, Austria. hans.briegel@uibk.ac.at

ABSTRACT
We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.
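For illustration, a minimal sketch of one such simulation step, assuming a clip network with weighted directed edges and hopping probabilities proportional to those weights; the class name, the weights h, and the stopping rule below are illustrative assumptions, not the paper's exact definitions.

import random

# Minimal sketch (illustration only) of a projective-simulation step:
# a random walk over a network of "clips" (patches of episodic memory),
# starting from a percept clip and ending when an action clip is reached.
# Edge weights h and the hopping rule are assumptions made for this example.

class ClipNetwork:
    def __init__(self):
        self.h = {}  # h[(c1, c2)]: weight of the directed edge c1 -> c2

    def add_edge(self, c1, c2, weight=1.0):
        self.h[(c1, c2)] = weight

    def hop(self, clip):
        # Choose the next clip with probability proportional to its edge weight.
        edges = [(c2, w) for (c1, c2), w in self.h.items() if c1 == clip]
        if not edges:
            return clip  # dead end: stay at the current clip
        r = random.uniform(0.0, sum(w for _, w in edges))
        for c2, w in edges:
            r -= w
            if r <= 0:
                return c2
        return edges[-1][0]

    def simulate(self, percept_clip, is_action_clip, max_steps=100):
        # Random walk through episodic memory until an action clip is hit.
        clip = percept_clip
        for _ in range(max_steps):
            if is_action_clip(clip):
                return clip
            clip = self.hop(clip)
        return None

# Example: a percept clip connected to two candidate action clips.
net = ClipNetwork()
net.add_edge("symbol_left", "move_left")
net.add_edge("symbol_left", "move_right")
chosen = net.simulate("symbol_left", lambda c: c.startswith("move_"))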



Figure 7: Performance of agents with different values of the reflection time: R = 1 (lower curve) and R = 2 (upper curve). One can see that a large value of the reflection time leads to an increased learning speed. The dissipation rate (which is a measure of forgetfulness of the agent) is in both cases γ = 1/50. Ensemble average over 1000 runs with error bars indicating one standard deviation.

Mentions: Let us now come back to the notion of reflection. In Figure 7, we compare the performance of agents with different values of the reflection time R. (Here we consider again training with symbols of a single color.) One can see that larger values of the reflection time lead to an increased learning speed. The reason is that, during the simulation, virtual percept-action sequences are recalled together with the associated emotion tags (i.e., remembered rewards). If the associated tag does not indicate a previous reward of the simulated transition, the coupling-out of the actuator into motor action is suppressed and the simulation goes back to the initial clip. In this sense, the agent can “reflect upon” the right action and its (empirically likely) consequences by means of an iterated simulation, and is thus more likely to find the right actuator move before real action takes place.
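
A hedged sketch of this reflection loop: the simulation is repeated up to R times, and coupling-out into motor action happens only when the recalled emotion tag marks a previously rewarded transition. The names simulate_action and emotion, and the behaviour after R unrewarded rounds, are assumptions made for illustration.

# Hedged sketch of the reflection mechanism (illustration only).
# simulate_action(percept): one simulated random walk returning a candidate action.
# emotion[(percept, action)]: True if this percept-action transition was
# rewarded in the past (the "emotion tag" recalled during simulation).

def reflect_and_act(percept, simulate_action, emotion, R):
    action = None
    for _ in range(R):
        action = simulate_action(percept)
        if emotion.get((percept, action), False):
            return action  # tag signals a previous reward: couple out into motor action
        # no remembered reward: suppress the actuator and restart from the initial clip
    return action  # after R rounds, act on the last simulated move (assumed here)

# With R = 2 the agent gets a second chance to recall a previously rewarded
# transition before real action, which is the speed-up visible in Figure 7.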

