Projective simulation for artificial intelligence.

Briegel HJ, De las Cuevas G - Sci Rep (2012)

Bottom Line: During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.


Affiliation: Institut für Theoretische Physik, Universität Innsbruck, Technikerstrasse 25, A-6020 Innsbruck, Austria. hans.briegel@uibk.ac.at

ABSTRACT
We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.
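As a rough illustration of the deliberation step described above (a sketch of ours, not the authors' implementation), the random walk through the clip network can be pictured as follows: starting from the clip excited by the current percept, the agent hops between clips with probabilities proportional to edge weights, and the first clip that passes the action screen triggers factual action. The names graph, weights and is_action_clip are hypothetical; the full clip dynamics (including how the network grows and is rewarded) are specified in the paper.

    import random

    def deliberate(graph, weights, percept_clip, is_action_clip, max_hops=50):
        """Minimal sketch of one projective-simulation step.

        graph[c] lists the clips reachable from clip c, and weights[(c, n)]
        is the (non-negative) strength of the transition c -> n.  Each visited
        clip is screened by is_action_clip; the first clip that passes the
        screen triggers real action and is returned.
        """
        clip = percept_clip
        for _ in range(max_hops):
            if is_action_clip(clip):       # screening for the action-triggering feature
                return clip
            neighbours = graph[clip]
            w = [weights[(clip, n)] for n in neighbours]
            clip = random.choices(neighbours, weights=w)[0]
        return None                        # simulation ended without triggering an action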


Figure 8: Initial growth and asymptotic value of the average blocking efficiency for different sizes of the percept (S) and actuator (A) spaces and different values of the reward parameter λ. The learning curves are obtained from a numerical average over an ensemble of 10000 runs with random percept stimulation (γ = 0.01). Error bars (not shown) are of the order of the fluctuations in the learning curves. The analytic lines are obtained from Eq. (25); see Methods.

Mentions: We next investigate the performance of the agent for more complex environments in order to illustrate the scalability of our model. In the invasion game, a natural scaling parameter is given by the size S of the percept space (the number of doors through which the attacker can invade) and/or the size A of the actuator space. In Figure 8, we plot the learning curves (evolution of the average blocking efficiency) for different values of S, A, and the reward parameter λ. Both the learning speed and the asymptotic blocking efficiency depend (for a fixed value of the damping γ) on the sizes of the percept and actuator spaces and decrease with growing problem size.
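To reproduce the qualitative behaviour of Figure 8, one can simulate the invasion game with a basic two-layered clip network: each percept clip s is connected to every action clip a by a weight h(s, a), actions are chosen with probability proportional to these weights, all weights are damped towards their initial value at rate γ, and the edge used in a successful (blocking) step is strengthened by λ. The sketch below is our own minimal reading of that scheme; the exact update order and the analytic expression Eq. (25) should be taken from the Methods section, and the function and parameter names are ours.

    import numpy as np

    def invasion_game_curve(n_percepts, n_actions, lam, gamma=0.01,
                            n_steps=2000, n_runs=100, rng=None):
        """Average blocking efficiency of a two-layered clip network over time.

        h[s, a] is the weight of the edge from percept clip s to action clip a
        (initialised to 1).  The agent blocks door a with probability
        h[s, a] / sum_a' h[s, a'], all weights are damped towards 1 at rate
        gamma after every step, and the traversed edge is strengthened by lam
        whenever the attack was blocked.
        """
        rng = np.random.default_rng() if rng is None else rng
        efficiency = np.zeros(n_steps)
        for _ in range(n_runs):
            h = np.ones((n_percepts, n_actions))
            correct = rng.integers(n_actions, size=n_percepts)  # door announced by each percept
            for t in range(n_steps):
                s = rng.integers(n_percepts)                     # random percept stimulation
                p = h[s] / h[s].sum()
                a = rng.choice(n_actions, p=p)
                efficiency[t] += p[correct[s]]                   # blocking probability at this step
                h -= gamma * (h - 1.0)                           # damping towards the initial value
                if a == correct[s]:
                    h[s, a] += lam                               # reward the successful transition
        return efficiency / n_runs

    # e.g. compare a small and a larger problem size (cf. Figure 8)
    small = invasion_game_curve(n_percepts=2, n_actions=2, lam=1.0)
    large = invasion_game_curve(n_percepts=4, n_actions=4, lam=1.0)
    print(small[-1], large[-1])   # asymptotic blocking efficiencies

In this toy setup the larger percept/actuator space learns more slowly and saturates at a lower blocking efficiency, which is the trend reported for Figure 8.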

