Projective simulation for artificial intelligence.

Briegel HJ, De las Cuevas G - Sci Rep (2012)

Bottom Line: During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.


Affiliation: Institut für Theoretische Physik, Universität Innsbruck, Technikerstrasse 25, A-6020 Innsbruck, Austria. hans.briegel@uibk.ac.at

ABSTRACT
We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.
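To make the deliberation and learning steps concrete, the following minimal Python sketch implements a two-layer clip network (percept clips connected directly to action clips) for the simplest case of reflection time R = 1, which is discussed below. The class name ClipNetwork, the uniform initial h-value of 1, and the exact form of the damping and reward terms are illustrative assumptions based on the qualitative description in the text, not the authors' reference implementation.

```python
import random

class ClipNetwork:
    """Two-layer clip network: percept clips connected to action clips by
    weighted edges (h-values). A simplified illustration of the episodic
    memory described in the abstract, not the paper's own code."""

    def __init__(self, percepts, actions, h0=1.0):
        # All h-values start at a uniform value h0 (assumed to be 1).
        self.actions = list(actions)
        self.h = {(s, a): h0 for s in percepts for a in self.actions}

    def deliberate(self, percept):
        # Simulation step for reflection time R = 1: a single random hop
        # from the excited percept clip to an action clip, with hopping
        # probabilities proportional to the h-values.
        weights = [self.h[(percept, a)] for a in self.actions]
        return random.choices(self.actions, weights=weights, k=1)[0]

    def update(self, percept, action, rewarded, gamma, lam):
        # Assumed update rule: every h-value is damped towards 1 at rate
        # gamma (forgetting); the edge that was actually traversed is
        # reinforced by lam if the resulting action was rewarded.
        for edge in self.h:
            self.h[edge] -= gamma * (self.h[edge] - 1.0)
        if rewarded:
            self.h[(percept, action)] += lam
```

With gamma = 0 the reinforced edges grow without bound and the agent converges on the rewarded actions; with gamma > 0 the h-values keep relaxing towards their initial value, which caps the achievable performance but lets the agent unlearn.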


Figure 5: Learning curves of the defender agent for different values of the dissipation rate γ. The blocking efficiency increases with time and approaches its maximum value exponentially fast in the number of cycles. For γ = 0 the blocking efficiency approaches the limiting value 1, i.e. for each shown percept the agent will choose the right action. For larger values of γ, the maximum achievable blocking efficiency is reduced, since the agent forgets part of what it has learnt. At time step n = 250, the meaning of the symbols is inverted, i.e. each symbol now indicates that the attacker will move in the opposite direction. Since the agent has already built up memory, it needs some time to adapt to the new situation. One can see a trade-off between adaptation speed, on one side, and achievable blocking efficiency, on the other side. Here, we have chosen an unbiased training strategy, P(n) = 1/|S|. The curves are averages over an ensemble of 1000 agents. Error bars (indicating one standard deviation of the sample mean) are shown only on every fifth data point so as not to clutter the diagram; the same applies to the error bars in subsequent figures.

Mentions: In the following, we show numerical results for different agent specifications. Let us start with agents with reflection time R = 1. In Figure 5, we plot the learning curves for different values of the dissipation rate γ (forgetfulness). One can see that the blocking efficiency increases with time and approaches its maximum value typically exponentially fast in the number of cycles. For small values of γ it approaches the limiting value 1, i.e. the agent will choose the right action for every shown percept. For increasing values of γ, the maximum achievable blocking efficiency is reduced, since the agent keeps forgetting part of what it has learnt. At time step n = 250, the attacker suddenly inverts the meaning of the symbols: each symbol now indicates that the attacker is going to move in the opposite direction. Since the agent has already built up memory, it needs some time to adapt to the new situation. Here, one can see that forgetfulness can also have a positive effect: for weak dissipation, the agent needs longer to unlearn, i.e. to dissipate its memory and adapt to the new situation. Thus there is a trade-off between adaptation speed, on one side, and achievable blocking efficiency, on the other. Depending on whether learning speed or achievable efficiency is more important, one will choose the agent specification accordingly. Note that for random action, which is obtained by setting λ = 0 in (6), the average blocking efficiency is 0.5 (not shown in Figure 5).
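The following sketch reproduces the setup behind the learning curves of Figure 5, reusing the hypothetical ClipNetwork class from the earlier sketch: an invasion game with two symbols, an unbiased training strategy P(n) = 1/|S| = 1/2, inversion of the symbol meaning at cycle n = 250, and blocking efficiencies averaged over an ensemble of 1000 agents. Parameter values not stated in the text (e.g. λ = 1 and the specific γ values) are assumptions for illustration.

```python
import random

SYMBOLS = ("left", "right")   # percepts shown by the attacker
MOVES = ("left", "right")     # possible moves of the defender

def run_agent(n_cycles=500, gamma=0.0, lam=1.0, invert_at=250):
    """One defender agent playing the invasion game; returns its reward
    record (1 if the attack was blocked in that cycle, 0 otherwise)."""
    agent = ClipNetwork(SYMBOLS, MOVES)   # class from the sketch above
    record = []
    for n in range(n_cycles):
        # Unbiased training strategy: each symbol with probability 1/|S|.
        symbol = random.choice(SYMBOLS)
        # Before cycle `invert_at` a symbol announces the attacker's true
        # direction; afterwards it means the opposite direction.
        if n < invert_at:
            attack = symbol
        else:
            attack = "right" if symbol == "left" else "left"
        move = agent.deliberate(symbol)
        rewarded = (move == attack)       # blocking = moving to the same side
        agent.update(symbol, move, rewarded, gamma, lam)
        record.append(1.0 if rewarded else 0.0)
    return record

def blocking_efficiency(n_agents=1000, **game):
    """Average the reward records of an ensemble of agents, which estimates
    the blocking efficiency at each cycle."""
    records = [run_agent(**game) for _ in range(n_agents)]
    return [sum(step) / n_agents for step in zip(*records)]

if __name__ == "__main__":
    for gamma in (0.0, 0.01, 0.1):
        curve = blocking_efficiency(gamma=gamma)
        print(f"gamma = {gamma}: efficiency at n = 249 ~ {curve[249]:.2f}, "
              f"at n = 499 ~ {curve[-1]:.2f}")
```

In this toy setup the qualitative behaviour described above should emerge: γ = 0 approaches efficiency 1 but is slow to recover after the inversion, while larger γ saturates below 1 but re-adapts faster.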

