Projective simulation for artificial intelligence.

Briegel HJ, De las Cuevas G - Sci Rep (2012)

Bottom Line: During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.

View Article: PubMed Central - PubMed

Affiliation: Institut für Theoretische Physik, Universität Innsbruck, Technikerstrasse 25, A-6020 Innsbruck, Austria. hans.briegel@uibk.ac.at

ABSTRACT
We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.
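The deliberation mechanism the abstract describes — a random walk through a network of weighted clip-to-clip transitions that ends when a clip triggering factual action is reached — can be sketched in code. The following is a minimal illustrative simplification, not the authors' implementation; the class name, the linear reward update, and the parameter names (`h_init`, `lam`) are assumptions introduced here for clarity.

```python
import random

class PSAgent:
    """Minimal sketch of a projective-simulation agent: clips are nodes,
    directed edges carry weights (h-values), and deliberation is a random
    walk from a percept clip until an action clip is reached."""

    def __init__(self, percepts, actions, h_init=1.0):
        self.actions = set(actions)
        # innate connections of unit weight from every percept to every action
        self.h = {(p, a): h_init for p in percepts for a in actions}

    def _step(self, clip):
        # hop to a neighboring clip with probability proportional to edge weight
        out = [(dst, w) for (src, dst), w in self.h.items() if src == clip]
        r = random.uniform(0, sum(w for _, w in out))
        for dst, w in out:
            r -= w
            if r <= 0:
                return dst
        return out[-1][0]

    def act(self, percept):
        clip = percept
        while clip not in self.actions:  # walk until an action clip is hit
            clip = self._step(clip)
        return clip

    def learn(self, percept, action, reward, lam=1.0):
        # strengthen the edge used on a rewarded transition
        self.h[(percept, action)] += lam * reward
```

Repeated rewarded interaction then biases the walk: edges that led to rewarded actions grow heavier, so the same percept increasingly triggers the rewarded action.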

Show MeSH
Effects of associative learning on the state of the episodic memory at different times. The thickness of the lines indicates the transition probabilities between different clips. (a) Initial network, before any percept has affected the agent. (b) State of the network after the agent has been trained (dotted arrows) with symbols of one color (red). (c) When the agent is presented with symbols of a different color (blue), the established links will direct the simulation process (probabilistically) to the previously “trained” region with well-developed links. This realizes a sort of associative memory.

Mentions: The structure of the memory that gives rise to these learning curves is sketched in Figure 12, which corresponds to the duplicated network described before, albeit with additional links between percepts of equal shape but different color. In Figure 12, we see the effect of learning on the state of the network at different times. Initially, before any stimulus/percept has affected the agent, the network looks as in Figure 12(a), with innate connections of unit weight between all possible percepts and actuators, respectively. Figure 12(b) shows the state of the network after the agent has been trained (indicated by the dotted arrows) with symbols of one color (red). We see that the weights for rewarded transitions have grown substantially, such that the presentation of a red symbol will lead to the rewarded actuator move with high probability. Moreover, the activation of the red-percept clips has initialized the incoming connections from similar percept clips with a different (blue) color. In this example, the weights are initialized with the value K. This initialization has, at this stage, no effect on the learning performance for symbols with a red color. However, when the agent is presented with symbols of a different color, the established links will direct the simulation process (probabilistically) to a “trained” region with well-developed links. This realizes a sort of associative memory (Figure 12(c)). In the philosophy of projective simulation, association is a special instance of a simulation process, namely a random walk in clip space where similar clips can call each other with certain probabilities.
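The associative-memory effect described above can be reproduced with a small random walk over a hand-built clip network. This is an illustrative sketch, not the paper's implementation: the trained weights are set directly rather than learned, and the concrete values (50.0 for trained edges, K = 20.0 for the cross-color links) are assumptions chosen only to make the effect visible.

```python
import random

def walk(h, start, actions):
    """Random walk over a clip network h: {(src, dst): weight},
    continuing until an action clip is reached."""
    clip = start
    while clip not in actions:
        out = [(dst, w) for (src, dst), w in h.items() if src == clip]
        clip = random.choices([d for d, _ in out],
                              weights=[w for _, w in out])[0]
    return clip

actions = {"move_left", "move_right"}
h = {}
# Fig. 12(a): innate unit-weight links from every colored percept to both moves
for shape in ("circle", "square"):
    for color in ("red", "blue"):
        for a in actions:
            h[((color, shape), a)] = 1.0

# Fig. 12(b): after training on red symbols, rewarded transitions have grown
# substantially (values here are illustrative, not learned)
h[(("red", "circle"), "move_left")] = 50.0
h[(("red", "square"), "move_right")] = 50.0

# Activation of the red-percept clips initializes incoming links from the
# same-shape blue percepts with weight K (K = 20.0 is an assumed value)
K = 20.0
h[(("blue", "circle"), ("red", "circle"))] = K
h[(("blue", "square"), ("red", "square"))] = K
```

Starting a walk from an untrained blue percept, the K-weighted link makes the walk hop (probabilistically) into the trained red region, whose well-developed links then yield the rewarded move — Figure 12(c)'s associative-memory effect.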

