Reward-Modulated Hebbian Plasticity as Leverage for Partially Embodied Control in Compliant Robotics.

Burms J, Caluwaerts K, Dambre J - Front Neurorobot (2015)

Bottom Line: Our results demonstrate the universal applicability of reward-modulated Hebbian learning. Furthermore, they demonstrate the robustness of systems trained with the learning rule. This link between compliant robotics and neural networks is also the main reason for our search for simple universal learning rules for both neural networks and robotics.

View Article: PubMed Central - PubMed

Affiliation: Computing Systems Laboratory (Reservoir Team), Electronics and Information Systems Department (ELIS), Ghent University, Ghent, Belgium.

ABSTRACT
In embodied computation (or morphological computation), part of the complexity of motor control is offloaded to the body dynamics. We demonstrate that a simple Hebbian-like learning rule can be used to train systems with (partial) embodiment, and that it can be extended beyond the scope of traditional neural networks. To this end, we apply the learning rule to optimize the connection weights of recurrent neural networks with different topologies and for various tasks. We then apply this learning rule to a simulated compliant tensegrity robot by optimizing static feedback controllers that directly exploit the dynamics of the robot body. This leads to partially embodied controllers, i.e., hybrid controllers that naturally integrate the computations performed by the robot body into a neural network architecture. Our results demonstrate the universal applicability of reward-modulated Hebbian learning, as well as the robustness of systems trained with it. This study strengthens our belief that compliant robots can, and should, be seen as computational units rather than as dumb hardware that needs a complex controller. This link between compliant robotics and neural networks is also the main reason for our search for simple, universal learning rules for both neural networks and robotics.
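
For context, reward-modulated Hebbian (RMH) rules of this family correlate locally available pre- and postsynaptic quantities with a global scalar reward. Below is a minimal sketch of one common node-perturbation-style formulation; the function name, the learning rate eta, and the running reward baseline are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np


def rmh_update(W, pre, noise, reward, reward_baseline, eta=1e-4):
    """One reward-modulated Hebbian step (illustrative sketch).

    W               : (n_post, n_pre) weight matrix being trained
    pre             : (n_pre,) presynaptic activations for the trial
    noise           : (n_post,) exploration noise injected into the
                      postsynaptic units during the trial
    reward          : scalar reward obtained on the trial
    reward_baseline : running estimate of the expected reward

    The update correlates the exploration noise (the cause of any
    behavioral deviation) with presynaptic activity, gated by how much
    better than expected the trial went.
    """
    return W + eta * (reward - reward_baseline) * np.outer(noise, pre)
```

Because the update needs only a scalar reward and signals available at the connection itself, nothing restricts the "postsynaptic unit" to being a neuron; it can equally be an actuated degree of freedom of a compliant body, which is what makes partially embodied controllers plausible targets for the same rule.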


Figure 8: Evolution of the reward for the delayed XOR task, using CMA-ES. The task was simplified by removing all forms of stochasticity.

Mentions: As a second comparison, we evaluated the performance of the covariance matrix adaptation evolution strategy (CMA-ES) (Hansen and Ostermeier, 2001), one of the most popular evolutionary algorithms, on the 2-bit delayed XOR task. To simplify the task, we removed all sources of stochasticity (noise, initial neuron state); apart from this, the setup is completely identical (identical network architecture, same initialization). The objective function to be maximized is the average reward across the four possible inputs. Figure 8 shows the evolution of the reward during the first 300,000 trials. Comparing this to the left panel of Figure 6, we can see that, although the RMH rule finds a good solution within 300,000 trials, CMA-ES is still nowhere near converging to a solution. The likely explanation is that, because the search space is so large (about 10,000 dimensions), sample-based approaches such as CMA-ES require an infeasible number of samples.
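
To make the comparison concrete, here is a minimal sketch of how such an experiment can be set up with the pycma reference implementation of CMA-ES. The network size, task encoding, and reward definition are illustrative assumptions standing in for the paper's actual setup.

```python
import numpy as np
import cma  # pycma reference implementation (pip install cma)

N = 20  # toy network size; the paper's full search space has ~10,000 weights


def evaluate(weights_flat):
    """Average reward of a fixed-weight RNN on the 2-bit delayed XOR task.

    The network receives both bits as its initial state, runs its
    recurrent dynamics for a fixed delay, and is read out from one unit.
    These dynamics are an illustrative stand-in, not the paper's network.
    """
    W = weights_flat.reshape(N, N)
    rewards = []
    for b0 in (0.0, 1.0):
        for b1 in (0.0, 1.0):
            x = np.zeros(N)
            x[0], x[1] = b0, b1           # present the two input bits
            for _ in range(10):           # run dynamics over the delay
                x = np.tanh(W @ x)
            target = float(int(b0) ^ int(b1))
            rewards.append(-(x[0] - target) ** 2)  # higher is better
    return float(np.mean(rewards))        # average over all four inputs


# CMA-ES minimizes, so we hand it the negated average reward.
es = cma.CMAEvolutionStrategy(np.zeros(N * N), 0.5)
for generation in range(100):             # bounded run for this sketch
    candidates = es.ask()
    es.tell(candidates, [-evaluate(np.asarray(c)) for c in candidates])
    if es.stop():
        break
print("best average reward:", -es.result.fbest)
```

Even at this toy scale the cost is visible: CMA-ES adapts a full covariance matrix over all parameters, so at the paper's roughly 10,000 dimensions each generation becomes expensive and the number of samples needed to make progress grows accordingly, consistent with the lack of convergence in Figure 8.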

