Which is the best intrinsic motivation signal for learning multiple skills?

Santucci VG, Baldassarre G, Mirolli M - Front Neurorobot (2013)

Bottom Line: We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks. We compared the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.


Affiliation: Laboratory of Computational Embodied Neuroscience, Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche, Roma, Italy; School of Computing and Mathematics, University of Plymouth, Plymouth, UK.

ABSTRACT
Humans and other biological agents are able to autonomously learn and cache different skills in the absence of any biological pressure or any assigned task. In this respect, Intrinsic Motivations (IMs; i.e., motivations not connected to reward-related stimuli) play a cardinal role in animal learning, and can be considered a fundamental tool for developing more autonomous and more adaptive artificial agents. In this work, we provide an exhaustive analysis of a scarcely investigated problem: which kind of IM reinforcement signal is the most suitable for driving the acquisition of multiple skills in the shortest time? To this end we implemented an artificial agent with a hierarchical architecture that allows it to learn and cache different skills. We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks. We compared the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.
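To make the abstract's setup concrete: the hierarchical architecture amounts to a selector that, trial by trial, chooses which skill "expert" to train, with the choice reinforced by an intrinsically generated reward. The following Python sketch is a hypothetical, minimal illustration of that loop, not the authors' implementation; all names and numerical values (learning rates, increments, the temperature) are assumptions made for the example. Here the IM reward is the competence improvement of the selected expert, so selection concentrates on tasks that are still being learnt.

    import numpy as np

    def softmax(values, temp=0.1):
        # Numerically stable softmax over the selector's value estimates.
        e = np.exp((values - values.max()) / temp)
        return e / e.sum()

    rng = np.random.default_rng(0)
    n_experts = 8                          # 4 learnable + 4 non-learnable tasks, as in the paper's setup
    learnable = np.array([True] * 4 + [False] * 4)
    values = np.zeros(n_experts)           # selector's running estimate of IM reward per expert
    competence = np.zeros(n_experts)       # skill level per task
    alpha = 0.1                            # selector learning rate (assumed value)

    for trial in range(2000):
        k = rng.choice(n_experts, p=softmax(values))
        old = competence[k]
        if learnable[k]:                   # an expert can only improve on a learnable task
            competence[k] = min(1.0, old + 0.01)
        im_reward = competence[k] - old    # competence-based IM: reward = improvement
        values[k] += alpha * (im_reward - values[k])

Once a learnable task is mastered, its improvement, and with it its IM reward, drops to zero, so the selector naturally moves on to tasks that can still be learnt; this is the property that makes competence-linked signals effective in the experiments.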



Figure 7: Top: Average performance on the 4 learnable tasks in the best condition (with respect to the learning rate of the predictor) of KB-PE, SAP-PE, and SP-PE. Bottom: Average selection probability for the experts associated with the 4 learnable tasks, in the same condition.

Mentions: Figure 7 (left) shows the results of the KB-PE mechanism. In this condition the system is driven not by an IM signal connected to its competence in learning the skills, but by one connected to the knowledge acquired by the predictor in anticipating every possible future state. For this reason the system does not select the experts connected to the tasks that are still to be learnt, but rather the experts that surprise the predictor by reaching any as-yet-unpredicted state, including the experts related to the 4 non-learnable tasks. This process leads to a random selection (the random-selection value is 0.125, i.e., 1/8, because it is computed over all 8 tasks). While this is not a problem for the two easiest tasks (tasks 1 and 3), which are learnt after a few trials, the fading of the IM signal and the consequent absence of focused learning severely impairs the learning of the difficult tasks (tasks 2 and 4).
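To see why a purely knowledge-based signal degenerates toward the 0.125 uniform baseline, consider a variant of the sketch above (again a hypothetical illustration, not the paper's model) in which the reward is the predictor's residual error on the state actually reached. Because the predictor can learn to anticipate reached states even for non-learnable tasks, the error, and with it the reward, fades for all 8 experts, and selection drifts toward uniform.

    import numpy as np

    def softmax(values, temp=0.1):
        # Numerically stable softmax over the selector's value estimates.
        e = np.exp((values - values.max()) / temp)
        return e / e.sum()

    rng = np.random.default_rng(1)
    n_experts = 8                          # 4 learnable + 4 non-learnable tasks
    values = np.zeros(n_experts)           # selector's value estimates
    pred_error = np.ones(n_experts)        # every reached state is initially unpredicted
    counts = np.zeros(n_experts)
    alpha, eta = 0.1, 0.05                 # selector / predictor learning rates (assumed values)

    for trial in range(5000):
        k = rng.choice(n_experts, p=softmax(values))
        counts[k] += 1
        im_reward = pred_error[k]          # KB-PE reward: surprise, regardless of competence
        pred_error[k] *= 1.0 - eta         # predictor learns the reached state
        values[k] += alpha * (im_reward - values[k])

    print(counts / counts.sum())           # drifts toward the uniform 1/8 = 0.125 per expert

No competence term enters the reward, so the experts for the non-learnable tasks are sampled about as often as the others, which is the random-selection effect described above.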

