Which is the best intrinsic motivation signal for learning multiple skills?

Santucci VG, Baldassarre G, Mirolli M - Front Neurorobot (2013)

Bottom Line: We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks. We compare the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.


Affiliation: Laboratory of Computational Embodied Neuroscience, Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche, Roma, Italy; School of Computing and Mathematics, University of Plymouth, Plymouth, UK.

ABSTRACT
Humans and other biological agents are able to autonomously learn and cache different skills in the absence of any biological pressure or any assigned task. In this respect, Intrinsic Motivations (i.e., motivations not connected to reward-related stimuli) play a cardinal role in animal learning, and can be considered a fundamental tool for developing more autonomous and more adaptive artificial agents. In this work, we provide an exhaustive analysis of a scarcely investigated problem: which kind of IM reinforcement signal is the most suitable for driving the acquisition of multiple skills in the shortest time? For this purpose we implemented an artificial agent with a hierarchical architecture that allows it to learn and cache different skills. We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks. We compare the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.
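The abstract contrasts two families of IM signals: knowledge-based rewards (e.g., the prediction error of an internal model) and competence-based rewards (tied to how well the agent is actually mastering a task). A minimal sketch of the distinction, with assumed function names and a competence measure based on improvement in success rate (not the authors' actual formulation), might look like this:

```python
# Hypothetical sketch, not the paper's implementation: two families of
# intrinsic motivation (IM) signals for a single task.

def knowledge_based_im(predicted_outcome, observed_outcome):
    """Knowledge-based IM: the prediction error of an internal model.
    High when the system's knowledge of the world is poor, regardless
    of whether its competence at the task is improving."""
    return abs(observed_outcome - predicted_outcome)

def competence_based_im(success_history, window=10):
    """Competence-based IM: improvement in the recent success rate
    relative to the preceding window of trials. High exactly while
    the skill is being acquired, near zero once it is mastered."""
    if len(success_history) < 2 * window:
        return 0.0
    recent = sum(success_history[-window:]) / window
    older = sum(success_history[-2 * window:-window]) / window
    return recent - older
```

Under this reading, finding (b) says that signals closer in spirit to `competence_based_im` drive multiple-skill acquisition better than pure prediction-error signals, which stay high for unlearnable or irrelevant regularities.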


Figure 2: The modular architecture of the system with the controller based on actor-critic experts, the selector and the predictor that generates the IM reinforcement signal driving the selector. n is the number of the tasks; Act A is the output of the actor of the expert, controlling the displacement of the joints of the arm in the next step; Crt A is the evaluation made by the critic of the expert.

Mentions: Since we are looking for a system able to learn different skills and cache them in its own repertoire of actions, we need an architecture where different abilities are stored in different components of the system (Baldassarre and Mirolli, 2013c). For this reason, the controller of the arm consists of a modular architecture (Figure 2) composed of n experts (8 in this implementation, one for each possible task) and a selector that determines which expert/task will be trained. For simplicity, we coupled each expert to a specific task so that the expert is reinforced only for reaching the associated object, but this assumption does not affect the generality of the results presented here.
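The selection scheme described above can be sketched as follows. This is an illustrative reconstruction with assumed names and hyperparameters, not the authors' code: the selector keeps one IM-value estimate per task and picks which of the n experts to train via softmax selection over those estimates, updating the chosen task's estimate from the latest IM reinforcement.

```python
import math
import random

class Selector:
    """Illustrative task selector for a modular controller:
    one value estimate per expert/task, softmax selection,
    running-average update from the IM reinforcement signal."""

    def __init__(self, n_tasks=8, temperature=0.5, lr=0.1):
        self.values = [0.0] * n_tasks  # estimated IM reward per task
        self.temperature = temperature
        self.lr = lr

    def select(self):
        """Softmax (Boltzmann) selection over per-task IM estimates:
        tasks currently yielding more intrinsic reward are trained
        more often, but every task keeps a nonzero probability."""
        exps = [math.exp(v / self.temperature) for v in self.values]
        threshold = random.random() * sum(exps)
        acc = 0.0
        for task, e in enumerate(exps):
            acc += e
            if acc >= threshold:
                return task
        return len(exps) - 1

    def update(self, task, im_reward):
        """Move the selected task's estimate toward the IM
        reinforcement produced by the predictor for that trial."""
        self.values[task] += self.lr * (im_reward - self.values[task])
```

In a training loop, the selector would pick a task, the corresponding actor-critic expert would control the arm for one trial, and the predictor's IM signal for that trial would feed `update`, closing the loop shown in Figure 2.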


Which is the best intrinsic motivation signal for learning multiple skills?

Santucci VG, Baldassarre G, Mirolli M - Front Neurorobot (2013)

The modular architecture of the system with the controller based on actor-critic experts, the selector and the predictor that generates the IM reinforcement signal driving the selector. n is the number of the tasks; Act A is the output of the actor of the expert, controlling the displacement of the joints of the arm in the next step; Crt A is the evaluation made by the critic of the expert.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3824099&req=5

Figure 2: The modular architecture of the system with the controller based on actor-critic experts, the selector and the predictor that generates the IM reinforcement signal driving the selector. n is the number of the tasks; Act A is the output of the actor of the expert, controlling the displacement of the joints of the arm in the next step; Crt A is the evaluation made by the critic of the expert.
Mentions: Since we are looking for a system able to learn different skills and cache them in its own repertoire of actions, we need an architecture where different abilities are stored in different components of the system (Baldassarre and Mirolli, 2013c). For this reason, the controller of the arm consists in a modular architecture (Figure 2) composed by n experts (8 in this implementation, one for each possible task) and a selector that determines which expert/task will be trained. For simplicity, we coupled each expert to a specific task so that the expert is reinforced only for reaching the associated object, but this assumption does not affect the generality of the results presented here.

Bottom Line: We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks.We compare the results of different versions of the system driven by several different intrinsic motivation signals.The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.

View Article: PubMed Central - PubMed

Affiliation: Laboratory of Computational Embodied Neuroscience, Isituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche Roma, Italy ; School of Computing and Mathematics, University of Plymouth Plymouth, UK.

ABSTRACT
Humans and other biological agents are able to autonomously learn and cache different skills in the absence of any biological pressure or any assigned task. In this respect, Intrinsic Motivations (i.e., motivations not connected to reward-related stimuli) play a cardinal role in animal learning, and can be considered as a fundamental tool for developing more autonomous and more adaptive artificial agents. In this work, we provide an exhaustive analysis of a scarcely investigated problem: which kind of IM reinforcement signal is the most suitable for driving the acquisition of multiple skills in the shortest time? To this purpose we implemented an artificial agent with a hierarchical architecture that allows to learn and cache different skills. We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks. We compare the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.

No MeSH data available.


Related in: MedlinePlus