Limits...
Which is the best intrinsic motivation signal for learning multiple skills?

Santucci VG, Baldassarre G, Mirolli M - Front Neurorobot (2013)

Bottom Line: We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks.We compare the results of different versions of the system driven by several different intrinsic motivation signals.The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.

View Article: PubMed Central - PubMed

Affiliation: Laboratory of Computational Embodied Neuroscience, Isituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche Roma, Italy ; School of Computing and Mathematics, University of Plymouth Plymouth, UK.

ABSTRACT
Humans and other biological agents are able to autonomously learn and cache different skills in the absence of any biological pressure or any assigned task. In this respect, Intrinsic Motivations (i.e., motivations not connected to reward-related stimuli) play a cardinal role in animal learning, and can be considered as a fundamental tool for developing more autonomous and more adaptive artificial agents. In this work, we provide an exhaustive analysis of a scarcely investigated problem: which kind of IM reinforcement signal is the most suitable for driving the acquisition of multiple skills in the shortest time? To this purpose we implemented an artificial agent with a hierarchical architecture that allows to learn and cache different skills. We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks. We compare the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.

No MeSH data available.


Related in: MedlinePlus

Average number of trials needed by the system to achieve a performance in the 4 learnable tasks of 95% with different values of the learning rates of the predictors (average on 20 replications per learning rate). If a system has not reached 95% we report above the corresponding bar the average performance at the end of the simulation.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3824099&req=5

Figure 5: Average number of trials needed by the system to achieve a performance in the 4 learnable tasks of 95% with different values of the learning rates of the predictors (average on 20 replications per learning rate). If a system has not reached 95% we report above the corresponding bar the average performance at the end of the simulation.

Mentions: In Figure 5 we show a detailed analysis of the average performance of the system in the different conditions with different values of the learning rate for the predictors. SAP-PE and SP-PE are not able, regardless from the learning rate of the predictor, to achieve the target performance, while SAP-TD-PE and SP-TD-PE seems to be sensitive to the value of the learning rate of the predictor (SP-TD-PE more than SAP-TD-PE). Differently, TP-PE is very robust with respect to the value of the learning rate of the predictor: regardless of this value this condition is always the best performer, being able to achieve a high performance in a short time.


Which is the best intrinsic motivation signal for learning multiple skills?

Santucci VG, Baldassarre G, Mirolli M - Front Neurorobot (2013)

Average number of trials needed by the system to achieve a performance in the 4 learnable tasks of 95% with different values of the learning rates of the predictors (average on 20 replications per learning rate). If a system has not reached 95% we report above the corresponding bar the average performance at the end of the simulation.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3824099&req=5

Figure 5: Average number of trials needed by the system to achieve a performance in the 4 learnable tasks of 95% with different values of the learning rates of the predictors (average on 20 replications per learning rate). If a system has not reached 95% we report above the corresponding bar the average performance at the end of the simulation.
Mentions: In Figure 5 we show a detailed analysis of the average performance of the system in the different conditions with different values of the learning rate for the predictors. SAP-PE and SP-PE are not able, regardless from the learning rate of the predictor, to achieve the target performance, while SAP-TD-PE and SP-TD-PE seems to be sensitive to the value of the learning rate of the predictor (SP-TD-PE more than SAP-TD-PE). Differently, TP-PE is very robust with respect to the value of the learning rate of the predictor: regardless of this value this condition is always the best performer, being able to achieve a high performance in a short time.

Bottom Line: We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks.We compare the results of different versions of the system driven by several different intrinsic motivation signals.The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.

View Article: PubMed Central - PubMed

Affiliation: Laboratory of Computational Embodied Neuroscience, Isituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche Roma, Italy ; School of Computing and Mathematics, University of Plymouth Plymouth, UK.

ABSTRACT
Humans and other biological agents are able to autonomously learn and cache different skills in the absence of any biological pressure or any assigned task. In this respect, Intrinsic Motivations (i.e., motivations not connected to reward-related stimuli) play a cardinal role in animal learning, and can be considered as a fundamental tool for developing more autonomous and more adaptive artificial agents. In this work, we provide an exhaustive analysis of a scarcely investigated problem: which kind of IM reinforcement signal is the most suitable for driving the acquisition of multiple skills in the shortest time? To this purpose we implemented an artificial agent with a hierarchical architecture that allows to learn and cache different skills. We tested the system in a setup with continuous states and actions, in particular, with a kinematic robotic arm that has to learn different reaching tasks. We compare the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.

No MeSH data available.


Related in: MedlinePlus