A reward optimization method based on action subrewards in hierarchical reinforcement learning.

Fu Y, Liu Q, Ling X, Cui Z - ScientificWorldJournal (2014)

Bottom Line: Applied to online learning in the game of Tetris, the experimental results show that the convergence speed of the algorithm is clearly improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also alleviated to a certain extent by the hierarchical method. The performance under different parameter settings is compared and analyzed as well.

View Article: PubMed Central - PubMed

Affiliation: Suzhou Industrial Park Institute of Services Outsourcing, Suzhou, Jiangsu 215123, China ; School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China.

ABSTRACT
Reinforcement learning (RL) is an interactive learning method whose main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, and the resulting low convergence speed. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and improve convergence speed. Applied to online learning in the game of Tetris, the experimental results show that the convergence speed of the algorithm is clearly improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also alleviated to a certain extent by the hierarchical method. The performance under different parameter settings is compared and analyzed as well.
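
To illustrate the idea, here is a minimal sketch (not the authors' code) of how weighted action subrewards could be combined into a single reward signal for a Tetris-style task. The feature names (height, holes, bumpiness) and the weights are assumptions for demonstration, not the paper's exact reward definition.

    # Combine weighted action subrewards into one reward for a piece placement.
    # Features and weights are illustrative assumptions.
    def action_reward(height, holes, bumpiness, lines_cleared,
                      alpha=0.1, beta=0.2, gamma=0.1):
        """Return a reward for one placement from its action subrewards."""
        sub_height = -alpha * height      # penalize a tall stack
        sub_holes = -beta * holes         # penalize covered holes
        sub_bump = -gamma * bumpiness     # penalize an uneven surface
        return lines_cleared + sub_height + sub_holes + sub_bump

    # Example: a placement that clears one line, raises the stack to 8,
    # leaves 2 holes, and produces a bumpiness of 3.
    r = action_reward(height=8, holes=2, bumpiness=3, lines_cleared=1)

Because each subreward scores the action immediately, the agent gets informative feedback per placement instead of waiting for a sparse end-of-game signal, which is the mechanism the abstract credits for the faster convergence.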


fig3: The performance comparison between the divide-and-rule policy and the non-divide-and-rule policy.

Mentions: The divide-and-rule policy, which adjusts the proportion parameters α, β, and γ, is introduced in this experiment to make learning more flexible and the action subrewards more reasonable, so that the subrewards reflect action quality in time. Figure 3 shows the performance comparison between the divide-and-rule policy and the non-divide-and-rule policy, where two parameter settings are used according to the height of the blocks: α = 0.1, β = 0.2, and γ = 0.1, as well as α = 0.2, β = 0.1, and γ = 1. That is, when the stack is low, holes are weighted more heavily than height, and height is weighted more heavily as the stack grows. The policy thus changes with the environment, which accords with reality. The results show that the performance of the algorithm improves considerably and the scores are much higher.
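
A minimal sketch (an assumption, not the authors' implementation) of the divide-and-rule idea: the subreward weights switch with the current stack height, using the two parameter settings reported for Figure 3. The switching threshold is hypothetical, since the paper does not state one.

    def subreward_weights(stack_height, threshold=10):
        """Pick (alpha, beta, gamma) according to the current stack height.

        `threshold` is a hypothetical switch point; the paper does not give one.
        """
        if stack_height < threshold:
            # Low stack: weight holes (beta) more heavily than height (alpha).
            return 0.1, 0.2, 0.1
        # High stack: weight height more heavily as the stack grows.
        return 0.2, 0.1, 1.0

    alpha, beta, gamma = subreward_weights(stack_height=12)
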
