Intelligent control of a sensor-actuator system via kernelized least-squares policy iteration.

Liu B, Chen S, Li S, Liang Y - Sensors (Basel) (2012)



Affiliation: Key Laboratory of Visual Media Processing and Transmission, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518029, China. boliu@cs.umass.edu

ABSTRACT
In this paper, we propose a new framework, Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making under uncertainty by combining non-adaptive, data-independent Random Projections with nonparametric Kernelized Least-Squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality-reduction framework in which high-dimensional data are projected onto a random lower-dimensional subspace via spherically random rotations and coordinate sampling. KLSPI introduces the kernel trick into the Least-Squares Policy Iteration (LSPI) framework for Reinforcement Learning, often achieving faster convergence and providing automatic feature selection through various kernel sparsification approaches. In our approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a set of random bases. We first show how Random Projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI at lower computational cost. The theoretical foundation underlying this approach is a fast approximation of the Singular Value Decomposition (SVD). Finally, simulation results on benchmark MDP domains confirm gains in both computation time and performance in large feature spaces.
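
As a rough illustration of the random-projection step described in the abstract, the short Python sketch below projects a high-dimensional feature matrix onto a random low-dimensional subspace using a Gaussian projection matrix. This is not the authors' CKRL code; the matrix sizes, variable names, and the choice of a Gaussian projection are illustrative assumptions.

import numpy as np

def random_projection(features, d, seed=0):
    """Project an (n x D) feature matrix onto a random d-dimensional subspace."""
    rng = np.random.default_rng(seed)
    n, D = features.shape
    # i.i.d. N(0, 1/d) entries approximately preserve pairwise distances
    # (Johnson-Lindenstrauss), which is the property random projections rely on.
    R = rng.standard_normal((D, d)) / np.sqrt(d)
    return features @ R

# Example: compress 10,000-dimensional features from 200 samples down to 50
# dimensions before running a least-squares policy-evaluation solve on them.
Phi = np.random.rand(200, 10_000)      # placeholder feature matrix (n samples x D features)
Phi_low = random_projection(Phi, 50)   # shape (200, 50)
print(Phi_low.shape)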



Figure 4 (f4-sensors-12-02632): A Typical Trajectory of a Successful Swing-up of the Acrobot.

Mentions: The Acrobot is an under-actuated double pendulum that serves as a standard benchmark because of its nonlinear dynamics. It consists of two links, with torque applied only at the second joint. The system is described by four continuous variables: the two joint angles, θ1 and θ2, and the angular velocities, θ̇1 and θ̇2. There are three actions, corresponding to positive (a = 1), negative (a = −1), and zero (a = 0) torque. The time step was set to 0.05, and actions were selected after every fourth update of the state variables according to the equations of motion. The goal in this domain is to raise the tip of the second link above a certain height in minimum time (we used a height of 1, with both links of length 1). The reward is therefore −1 for each time step until the goal is achieved, and the discount factor is γ = 0.99. Episodes begin with all state variables at zero, which corresponds to the two links hanging straight down and motionless. Figure 3 illustrates the Acrobot. Figure 4 shows a snapshot of a successful run.
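
For concreteness, the following Python sketch encodes the domain parameters stated above (time step, goal height, link lengths, reward of −1 per step, γ = 0.99, zero initial state). The tip-height formula and the helper names are our own illustrative assumptions; the full equations of motion are omitted.

import numpy as np

GAMMA = 0.99          # discount factor (from the text)
DT = 0.05             # simulation time step (from the text)
GOAL_HEIGHT = 1.0     # raise the tip of the second link above this height
L1 = L2 = 1.0         # both links have length 1
ACTIONS = (-1, 0, 1)  # negative, zero, and positive torque at the second joint

def tip_height(theta1, theta2):
    """Height of the second link's tip above the pivot (angles measured from the downward vertical)."""
    return -L1 * np.cos(theta1) - L2 * np.cos(theta1 + theta2)

def step_reward(state):
    """Reward of -1 per time step; the episode terminates once the goal height is reached."""
    theta1, theta2, dtheta1, dtheta2 = state
    done = tip_height(theta1, theta2) >= GOAL_HEIGHT
    return -1.0, done

initial_state = np.zeros(4)   # both links hanging straight down and motionless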

