Reinforcement learning for routing in cognitive radio ad hoc networks.

Al-Rawi HA, Yau KL, Mohamad H, Ramli N, Hashim W - ScientificWorldJournal (2014)

Bottom Line: This paper applies RL to routing and investigates the effects of various features of RL (i.e., reward function, exploitation and exploration, and learning rate) through simulation. New approaches and recommendations are proposed to enhance these features in order to improve the network performance that RL brings to routing. Simulation results show that the RL parameters of the reward function, exploitation and exploration, and learning rate must be well regulated, and that the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer Science and Networked Systems, Sunway University, No. 5 Jalan Universiti, Bandar Sunway, 46150 Petaling Jaya, Selangor, Malaysia.

ABSTRACT
Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL to routing and investigates the effects of various features of RL (i.e., reward function, exploitation and exploration, and learning rate) through simulation. New approaches and recommendations are proposed to enhance these features in order to improve the network performance that RL brings to routing. Simulation results show that the RL parameters of the reward function, exploitation and exploration, and learning rate must be well regulated, and that the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.

alg2: Counterapproach algorithm at SU node i.

Mentions: The proposed counterapproach addresses the aforementioned issue by regulating the learning rate based on historical Q-values. Specifically, for each Q-value (or state-action pair), it keeps track of a counter for winning events, ctp, and a counter for losing events, ctn (see Algorithm 2). It then calculates the ratio otn = ctp/ctn, which represents the number of winning events relative to the number of losing events for that Q-value. When otn > 1, the learning rate is set to a lower value because winning events dominate; when otn < 1, the learning rate is set to a higher value because losing events dominate; and when otn = 1, the learning rate is left unchanged, as this indicates stable network performance. When the newly received Q-value is unchanged from the previous value, neither counter is updated. Note that the learning rate α is assumed to be increased and decreased by a small constant factor f in order to avoid fluctuations in SUs' network performance.
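To make the counter mechanism concrete, the following Python sketch shows one plausible reading of Algorithm 2. The clipping bounds, the initial learning rate, the value of the constant factor f, and the choice of "winning" as an increase in the newly received Q-value are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the counter-based learning-rate regulation (Algorithm 2).
# ALPHA_MIN/ALPHA_MAX, F, alpha_init, and the win/loss test are assumed values,
# chosen only for illustration.

ALPHA_MIN, ALPHA_MAX = 0.05, 0.95   # assumed clipping bounds for the learning rate
F = 0.01                            # assumed small constant adjustment factor f

class CounterApproach:
    """Per state-action counters that regulate the Q-learning rate at an SU node."""

    def __init__(self, alpha_init=0.5):
        self.alpha_init = alpha_init
        self.alpha = {}   # learning rate per (state, action) pair
        self.ctp = {}     # winning counter ctp per (state, action) pair
        self.ctn = {}     # losing counter ctn per (state, action) pair

    def update(self, key, old_q, new_q):
        """Adjust and return the learning rate for state-action pair `key`,
        given the stored Q-value and the newly received Q-value."""
        alpha = self.alpha.get(key, self.alpha_init)
        ctp = self.ctp.get(key, 0)
        ctn = self.ctn.get(key, 0)

        if new_q > old_q:      # assumed "winning" event: Q-value improved
            ctp += 1
        elif new_q < old_q:    # assumed "losing" event: Q-value degraded
            ctn += 1
        # equal Q-values: neither counter is updated, as described above

        if ctn > 0:            # the ratio otn = ctp / ctn is defined only when ctn > 0
            otn = ctp / ctn
            if otn > 1:        # winning dominates: decrease alpha by factor f
                alpha = max(ALPHA_MIN, alpha - F)
            elif otn < 1:      # losing dominates: increase alpha by factor f
                alpha = min(ALPHA_MAX, alpha + F)
            # otn == 1: learning rate unchanged (stable network performance)

        self.alpha[key], self.ctp[key], self.ctn[key] = alpha, ctp, ctn
        return alpha
```

In use, a node would call something like alpha = CounterApproach().update((state, action), old_q, new_q) before each Q-value update, so that the step size shrinks while performance is improving and grows again when performance degrades.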

