Reinforcement learning for routing in cognitive radio ad hoc networks.

Al-Rawi HA, Yau KL, Mohamad H, Ramli N, Hashim W - ScientificWorldJournal (2014)

Bottom Line: This paper applies RL to routing and investigates the effects of various features of RL (i.e., the reward function, exploitation and exploration, and the learning rate) through simulation. New approaches and recommendations are proposed to enhance these features in order to improve the network performance that RL brings to routing. Simulation results show that the RL parameters of the reward function, exploitation and exploration, and the learning rate must be well regulated, and that the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer Science and Networked Systems, Sunway University, No. 5 Jalan Universiti, Bandar Sunway, 46150 Petaling Jaya, Selangor, Malaysia.

ABSTRACT
Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL to routing and investigates the effects of various features of RL (i.e., the reward function, exploitation and exploration, and the learning rate) through simulation. New approaches and recommendations are proposed to enhance these features in order to improve the network performance that RL brings to routing. Simulation results show that the RL parameters of the reward function, exploitation and exploration, and the learning rate must be well regulated, and that the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.
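The abstract describes the RL formulation only in prose. For concreteness, a minimal Q-learning sketch of how an SU node might maintain estimates for its candidate next-hop and channel pairs is given below; the state/action encoding, the reward function, and the learning-rate and discount values are illustrative assumptions rather than the paper's exact model.

```python
# Minimal Q-learning sketch for SU routing decisions (illustrative only: the
# paper's exact state/action encoding, reward function, and parameter values
# are not given in this excerpt and are assumed here).
ALPHA = 0.5   # learning rate, one of the RL features the paper regulates
GAMMA = 0.9   # discount factor (assumed value)

q_table = {}  # (node, next_hop, channel) -> estimated long-term route quality


def q_value(node, next_hop, channel):
    return q_table.get((node, next_hop, channel), 0.0)


def reward(pu_detected, link_delay):
    """Hypothetical reward: penalize interference with PUs and high link delay."""
    return (-1.0 if pu_detected else 1.0) - 0.1 * link_delay


def update(node, next_hop, channel, r, next_hop_actions):
    """One Q-learning update after forwarding a packet over (next_hop, channel).

    next_hop_actions is the list of (neighbor, channel) pairs available at the
    next hop toward the destination.
    """
    future = max((q_value(next_hop, nh, ch) for nh, ch in next_hop_actions),
                 default=0.0)
    old = q_value(node, next_hop, channel)
    q_table[(node, next_hop, channel)] = old + ALPHA * (r + GAMMA * future - old)
```

A higher ALPHA makes the estimate track recent PU activity more quickly at the cost of stability, which is why the paper emphasizes that the learning rate must be well regulated.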

fig11: SUs' interference to PUs for varying PU mean arrival rate μPUL for different levels of standard deviation of PUL σPUL.

Mentions: When the standard deviation of PUL is low (σPUL = 0.2), most next-hop node (or link) and channel pairs have very similar PU mean arrival rates μPUL; nevertheless, dynamic softmax outperforms the traditional exploration approaches, by up to 39% compared to softmax and up to 22% compared to ε-greedy, while achieving network performance very similar to the exploitation-only approach (see Figure 11(a)). When the PU mean arrival rate μPUL becomes higher, all channels have high levels of PU activity and the Q-values of the channels do not vary much, so there is little to be gained from exploration and all approaches achieve very similar network performance. When σPUL = 0.8, the link and channel pairs differ the most in their PU mean arrival rates μPUL. Dynamic softmax outperforms the other exploration approaches, and the exploitation-only approach causes the highest SUs' interference to PUs (see Figure 11(b)). When the standard deviation of PUL σPUL becomes higher, the channels have different levels of PU activity and their Q-values vary considerably, so more exploration is necessary, which explains why the exploitation-only approach yields the worst network performance with the highest SUs' interference to PUs. In general, softmax outperforms ε-greedy in most cases because softmax explores below-average routes less often than ε-greedy does.
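The comparison above turns on how aggressively each policy explores. As a point of reference, a minimal sketch of the two baseline policies (ε-greedy and softmax/Boltzmann selection over the Q-values of the candidate next-hop and channel pairs) is given below; the ε and temperature values are illustrative, and the paper's dynamic softmax, which adapts the amount of exploration over time, is not reproduced here.

```python
import math
import random


def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action index: random with probability epsilon, else greedy.

    Exploration is uniform, so below-average next-hop/channel pairs are tried
    as often as near-optimal ones.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def softmax(q_values, temperature=0.5):
    """Boltzmann selection: choose index i with probability proportional to exp(Q_i / T).

    Below-average actions receive exponentially less probability, which matches
    the observation that softmax explores poor routes less often than epsilon-greedy.
    """
    prefs = [math.exp(q / temperature) for q in q_values]
    total = sum(prefs)
    r, acc = random.random() * total, 0.0
    for i, p in enumerate(prefs):
        acc += p
        if r <= acc:
            return i
    return len(prefs) - 1
```

In this formulation, the exploitation-only approach corresponds to ε = 0 (or temperature approaching 0), i.e., always choosing the pair with the highest Q-value.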

