Table of Content

Open Access

ARTICLE

A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation

Han Li1, Tianding Chen2, *, Hualiang Teng3, Yingtao Jiang4
College of Mathematics, Physics and Electronic Information Engineering, Wenzhou University, Wenzhou, Zhejiang, 325035, China.
School of Physics and Information Engineering, Minnan Normal University, Zhangzhou, Fujian, 363000, China.
Department of Civil and Environmental Engineering, Howard R. Hughes College of Engineering, University of Nevada, Las Vegas, NV, 454015, USA.
Department of Electrical and Computer Engineering, Howard R. Hughes College of Engineering, University of Nevada, Las Vegas, NV, 454015, USA.
* Corresponding Author: Tianding Chen. Email: .

Computer Modeling in Engineering & Sciences 2019, 118(2), 253-274. https://doi.org/10.31614/cmes.2019.05807

Abstract

In any classical value-based reinforcement learning method, an agent, despite of its continuous interactions with the environment, is yet unable to quickly generate a complete and independent description of the entire environment, leaving the learning method to struggle with a difficult dilemma of choosing between the two tasks, namely exploration and exploitation. This problem becomes more pronounced when the agent has to deal with a dynamic environment, of which the configuration and/or parameters are constantly changing. In this paper, this problem is approached by first mapping a reinforcement learning scheme to a directed graph, and the set that contains all the states already explored shall continue to be exploited in the context of such a graph. We have proved that the two tasks of exploration and exploitation eventually converge in the decision-making process, and thus, there is no need to face the exploration vs. exploitation tradeoff as all the existing reinforcement learning methods do. Rather this observation indicates that a reinforcement learning scheme is essentially the same as searching for the shortest path in a dynamic environment, which is readily tackled by a modified Floyd-Warshall algorithm as proposed in the paper. The experimental results have confirmed that the proposed graph-based reinforcement learning algorithm has significantly higher performance than both standard Q-learning algorithm and improved Q-learning algorithm in solving mazes, rendering it an algorithm of choice in applications involving dynamic environments.

Keywords

Reinforcement learning, graph, exploration and exploitation, maze.

Cite This Article

Li, H., Chen, T., Teng, H., Jiang, Y. (2019). A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation. CMES-Computer Modeling in Engineering & Sciences, 118(2), 253–274.



This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • 2955

    View

  • 1770

    Download

  • 0

    Like

Share Link

WeChat scan