A New Reward System Based on Human Demonstrations for Hard Exploration Games

Wadhah Tareq; Mehmet Amasyali

doi:10.32604/cmc.2022.020036

Wadhah Zeyad Tareq*, Mehmet Fatih Amasyali

Faculty of Electrical and Electronics Engineering, Yildiz Technical University, Istanbul, 34220, Turkey

* Corresponding Author: Wadhah Zeyad Tareq. Email: email

(This article belongs to the Special Issue: Application of Big Data Analytics in the Management of Business)

Computers, Materials & Continua 2022, 70(2), 2401-2414. https://doi.org/10.32604/cmc.2022.020036

Received 06 May 2021; Accepted 11 June 2021; Issue published 27 September 2021

Abstract

The main idea of reinforcement learning is to evaluate a chosen action according to the reward it currently yields. Following this concept, many algorithms have achieved good performance on classic Atari 2600 games. The main challenge arises when the reward is sparse or missing, as in hard exploration environments such as Montezuma’s Revenge, Pitfall, and Private Eye. Approaches built to deal with such challenges have been very demanding. This work introduces a different reward system that enables a simple classical algorithm to learn quickly and achieve high performance in hard exploration environments. We also made simple adjustments to several hyperparameters, such as the number of actions and the sampling ratio, which helped improve performance. We include the extra reward within the human demonstrations and then use Prioritized Double Deep Q-Networks (Prioritized DDQN) to learn from these demonstrations. With a short learning time, our approach enabled Prioritized DDQN to finish the first level of Montezuma’s Revenge and to perform well in both Pitfall and Private Eye. We used the same games to compare our results with several baselines, such as Rainbow and Deep Q-learning from Demonstrations (DQfD). The results showed that the new reward system enabled Prioritized DDQN to outperform the baselines in the hard exploration games with a short learning time.
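For readers unfamiliar with the components named in the abstract, the following is a minimal sketch of the standard Double DQN target and proportional prioritized sampling, together with a placeholder for adding extra reward to demonstration transitions. It is not the authors' implementation: the abstract does not specify how the extra reward is computed, so `add_demo_bonus` and its fixed `bonus` are purely hypothetical, and the function names and the default values of `gamma`, `alpha`, and `eps` are assumptions chosen for illustration.

```python
import numpy as np

def add_demo_bonus(rewards, bonus):
    """Hypothetical shaping: add a fixed bonus to every demonstration reward.

    The abstract only says extra reward is included in the demonstrations;
    the constant bonus used here is an assumption, not the paper's scheme.
    """
    return np.asarray(rewards, dtype=np.float64) + bonus

def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Standard Double DQN target.

    The online network selects the next action; the target network
    evaluates it. Terminal transitions (dones == 1) do not bootstrap.
    """
    best = np.argmax(next_q_online, axis=1)
    next_values = next_q_target[np.arange(len(best)), best]
    return rewards + gamma * (1.0 - dones) * next_values

def priorities_to_probs(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) is proportional to (|delta_i| + eps)^alpha."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

# Two made-up demonstration transitions, the second one terminal.
rewards = add_demo_bonus([0.0, 1.0], bonus=0.5)
dones = np.array([0.0, 1.0])
next_q_online = np.array([[1.0, 2.0], [3.0, 0.0]])
next_q_target = np.array([[10.0, 20.0], [30.0, 40.0]])
targets = double_dqn_targets(rewards, dones, next_q_online, next_q_target)
probs = priorities_to_probs(np.array([0.1, 2.0]))
```

In this sketch, transitions with a larger temporal-difference error are sampled more often, which is the property that prioritized replay relies on.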

Keywords

Deep reinforcement learning; human demonstrations; prioritized double deep Q-networks; Atari

Cite This Article

APA Style

Tareq, W.Z., Amasyali, M.F. (2022). A new reward system based on human demonstrations for hard exploration games. Computers, Materials & Continua, 70(2), 2401-2414. https://doi.org/10.32604/cmc.2022.020036

Vancouver Style

Tareq WZ, Amasyali MF. A new reward system based on human demonstrations for hard exploration games. Comput Mater Contin. 2022;70(2):2401-2414. https://doi.org/10.32604/cmc.2022.020036

IEEE Style

W.Z. Tareq and M.F. Amasyali, "A New Reward System Based on Human Demonstrations for Hard Exploration Games," Comput. Mater. Contin., vol. 70, no. 2, pp. 2401-2414, 2022. https://doi.org/10.32604/cmc.2022.020036

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
