The Effect of UCB Algorithm in Reinforcement Learning

Saito Koki, Notsu Akira, Honda Katsuhiro

doi:10.14864/fss.30.0_174

Bibliographic Information

Other Title

強化学習におけるUCB行動選択手法の効果

Description

The UCB algorithm was proposed as an action selection method for the multi-armed bandit problem. In this method, an agent chooses an action by comparing the upper bounds of the confidence intervals of the estimated action values, and it thereby performs better than other methods such as ε-greedy. In this paper, we propose a method for applying the UCB algorithm to Q-learning and experimentally evaluate its performance on a shortest path problem in continuous state spaces.
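
The abstract does not state the paper's exact formulation, so the following is only a minimal sketch of the general idea, assuming the standard UCB1 bonus term and a tabular Q-function over discretized states (the paper itself works with continuous state spaces). The names `ucb_select` and `q_update`, the exploration weight `c`, and the values of `alpha` and `gamma` are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict


def ucb_select(q_values, counts, c=1.0):
    """Pick the action with the largest upper confidence bound.

    q_values[a] is the current value estimate of action a in this state,
    counts[a] is how often action a has been tried in this state.
    Assumption: UCB1-style bonus c * sqrt(2 * ln(N) / n_a).
    """
    # Try every action once before comparing bounds (avoids division by zero).
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)
    scores = [
        q + c * math.sqrt(2.0 * math.log(total) / n)
        for q, n in zip(q_values, counts)
    ]
    # Ties are broken in favor of the lowest action index.
    return max(range(len(scores)), key=scores.__getitem__)


def q_update(q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a dict-based table."""
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


# Usage: one interaction step with hypothetical states 0 and 1.
q = defaultdict(float)                  # Q-table, default value 0.0
visits = defaultdict(lambda: [0, 0])    # per-state action counts (2 actions)
state = 0
action = ucb_select([q[(state, a)] for a in range(2)], visits[state])
visits[state][action] += 1
q_update(q, state, action, reward=1.0, next_state=1, n_actions=2)
```

Compared with ε-greedy, which explores uniformly at random with a fixed probability, this rule directs exploration toward actions that have been tried rarely, because their bonus term is larger.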

Journal

Proceedings of the Fuzzy System Symposium

Proceedings of the Fuzzy System Symposium 30 (0), 174-179, 2014

Japan Society for Fuzzy Theory and Intelligent Informatics

Details

CRID: 1390001205673422976

NII Article ID: 130005480437

DOI: 10.14864/fss.30.0_174

Text Lang: ja

Data Source

JaLC
CiNii Articles

Abstract License Flag: Disallowed
