The Effect of UCB Algorithm in Reinforcement Learning

Bibliographic Information

Other Title
  • 強化学習におけるUCB行動選択手法の効果

Description

The UCB algorithm was proposed as an action selection method for the multi-armed bandit problem. In this method, an agent chooses an action by comparing the upper bounds of the confidence intervals of the estimated action values, which gives it better performance than other methods such as ε-greedy. In this paper, we propose a method for applying the UCB algorithm to Q-learning and experimentally evaluate its performance on a shortest path problem in a continuous state space.
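
The selection rule described above can be illustrated with a minimal sketch. The abstract does not state which UCB variant the paper uses, so the code below assumes the standard UCB1 bound for the bandit setting; the function name `ucb1_select`, its parameters, and the exploration constant `c` are hypothetical and chosen for illustration only.

```python
import math


def ucb1_select(values, counts, c=math.sqrt(2)):
    """Return the index of the action with the largest upper confidence bound.

    values: estimated value of each action (assumed input)
    counts: number of times each action has been tried (assumed input)
    c:      exploration constant; sqrt(2) gives the usual UCB1 bound
    """
    # Try every action once first, so the bound never divides by zero.
    for action, n in enumerate(counts):
        if n == 0:
            return action

    total = sum(counts)
    # Upper bound = estimate + bonus that shrinks as an action is tried more.
    scores = [
        q + c * math.sqrt(math.log(total) / n)
        for q, n in zip(values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Action 1 has a lower estimate but has rarely been tried.
    print(ucb1_select([0.5, 0.4], [100, 1]))
```

Unlike ε-greedy, which explores uniformly at random, this rule directs exploration toward actions whose estimates are still uncertain. One plausible way to carry it over to Q-learning would be to pass the Q-values of the current state as `values` and per-state action counts as `counts`, but the abstract does not confirm that this is the paper's method.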

Details

  • CRID
    1390001205673422976
  • NII Article ID
    130005480437
  • DOI
    10.14864/fss.30.0_174
  • Text Lang
    ja
  • Data Source
    • JaLC
    • CiNii Articles
  • Abstract License Flag
    Disallowed
