An Actor-Critic Algorithm Using a Binary Tree Action Selector

  • KIMURA Hajime
    Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
  • KOBAYASHI Shigenobu
    Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

Bibliographic Information

Other Title
  • 確率的2分木の行動選択を用いたActor-Criticアルゴリズム (An Actor-Critic Algorithm Using Stochastic Binary Tree Action Selection)
  • 確率的2分木の行動選択を用いたActor-Criticアルゴリズム:多数の行動を扱う強化学習 (An Actor-Critic Algorithm Using Stochastic Binary Tree Action Selection: Reinforcement Learning to Cope with Enormous Actions)
  • Reinforcement Learning to Cope with Enormous Actions (多数の行動を扱う強化学習)


Abstract

In real-world applications, learning algorithms often have to handle several dozen actions over which some distance metric is defined. The epsilon-greedy and Boltzmann-distribution exploration strategies commonly used with Q-learning and SARSA are popular, simple, and effective for problems with only a few actions; however, their efficiency decreases as the number of actions grows. We propose a policy function representation consisting of a stochastic binary decision tree and apply it to an actor-critic algorithm for problems with a large number of similar actions. Simulation results show that increasing the number of actions does not affect the learning curves of the proposed method at all.
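
The abstract only names the policy representation, so the following is a minimal, hypothetical Python sketch of a stochastic binary-tree action selector for an actor-critic learner; it is not the authors' exact formulation. It assumes the number of actions is a power of two, that each internal node branches right with probability sigmoid(theta), and that the actor parameters are adjusted by a policy-gradient rule weighted by the critic's TD error. The class and method names are invented for illustration.

```python
import numpy as np


class StochasticBinaryTreePolicy:
    """Policy over 2**depth actions, represented as a stochastic binary tree.

    Each internal node holds one parameter theta; the probability of
    branching right at that node is sigmoid(theta).  The probability of
    choosing a leaf (action) is the product of the branch probabilities
    along its root-to-leaf path, so sampling an action and computing the
    log-probability gradient both cost O(depth) = O(log2(number of actions)).
    """

    def __init__(self, depth, seed=None):
        self.depth = depth
        self.n_actions = 2 ** depth
        # One parameter per internal node, stored in heap order
        # (node i has children 2*i + 1 and 2*i + 2).
        self.theta = np.zeros(self.n_actions - 1)
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample(self):
        """Descend from the root, flipping a biased coin at each node.

        Returns the chosen action index and the gradient of log pi(action)
        with respect to theta (the actor's eligibility vector).
        """
        node = 0
        grad = np.zeros_like(self.theta)
        for _ in range(self.depth):
            p_right = self._sigmoid(self.theta[node])
            go_right = self.rng.random() < p_right
            # d log P(branch) / d theta[node]:
            #   +(1 - p_right) if we branch right, -p_right if we branch left.
            grad[node] = (1.0 - p_right) if go_right else -p_right
            node = 2 * node + (2 if go_right else 1)
        return node - (self.n_actions - 1), grad  # leaf index -> action id

    def update(self, grad, td_error, lr=0.1):
        """Policy-gradient actor update driven by the critic's TD error."""
        self.theta += lr * td_error * grad


# Hypothetical usage: 64 actions require a tree of depth 6; a separate
# critic (e.g. a TD(0) state-value table) would supply td_error.
policy = StochasticBinaryTreePolicy(depth=6)
action, eligibility = policy.sample()
policy.update(eligibility, td_error=0.5)
```

Because each decision touches only one root-to-leaf path, the cost of selecting an action and of the actor update grows logarithmically with the number of actions, which is the property motivating the representation described in the abstract.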

