Evaluation of Soft Actor-Critic in Discrete Action Space

合田 拓矢 (Takuya Aida), 金子 知適 (Tomoyuki Kaneko)

Bibliographic Information

Other Title

離散行動空間におけるSoft Actor-Critic の評価

Description

Soft Actor-Critic (SAC) [1], [2] は連続行動空間を対象とするState of the Art の強化学習手法であり，そのサンプル効率の高さと学習の頑健さから広く用いられている．本研究では，離散行動空間を対象とする場合，SAC はSoft Q-Learning (SoftQ) [3] と等価な手法となることを示し，実験を通してSoftQを用いる方がSAC の目的関数をナイーブに離散行動空間に適用するよりもサンプル効率が高くなることを示す．また，行動時の方策が学習にどのような影響を与えるかについてや，ソフト行動価値関数を用いることによる特性を実験的に検証する．

Soft Actor-Critic (SAC) [1], [2] is a state-of-the-art reinforcement learning method for continuous action spaces, widely used for its high sample efficiency and robust training. In this work, we show that SAC becomes equivalent to Soft Q-Learning (SoftQ) [3] when the action space is discrete, and we show experimentally that SoftQ is more sample-efficient than naively applying the SAC objective to a discrete action space. We also examine experimentally how the choice of behavior policy affects learning, and what characteristics arise from using soft action-value functions.
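The equivalence claimed in the abstract can be illustrated with the standard maximum-entropy identity for a finite action set. The sketch below is not taken from the paper, whose derivation is not reproduced in this record. It assumes a fixed vector of action values Q(s, a) for one state and a temperature alpha, and the function names are hypothetical. Under those assumptions, the policy that maximizes the SAC-style value E[Q - alpha log pi] is the Boltzmann policy over Q, and the value it attains is the log-sum-exp soft value used in Soft Q-Learning.

```python
import math

def soft_value(q_values, alpha):
    """Soft state value: alpha * log(sum_a exp(Q(s, a) / alpha)).

    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))

def boltzmann_policy(q_values, alpha):
    """Policy with pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]

def sac_value(q_values, policy, alpha):
    """SAC-style state value: E_{a ~ pi}[Q(s, a) - alpha * log pi(a|s)]."""
    return sum(p * (q - alpha * math.log(p))
               for p, q in zip(policy, q_values) if p > 0)

# Illustrative numbers only (assumed, not from the paper).
q = [1.0, 2.0, 0.5]
alpha = 0.2

pi = boltzmann_policy(q, alpha)
uniform = [1.0 / len(q)] * len(q)

# With the Boltzmann policy the two values coincide; any other policy
# (here, uniform) gives a SAC-style value no larger than the soft value.
gap_boltzmann = abs(sac_value(q, pi, alpha) - soft_value(q, alpha))
gap_uniform = soft_value(q, alpha) - sac_value(q, uniform, alpha)
```

Because the optimal policy is available in closed form from Q in the discrete case, a separately trained actor network adds nothing in principle. This is one way to read the abstract's statement that SAC reduces to SoftQ.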

Journal

Proceedings of the Game Programming Workshop 2020 (ゲームプログラミングワークショップ2020論文集), 2020, pp. 175-180, 2020-11-06

Information Processing Society of Japan (情報処理学会)

Details

CRID: 1050292572129038976

NII Article ID: 170000184494

Web Site: https://ipsj.ixsq.nii.ac.jp/records/207672

Text Lang: ja

Article Type: conference paper

Data Source

IRDB
CiNii Articles
