Actorに適正度の履歴を用いたActor-Criticアルゴリズム : 不完全なValue-Functionのもとでの強化学習

Bibliographic Information

Title alternatives
  • An Analysis of Actor-Critic Algorithms Using Eligibility Traces : Reinforcement Learning with Imperfect Value Functions

Description

We present an analysis of actor-critic algorithms in which the actor updates its policy using eligibility traces of the policy parameters. Most theoretical results for eligibility traces have so far covered only the critic's value-iteration algorithms; this paper investigates what the actor's eligibility trace does. The results show that the algorithm is an extension of Williams' REINFORCE algorithms to infinite-horizon reinforcement tasks, and that the critic provides an appropriate reinforcement baseline for the actor. Thanks to the actor's eligibility trace, the actor improves its policy by following a gradient of the actual return rather than a gradient of the return estimated by the critic. This enables the agent to learn a fairly good policy even when the critic's approximate value function is too inaccurate for conventional actor-critic algorithms to work. Conversely, when the critic estimates an accurate value function, the actor's learning is dramatically accelerated in our test cases. The behavior of the algorithm is demonstrated through simulations of a linear quadratic control problem and a pole-balancing problem.
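
To make the mechanism concrete, the sketch below is a minimal illustration, not the authors' exact formulation, of an actor-critic agent whose actor accumulates an eligibility trace of the log-policy gradient d/dθ log π(u|x) and is updated by the critic's TD error, run on a one-dimensional linear quadratic control task loosely modelled on the paper's first experiment. The dynamics, reward, Gaussian policy parameterization, quadratic value features, learning rates, and trace-decay parameter are all illustrative assumptions.

```python
# Minimal sketch (assumed settings, not the paper's exact algorithm):
# actor-critic with an eligibility trace on the actor's policy parameter,
# applied to a 1-D linear quadratic control task.
import numpy as np

rng = np.random.default_rng(0)

def step(x, u):
    """Illustrative dynamics and cost: x' = x + u + noise, reward = -(x^2 + u^2)."""
    x_next = x + u + 0.1 * rng.normal()
    return x_next, -(x**2 + u**2)

gamma, lam = 0.95, 0.9               # discount and trace-decay (assumed values)
alpha_actor, alpha_critic = 1e-4, 1e-3

theta = 0.0                          # actor: Gaussian policy, mean = theta * x
sigma = 0.5                          # fixed exploration noise
w = np.zeros(2)                      # critic: V(x) ~ w[0] + w[1] * x^2

x = rng.normal()
z_theta, z_w = 0.0, np.zeros(2)      # eligibility traces of actor and critic

for t in range(100_000):
    u = theta * x + sigma * rng.normal()           # sample action
    x_next, r = step(x, u)

    phi, phi_next = np.array([1.0, x**2]), np.array([1.0, x_next**2])
    delta = r + gamma * (w @ phi_next) - w @ phi   # TD error: the critic acts as the baseline

    grad_logpi = (u - theta * x) * x / sigma**2    # d/dtheta log pi(u | x)
    z_theta = gamma * lam * z_theta + grad_logpi   # actor's eligibility trace
    z_w = gamma * lam * z_w + phi                  # critic's eligibility trace

    theta += alpha_actor * delta * z_theta         # actor follows TD error times its trace
    w += alpha_critic * delta * z_w                # TD(lambda)-style update of the critic

    x = x_next
    if abs(x) > 2.0:                               # crude episode boundary for the sketch
        x = rng.normal()
        z_theta, z_w = 0.0, np.zeros(2)

print("learned feedback gain theta:", theta)       # expected to drift toward a negative gain
```

Because θ is moved along δ·z_θ rather than along a gradient of the critic's value estimate, the actor effectively follows a gradient of the actual return, which is the property the abstract emphasizes; the critic only shifts the baseline and, when accurate, speeds up learning.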

Journal

  • 人工知能 (Artificial Intelligence)

    人工知能 15 (2), 267-275, 2000-03-01

    The Japanese Society for Artificial Intelligence

Cited by (27)

References (26)
