Proposal of a TD Algorithm for Estimating the Variance of Reward and a Mean-Variance Reinforcement Learning Method

Makoto Sato, Hajime Kimura, Shigenobu Kobayashi

doi:10.1527/tjsai.16.353

Bibliographic Information

Alternative Title

TD Algorithm for the Variance of Return and Mean-Variance Reinforcement Learning
ホウシュウノブンサンオスイテイスル TD アルゴリズムト Mean Variance キョウカガクシュウホウノテイアン

Abstract

Estimating probability distributions over returns enables various sophisticated decision-making schemes for control problems in Markov environments, including risk-sensitive control and efficient exploration. Many reinforcement learning algorithms, however, rely solely on the expected return. This paper provides a decision-making scheme that uses the mean and variance of return distributions. It presents a TD algorithm for estimating the variance of return in MDP (Markov decision process) environments, and a gradient-based reinforcement learning algorithm for the variance-penalized criterion, a typical criterion in risk-avoiding control. Empirical results demonstrate the behavior of the algorithms and validate the criterion for risk-avoiding sequential decision tasks.
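
As a rough illustration of the idea in the abstract, the sketch below estimates both the mean and the variance of the return with tabular TD updates on a toy two-state chain. It is not taken from the paper: the function name `td_mean_variance`, the toy environment, the step size, and the specific variance update (bootstrapping from the squared TD error with a discount of gamma squared) are all assumptions chosen for illustration. The update follows the standard Bellman-style recursion for the variance of return, which holds exactly only when the value estimates are exact.

```python
import random


def td_mean_variance(episodes=20000, alpha=0.005, gamma=0.9, seed=0):
    """Tabular TD estimates of the mean and variance of return (illustrative).

    Hypothetical toy chain: state 0 -> state 1 with reward 0, then
    state 1 -> terminal with reward 0 or 2 (equal probability).
    """
    rng = random.Random(seed)
    value = [0.0, 0.0]     # estimate of the expected return per state
    variance = [0.0, 0.0]  # estimate of the variance of return per state

    for _ in range(episodes):
        state = 0
        while state is not None:
            if state == 0:
                reward, next_state = 0.0, 1
            else:
                reward, next_state = rng.choice([0.0, 2.0]), None

            # Terminal transitions bootstrap from zero.
            next_value = value[next_state] if next_state is not None else 0.0
            next_var = variance[next_state] if next_state is not None else 0.0

            # Ordinary TD error for the mean.
            delta = reward + gamma * next_value - value[state]

            # Assumed variance update: the squared TD error acts as the
            # "reward" and gamma**2 as the discount.
            var_target = delta ** 2 + gamma ** 2 * next_var
            variance[state] += alpha * (var_target - variance[state])
            value[state] += alpha * delta

            state = next_state

    return value, variance


if __name__ == "__main__":
    v, var = td_mean_variance()
    print("value:", v)
    print("variance:", var)
```

In this toy chain the true values are V = [0.9, 1.0] and Var = [0.81, 1.0], so the estimates should land close to those numbers. A variance-penalized score of the kind the abstract mentions could then be formed as, for example, `value[s] - kappa * variance[s]`, where `kappa` is a hypothetical risk-aversion weight.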

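Journal

Transactions of the Japanese Society for Artificial Intelligence

Transactions of the Japanese Society for Artificial Intelligence 16 (3), 353-362, 2001

The Japanese Society for Artificial Intelligence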

Details

CRID: 1390001205106740224

NII Article ID: 10015770150; 30009884223

NII Book ID: AA11579226

DOI: 10.1527/tjsai.16.353

ISSN: 13468030; 13460714; http://id.crossref.org/issn/09128085

NDL BIB ID: 5987570

Web Site: https://ndlsearch.ndl.go.jp/books/R000000004-I5987570; https://www.jstage.jst.go.jp/article/tjsai/16/3/16_3_353/_pdf

Text language code: ja

Data Source

JaLC
NDL
Crossref
CiNii Articles

Abstract License Flag: Disallowed
