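A TD Algorithm for Estimating the Variance of Return and a Proposed Mean-Variance Reinforcement Learning Method (報酬の分散を推定するTDアルゴリズムとMean-Variance強化学習法の提案)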

Bibliographic information

Alternative titles
  • TD Algorithm for the Variance of Return and Mean-Variance Reinforcement Learning
  • ホウシュウ ノ ブンサン オ スイテイ スル TD アルゴリズム ト Mean Variance キョウカ ガクシュウホウ ノ テイアン

Abstract

Estimating the probability distribution of returns enables a range of sophisticated decision-making schemes for control problems in Markov environments, including risk-sensitive control and efficient exploration. Most reinforcement learning algorithms, however, rely only on the expected return. This paper proposes a decision-making scheme that uses both the mean and the variance of the return distribution. It presents a TD algorithm for estimating the variance of return in Markov decision process (MDP) environments, together with a gradient-based reinforcement learning algorithm for the variance-penalized criterion, a typical criterion in risk-avoiding control. Empirical results demonstrate the behavior of the algorithms and validate the criterion on risk-avoiding sequential decision tasks.
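
The abstract does not give the update rules, so the following is only a minimal sketch of one standard way to estimate the variance of return with temporal-difference learning, and it may differ from the paper's own algorithm. It assumes a tabular setting under a fixed policy and tracks the second moment of the return, M(s) = E[R^2 | s], alongside the value V(s) = E[R | s], using the identity R^2 = r^2 + 2*gamma*r*R' + gamma^2*R'^2 and then Var(s) = M(s) - V(s)^2. The names td_return_variance, toy_chain, variance_penalized_score and the weight kappa are hypothetical, as are the 1/n step sizes and the two-state toy chain.

```python
import random


def td_return_variance(sample_step, n_states, start_state=0,
                       gamma=0.9, episodes=50_000, seed=0):
    """Tabular TD(0) estimates of the mean and variance of return.

    sample_step(state, rng) must return (reward, next_state), where
    next_state is None when the episode terminates. This interface is
    an assumption made for the sketch, not something from the paper.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states      # estimate of E[R | s]
    M = [0.0] * n_states      # estimate of E[R^2 | s]
    visits = [0] * n_states   # visit counts for 1/n step sizes

    for _ in range(episodes):
        s = start_state
        while s is not None:
            r, s_next = sample_step(s, rng)
            # Terminal states have zero return, hence zero moments.
            v_next = V[s_next] if s_next is not None else 0.0
            m_next = M[s_next] if s_next is not None else 0.0

            visits[s] += 1
            alpha = 1.0 / visits[s]

            # Bootstrapped targets for the first and second moments.
            v_target = r + gamma * v_next
            m_target = r * r + 2.0 * gamma * r * v_next + gamma ** 2 * m_next

            V[s] += alpha * (v_target - V[s])
            M[s] += alpha * (m_target - M[s])
            s = s_next

    # Clip at zero because sampling noise can make M - V^2 slightly negative.
    variance = [max(m - v * v, 0.0) for v, m in zip(V, M)]
    return V, variance


def toy_chain(state, rng):
    """Hypothetical two-state chain: state 0 -> state 1 -> terminal."""
    if state == 0:
        return rng.choice([0.0, 2.0]), 1
    return rng.choice([-1.0, 1.0]), None


def variance_penalized_score(mean, variance, kappa=1.0):
    """One common form of a variance-penalized criterion: mean - kappa * variance."""
    return mean - kappa * variance


if __name__ == "__main__":
    V, var = td_return_variance(toy_chain, n_states=2)
    for s in range(2):
        score = variance_penalized_score(V[s], var[s], kappa=0.5)
        print(f"state {s}: mean={V[s]:.3f} variance={var[s]:.3f} score={score:.3f}")
```

For this toy chain with gamma = 0.9, the analytic values are a mean of 1 and a variance of 1.81 for state 0, and a mean of 0 and a variance of 1 for state 1, so the estimates can be checked against them. A risk-avoiding agent would prefer actions or policies with a higher penalized score; the paper's gradient-based learner for that criterion is not reproduced here.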
