Bibliographic information
- Alternative title
- TD Algorithm for the Variance of Return and Mean-Variance Reinforcement Learning
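- Hōshū no bunsan o suitei suru TD arugorizumu to Mean-Variance kyōka gakushūhō no teian (reading of the Japanese title, roughly "A TD algorithm for estimating the variance of return and a proposed mean-variance reinforcement learning method")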
Abstract
Estimating the probability distribution of returns enables a range of sophisticated decision-making schemes for control problems in Markov environments, including risk-sensitive control and efficient exploration. Many reinforcement learning algorithms, however, rely solely on the expected return. This paper proposes a decision-making scheme that uses both the mean and the variance of the return distribution. It presents a TD algorithm for estimating the variance of return in MDP (Markov decision process) environments, together with a gradient-based reinforcement learning algorithm for the variance-penalized criterion, a typical criterion in risk-avoiding control. Empirical results demonstrate the behavior of the algorithms and validate the criterion for risk-avoiding sequential decision tasks.
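This record does not reproduce the paper's update rules, so the following is only a minimal sketch of one generic way a TD-style variance estimate can be formed, and it may differ from the authors' algorithm. It assumes a tabular setting and tracks the second moment of the return, M(s) = E[R^2 | s], alongside the usual value V(s), using the identity Var[R | s] = M(s) - V(s)^2. The two-state reward process `TRANSITIONS`, the function names, and the 1/n step size are all hypothetical choices made for illustration.

```python
import random

# Hypothetical two-step Markov reward process (not taken from the paper).
# Each state maps to a list of (probability, reward, next_state);
# next_state None marks termination.
TRANSITIONS = {
    0: [(1.0, 0.0, 1)],
    1: [(0.5, 1.0, None), (0.5, -1.0, None)],
}


def sample_step(state, rng):
    """Draw (reward, next_state) from the transition table."""
    u = rng.random()
    cumulative = 0.0
    for prob, reward, next_state in TRANSITIONS[state]:
        cumulative += prob
        if u < cumulative:
            return reward, next_state
    # Guard against floating-point round-off in the cumulative sum.
    _, reward, next_state = TRANSITIONS[state][-1]
    return reward, next_state


def td_mean_and_variance(episodes=5000, gamma=0.9, seed=0):
    """Tabular TD(0) estimates of the mean and variance of the return."""
    rng = random.Random(seed)
    value = {s: 0.0 for s in TRANSITIONS}   # estimate of E[R | s]
    second = {s: 0.0 for s in TRANSITIONS}  # estimate of E[R^2 | s]
    visits = {s: 0 for s in TRANSITIONS}

    for _ in range(episodes):
        state = 0
        while state is not None:
            reward, nxt = sample_step(state, rng)
            v_next = value[nxt] if nxt is not None else 0.0
            m_next = second[nxt] if nxt is not None else 0.0

            visits[state] += 1
            alpha = 1.0 / visits[state]  # decaying step size (assumption)

            # TD(0) target for the mean: r + gamma * V(s')
            value[state] += alpha * (reward + gamma * v_next - value[state])

            # TD(0) target for the second moment:
            # r^2 + 2 * gamma * r * V(s') + gamma^2 * M(s')
            m_target = (reward ** 2
                        + 2 * gamma * reward * v_next
                        + gamma ** 2 * m_next)
            second[state] += alpha * (m_target - second[state])

            state = nxt

    variance = {s: second[s] - value[s] ** 2 for s in TRANSITIONS}
    return value, variance
```

For this toy process the true variances are 1.0 at state 1 and 0.81 at state 0 (with `gamma=0.9`), which gives a simple sanity check for the estimates. A variance-penalized score of the form `value[s] - kappa * variance[s]` could then be computed from them, where `kappa` is a hypothetical risk-aversion weight; the paper's exact criterion and gradient-based algorithm are not given in this record.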
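Published in
- Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌) 16 (3), 353-362, 2001
- The Japanese Society for Artificial Intelligence (一般社団法人 人工知能学会)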
Details
- CRID: 1390001205106740224
- NII Article ID: 10015770150, 30009884223
- NII Bibliographic ID: AA11579226
- ISSN: 13468030, 13460714, http://id.crossref.org/issn/09128085
- NDL Bibliographic ID: 5987570
- Text language code: ja
- Data source type: JaLC, NDL, Crossref, CiNii Articles
- Abstract license flag: Not allowed