Performance of LQ-learning in POMDP Environments

  • Lee Haeyeon
    Dept. of Elec. & Comm. Eng., School of Eng., Tohoku Univ.
  • Kamaya Hiroyuki
    Dept. of Elec. Eng., Hachinohe National College of Technology
  • Abe Kenich
    Dept. of Elec. & Comm. Eng., School of Eng., Tohoku Univ.

Description

In this paper, we propose a new type of LQ-learning for solving POMDPs. In a POMDP environment, the agent cannot observe the state of the environment directly. In LQ-learning, in order to discriminate between partially observed states, the agent attaches a label to each of the observations that are perceived as the same. Unlike our previous LQ-learning, knowledge about the environment is prepared in advance. This knowledge is acquired automatically by Kohonen's Self-Organizing Map (SOM), which provides the agent with knowledge about state transitions. The LQ-learning agent then attaches labels to observations with reference to the map obtained by the SOM.
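The labeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ordinary tabular Q-learning applied to (observation, label) pairs, and it replaces the SOM-derived map with a plain dictionary. The names `LabeledQAgent`, `attach_label`, and `transition_map` are hypothetical and introduced only for illustration.

```python
import random
from collections import defaultdict


class LabeledQAgent:
    """Tabular Q-learning over (observation, label) pairs (illustrative only)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        # Q-values keyed by ((observation, label), action); unseen entries are 0.0.
        self.q = defaultdict(float)
        self.rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy action selection for a labeled observation."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update on labeled observations."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def attach_label(prev_state, action, observation, transition_map):
    """Look up a label for an observation.

    ASSUMPTION: transition_map is a hypothetical stand-in for the map
    obtained by the SOM. It maps (previous labeled state, action,
    observation) to a label; unknown transitions get the default label 0.
    """
    return transition_map.get((prev_state, action, observation), 0)


# Two situations that produce the same observation "corridor" but are
# reached by different transitions receive different labels.
transition_map = {
    (("start", 0), "right", "corridor"): 0,
    (("corridor", 0), "right", "corridor"): 1,
}
agent = LabeledQAgent(actions=["left", "right"])
s0 = ("start", 0)
s1 = ("corridor", attach_label(s0, "right", "corridor", transition_map))
s2 = ("corridor", attach_label(s1, "right", "corridor", transition_map))
agent.update(s1, "right", 1.0, s2)
```

Because the label is part of the Q-table key, `("corridor", 0)` and `("corridor", 1)` keep separate value estimates even though the raw observation is identical, which is the discrimination effect the abstract describes.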

Details

  • CRID
    1390282680561053568
  • NII Article ID
    130006960136
  • DOI
    10.11499/sicep.2002.0.174.0
  • Text language code
    en
  • Data source type
    • JaLC
    • CiNii Articles
  • Abstract license flag
    Not permitted for use
