Reinforcement Learning in Partially Observable Markov Decision Processes: A Stochastic Gradient Method

Bibliographic Information

Other Titles
  • 部分観測マルコフ決定過程下での強化学習 : 確率的傾斜法による接近 (original Japanese title)
  • ブブン カンソク マルコフ ケッテイ カテイカ デ ノ キョウカ ガクシュウ (katakana reading of the Japanese title)


Abstract

Many conventional studies in reinforcement learning are limited to Markov decision processes (MDPs). However, real-world decision tasks are essentially non-Markovian. In this paper, we consider reinforcement learning in partially observable MDPs (POMDPs), a class of non-Markovian decision problems. Under the POMDP assumption, the environment is an MDP, but the agent has only restricted access to state information; instead, it receives observations that carry partial information about the state of the MDP. We focus on a learning algorithm for memory-less stochastic policies, which map the agent's immediate observation directly to actions: memory-less approaches suit on-line, real-time adaptive systems with limited memory and computational resources. We obtain the following mathematical results. First, the agent can improve its policy to maximize the immediate reward by stochastic gradient ascent without estimating any state or the immediate reward itself. Second, it can improve the policy to maximize the discounted reward from an initial state by stochastic gradient ascent without estimating any state, immediate reward, or discounted reward. These advantages are particularly valuable in POMDPs, because no state, immediate reward, or discounted reward needs to be estimated explicitly. Building on these results, we present an incremental policy-improvement algorithm that maximizes the average reward in POMDPs, and we confirm the rational behavior of the proposed algorithm in a simple experiment.
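The abstract describes policy improvement by stochastic gradient ascent on a memory-less stochastic policy, using only the immediate observation and reward. The following is a minimal sketch of that idea, assuming a softmax policy over an observation-action preference table and a discounted eligibility trace on the log-policy gradient; the toy environment, the parameter names (theta, ALPHA, GAMMA), and the exact update form are illustrative assumptions, not taken verbatim from the paper.

```python
import numpy as np

# Sketch (assumed form): stochastic gradient ascent for a memory-less
# stochastic policy in a POMDP. The policy maps the current observation
# directly to action probabilities via a softmax over theta[obs, action].
# No state, immediate reward, or discounted reward is estimated.

rng = np.random.default_rng(0)

N_OBS, N_ACT = 4, 2        # observation / action space sizes (assumed)
ALPHA, GAMMA = 0.05, 0.9   # step size and discount factor (assumed)

theta = np.zeros((N_OBS, N_ACT))  # policy parameters
trace = np.zeros_like(theta)      # eligibility trace of the log-policy gradient

def policy(obs):
    """Softmax action probabilities for the immediate observation."""
    prefs = theta[obs] - theta[obs].max()   # shift for numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def step_env(obs, act):
    """Stand-in POMDP step (hypothetical): returns (reward, next observation).
    Replace with the actual environment; this toy rewards action 0."""
    reward = 1.0 if act == 0 else 0.0
    return reward, rng.integers(N_OBS)

obs = rng.integers(N_OBS)
for t in range(10_000):
    p = policy(obs)
    act = rng.choice(N_ACT, p=p)

    # Gradient of log pi(act | obs) w.r.t. theta: one-hot minus the
    # probabilities, affecting only the row of the current observation.
    grad = np.zeros_like(theta)
    grad[obs] = -p
    grad[obs, act] += 1.0

    # Accumulate a discounted eligibility trace, then ascend along the
    # immediate reward times the trace.
    trace = GAMMA * trace + grad
    reward, obs = step_env(obs, act)
    theta += ALPHA * reward * trace
```

With the toy environment above, theta drifts toward preferring action 0 from every observation, illustrating incremental policy improvement from immediate rewards alone; the paper's actual algorithm targets the average reward and is verified in its own experiment.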

Journal

  • 人工知能 (Journal of the Japanese Society for Artificial Intelligence)

    人工知能 11 (5), 761-768, 1996-09-01

    The Japanese Society for Artificial Intelligence

Cited By (59)

References (13)
