Reinforcement Learning in Partially Observable Markov Decision Processes: A Stochastic Gradient Method

Bibliographic Information

Other Titles
  • 部分観測マルコフ決定過程下での強化学習 : 確率的傾斜法による接近 (original Japanese title)
  • ブブン カンソク マルコフ ケッテイ カテイカ デ ノ キョウカ ガクシュウ (katakana reading of the Japanese title)


Abstract

Many conventional studies in reinforcement learning are limited to Markov decision processes (MDPs). However, real-world decision tasks are essentially non-Markovian. In this paper, we consider reinforcement learning in partially observable MDPs (POMDPs), a class of non-Markovian decision problems. Under the POMDP assumption, the environment is an MDP, but the agent has only restricted access to state information; instead, it receives observations that carry partial information about the state of the MDP. We focus on a learning algorithm for memory-less stochastic policies, which map the agent's immediate observation directly to actions: memory-less approaches suit on-line, real-time adaptive systems with limited memory and computational resources. We obtain the following mathematical results. First, the agent can improve its policy to maximize the immediate reward by stochastic gradient ascent without estimating any state or the immediate reward itself. Second, it can improve the policy to maximize the discounted reward from an initial state by stochastic gradient ascent without estimating any state, immediate reward, or discounted reward. These advantages are particularly valuable in POMDPs, because no state, immediate reward, or discounted reward needs to be estimated explicitly. Building on these results, we present an incremental policy-improvement algorithm that maximizes the average reward in POMDPs, and we confirm the rational behavior of the proposed algorithm in a simple experiment.
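The abstract describes policy improvement by stochastic gradient ascent on a memory-less stochastic policy, using only the immediate observation and reward. The following is a minimal sketch of that idea, assuming a softmax policy over an observation-action preference table and a discounted eligibility trace on the log-policy gradient; the toy environment, the parameter names (theta, ALPHA, GAMMA), and the exact update form are illustrative assumptions, not taken verbatim from the paper.

```python
import numpy as np

# Sketch (assumed form): stochastic gradient ascent for a memory-less
# stochastic policy in a POMDP. The policy maps the current observation
# directly to action probabilities via a softmax over theta[obs, action].
# No state, immediate reward, or discounted reward is estimated.

rng = np.random.default_rng(0)

N_OBS, N_ACT = 4, 2        # observation / action space sizes (assumed)
ALPHA, GAMMA = 0.05, 0.9   # step size and discount factor (assumed)

theta = np.zeros((N_OBS, N_ACT))  # policy parameters
trace = np.zeros_like(theta)      # eligibility trace of the log-policy gradient

def policy(obs):
    """Softmax action probabilities for the immediate observation."""
    prefs = theta[obs] - theta[obs].max()   # shift for numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def step_env(obs, act):
    """Stand-in POMDP step (hypothetical): returns (reward, next observation).
    Replace with the actual environment; this toy rewards action 0."""
    reward = 1.0 if act == 0 else 0.0
    return reward, rng.integers(N_OBS)

obs = rng.integers(N_OBS)
for t in range(10_000):
    p = policy(obs)
    act = rng.choice(N_ACT, p=p)

    # Gradient of log pi(act | obs) w.r.t. theta: one-hot minus the
    # probabilities, affecting only the row of the current observation.
    grad = np.zeros_like(theta)
    grad[obs] = -p
    grad[obs, act] += 1.0

    # Accumulate a discounted eligibility trace, then ascend along the
    # immediate reward times the trace.
    trace = GAMMA * trace + grad
    reward, obs = step_env(obs, act)
    theta += ALPHA * reward * trace
```

With the toy environment above, theta drifts toward preferring action 0 from every observation, illustrating incremental policy improvement from immediate rewards alone; the paper's actual algorithm targets the average reward and is verified in its own experiment.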

Journal

  • 人工知能 (Journal of the Japanese Society for Artificial Intelligence)

    人工知能 11 (5), 761-768, 1996-09-01

    The Japanese Society for Artificial Intelligence

Cited By (59)

References (13)
