Estimation of Different Reward Functions Latent in Trajectory Data

  • Saito Masaharu
    Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
  • Arai Sachiyo
    Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University

Abstract

In recent years, inverse reinforcement learning has attracted attention as a method for estimating the intention behind actions from the trajectories of various acting agents, including human flow data. In the context of reinforcement learning, "intention" corresponds to a reward function. Conventional inverse reinforcement learning assumes that all trajectories are generated from policies learned under a single reward function. However, it is more natural to assume that the people in a human flow act according to multiple distinct policies. In this study, we introduce an expectation-maximization algorithm into inverse reinforcement learning and propose a method for estimating different reward functions from human flow trajectories. The effectiveness of the proposed method was evaluated in a computer experiment based on human flow data collected from subjects around airport gates.
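
The core idea, alternating an E-step that assigns each trajectory a responsibility under each candidate reward function with an M-step that re-fits the reward functions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes linear rewards over trajectory feature counts, a MaxEnt-style trajectory likelihood whose partition function is approximated over the observed trajectory set, and plain gradient ascent in the M-step. The function `em_irl` and all parameter choices are hypothetical.

```python
# Sketch of EM-based clustering of trajectories under K reward functions.
# Assumptions (not from the paper): linear rewards over feature counts,
# MaxEnt trajectory likelihoods with Z approximated on the observed set.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

def em_irl(feat_counts, K=2, iters=100, lr=0.05):
    """feat_counts: (n, d) array, one row of feature counts per trajectory.
    Returns reward weights (K, d), mixing weights (K,), responsibilities (n, K)."""
    n, d = feat_counts.shape
    thetas = 0.1 * rng.normal(size=(K, d))   # reward weights per cluster
    log_mix = np.full(K, -np.log(K))         # log mixing proportions

    for _ in range(iters):
        # Per-cluster log-likelihoods: log p(tau | theta_k)
        #   = phi(tau) . theta_k - log Z_k,
        # with Z_k approximated over the observed trajectories.
        scores = feat_counts @ thetas.T      # (n, K)
        log_Z = logsumexp(scores, axis=0)    # (K,)
        log_lik = scores - log_Z             # (n, K)

        # E-step: responsibilities gamma[i, k] = p(cluster k | tau_i).
        log_post = log_lik + log_mix
        log_post -= logsumexp(log_post, axis=1, keepdims=True)
        gamma = np.exp(log_post)             # (n, K)

        # M-step: refit mixing weights; one gradient-ascent step per theta_k.
        # The MaxEnt gradient is (empirical - expected) feature counts.
        log_mix = np.log(gamma.mean(axis=0) + 1e-12)
        for k in range(K):
            p_tau = np.exp(scores[:, k] - log_Z[k])   # model distribution
            expected = p_tau @ feat_counts
            empirical = gamma[:, k] @ feat_counts / max(gamma[:, k].sum(), 1e-12)
            thetas[k] += lr * (empirical - expected)

    return thetas, np.exp(log_mix), gamma

# Toy usage: two behaviour modes with different feature preferences.
phi = np.vstack([
    rng.normal([3.0, 0.0], 0.3, size=(30, 2)),   # mode A trajectories
    rng.normal([0.0, 3.0], 0.3, size=(30, 2)),   # mode B trajectories
])
thetas, mix, gamma = em_irl(phi, K=2)
print("reward weights:\n", thetas)
print("mixing weights:", mix)
```

On this synthetic data the responsibilities split the trajectories into the two modes and each recovered weight vector emphasizes the feature its mode prefers, which is the behaviour the paper's method targets on real human flow data.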
