Estimation of Different Reward Functions Latent in Trajectory Data

  • Saito Masaharu
    Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
  • Arai Sachiyo
    Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University

Abstract

In recent years, inverse reinforcement learning has attracted attention as a method for estimating the intention of actions using the trajectories of various action-taking agents, including human flow data. In the context of reinforcement learning, “intention” refers to a reward function. Conventional inverse reinforcement learning assumes that all trajectories are generated from policies learned under a single reward function. However, it is natural to assume that people in a human flow act according to multiple policies. In this study, we introduce an expectation-maximization algorithm to inverse reinforcement learning, and propose a method to estimate different reward functions from the trajectories of human flow. The effectiveness of the proposed method was evaluated through a computer experiment based on human flow data collected from subjects around airport gates.
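
The abstract does not give the formulation, so the following is only a minimal sketch of how an expectation-maximization loop could assign trajectories to several candidate reward functions. It assumes a simple mixture model in which each trajectory already has a log-likelihood under each reward function's policy. The names `e_step`, `update_weights`, and the numbers in `log_lik` are hypothetical and not taken from the paper. A full method would also refit each reward function in the M-step using responsibility-weighted inverse reinforcement learning, which is omitted here.

```python
import math


def e_step(log_lik, weights):
    """E-step: responsibility of each reward function k for each trajectory i.

    resp[i][k] is proportional to weights[k] * exp(log_lik[i][k]).
    """
    resp = []
    for row in log_lik:
        scores = [math.log(w) + ll for w, ll in zip(weights, row)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp


def update_weights(resp):
    """Partial M-step: mixing weights are the mean responsibilities.

    The reward parameters are held fixed in this sketch (an assumption).
    """
    n = len(resp)
    k = len(resp[0])
    return [sum(r[j] for r in resp) / n for j in range(k)]


# Hypothetical log-likelihoods of 4 trajectories under 2 reward functions.
log_lik = [
    [-1.0, -5.0],
    [-1.2, -4.0],
    [-6.0, -0.8],
    [-5.5, -1.1],
]
weights = [0.5, 0.5]

for _ in range(10):
    resp = e_step(log_lik, weights)
    weights = update_weights(resp)
```

Here the first two trajectories end up attributed mainly to the first reward function and the last two to the second, which is the kind of soft grouping a multi-reward estimate relies on.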
