- Saito Masaharu, Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
- Arai Sachiyo, Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
Abstract
In recent years, inverse reinforcement learning has attracted attention as a method for estimating the intention of actions using the trajectories of various action-taking agents, including human flow data. In the context of reinforcement learning, “intention” refers to a reward function. Conventional inverse reinforcement learning assumes that all trajectories are generated from policies learned under a single reward function. However, it is natural to assume that people in a human flow act according to multiple policies. In this study, we introduce an expectation-maximization algorithm to inverse reinforcement learning, and propose a method to estimate different reward functions from the trajectories of human flow. The effectiveness of the proposed method was evaluated through a computer experiment based on human flow data collected from subjects around airport gates.
Published in
- Journal of Advanced Computational Intelligence and Intelligent Informatics
Journal of Advanced Computational Intelligence and Intelligent Informatics 28 (2), 403-412, 2024-03-20
Fuji Technology Press Ltd.
Details
- CRID: 1390581003356446208
- NII Bibliographic ID: AA12042502
- ISSN: 18838014, 13430130
- NDL Bibliographic ID: 033391235
- Text language code: en
- Data source type: JaLC, NDL, Crossref
- Abstract license flag: Not permitted for use