Estimation of Different Reward Functions Latent in Trajectory Data

Saito Masaharu, Arai Sachiyo

doi:10.20965/jaciii.2024.p0403



Saito Masaharu

Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
Arai Sachiyo

Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University


Description

In recent years, inverse reinforcement learning has attracted attention as a method for estimating the intentions behind actions from the trajectories of various agents, including human flow data. In the context of reinforcement learning, “intention” refers to a reward function. Conventional inverse reinforcement learning assumes that all trajectories are generated by policies learned under a single reward function. However, it is more natural to assume that people in a human flow act according to multiple policies. In this study, we introduce an expectation-maximization algorithm into inverse reinforcement learning and propose a method for estimating different reward functions from human flow trajectories. The effectiveness of the proposed method was evaluated through a computer experiment based on human flow data collected from subjects around airport gates.
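The description does not specify how the expectation-maximization (EM) algorithm is combined with inverse reinforcement learning. As an illustration only, the following Python sketch shows one common formulation, which is not necessarily the authors' method. It assumes that each demonstration comes from one of k latent reward functions, that each reward is linear in trajectory features, and that trajectory probabilities follow a maximum-entropy model over a small, enumerated set of candidate trajectories. The E-step computes the posterior probability (responsibility) that each demonstration was generated under each reward function. The M-step updates the mixing weights in closed form and takes gradient-ascent steps on each reward's weights, which makes the procedure a generalized EM. The function names, feature vectors, and toy demonstrations are hypothetical.

```python
import math
import random


def _scores(theta, feats):
    # Linear reward of each candidate trajectory: theta . phi(tau).
    return [sum(t * f for t, f in zip(theta, phi)) for phi in feats]


def _log_sum_exp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_traj_prob(theta, feats, idx):
    """log P(tau_idx | theta) for a MaxEnt model over a finite candidate set."""
    s = _scores(theta, feats)
    return s[idx] - _log_sum_exp(s)


def expected_features(theta, feats):
    """Feature expectation E_theta[phi(tau)] under the same MaxEnt model."""
    s = _scores(theta, feats)
    log_z = _log_sum_exp(s)
    probs = [math.exp(v - log_z) for v in s]
    d = len(feats[0])
    return [sum(p * phi[j] for p, phi in zip(probs, feats)) for j in range(d)]


def em_multi_reward_irl(feats, demos, k, iters=100, grad_steps=20, lr=0.5, seed=0):
    """Generalized EM for a k-component mixture of MaxEnt reward models (hypothetical).

    feats: feature vector phi(tau) for each candidate trajectory.
    demos: indices into feats of the observed trajectories.
    Returns (thetas, pis, responsibilities from the last E-step, log-likelihood history).
    """
    rng = random.Random(seed)
    d, n = len(feats[0]), len(demos)
    # Small random initial weights break the symmetry between components.
    thetas = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(k)]
    pis = [1.0 / k] * k
    history = []
    for _ in range(iters):
        # E-step: posterior over which reward function generated each demonstration.
        resp, ll = [], 0.0
        for idx in demos:
            # max(..., 1e-300) guards against log(0) if a mixing weight underflows.
            logs = [math.log(max(pis[c], 1e-300)) + log_traj_prob(thetas[c], feats, idx)
                    for c in range(k)]
            lse = _log_sum_exp(logs)
            ll += lse
            resp.append([math.exp(v - lse) for v in logs])
        history.append(ll)  # data log-likelihood at the current parameters
        # M-step: closed-form mixing weights, gradient ascent on each reward's weights.
        for c in range(k):
            r = [row[c] for row in resp]
            total = sum(r)
            pis[c] = total / n
            # Responsibility-weighted empirical feature counts for component c.
            emp = [sum(ri * feats[idx][j] for ri, idx in zip(r, demos)) for j in range(d)]
            for _ in range(grad_steps):
                # Gradient of sum_i r_ic * log P(tau_i | theta_c): feature matching.
                exp_f = expected_features(thetas[c], feats)
                thetas[c] = [thetas[c][j] + lr * (emp[j] - total * exp_f[j]) / n
                             for j in range(d)]
    return thetas, pis, resp, history


# Toy data (hypothetical): three candidate paths described by two features.
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
demos = [0] * 10 + [1] * 10  # half the agents take path 0, half take path 1
thetas, pis, resp, history = em_multi_reward_irl(feats, demos, k=2)
print("mixing weights:", [round(p, 3) for p in pis])
print("reward weights:", [[round(w, 2) for w in th] for th in thetas])
```

In this toy data, the two groups of demonstrations are expected to be assigned to different components, each with its own reward weights. A realistic setting, such as human flow around airport gates, would need state-action features and a dynamics model in place of the enumerated candidate set.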

Journal

Journal of Advanced Computational Intelligence and Intelligent Informatics

Journal of Advanced Computational Intelligence and Intelligent Informatics 28 (2), 403-412, 2024-03-20

Fuji Technology Press Ltd.



Details

CRID

1390581003356446208
NII Book ID

AA12042502
DOI

10.20965/jaciii.2024.p0403
ISSN

1883-8014

1343-0130
NDL BIB ID

033391235
Web Site

http://id.ndl.go.jp/bib/033391235

https://ndlsearch.ndl.go.jp/books/R000000004-I033391235

https://www.fujipress.jp/main/wp-content/themes/Fujipress/hyosetsu.php?ppno=JACII002800020017
Text Lang

en
Data Source
- JaLC
- NDL
- Crossref
Abstract License Flag
Disallowed
