Offline Model-Based Imitation Learning with Entropy Regularization of Model and Policy

Author
  • UCHIBE Eiji
    Advanced Telecommunications Research Institute International

Bibliographic Information

Other Title
  • 方策とモデルのエントロピ正則を導入したオフラインモデルベース模倣学習 (Offline model-based imitation learning introducing entropy regularization of the policy and model)

Abstract

Model-Based Entropy-Regularized Imitation Learning (MB-ERIL) is an online model-based generative adversarial imitation learning method that introduces entropy regularization of the policy and the state transition model. Online-MB-ERIL learns the policy and the model from three datasets: expert data, learner data, and generated data. The first two require costly interactions with the actual environment, whereas the last is generated cheaply by the policy and the model. This report considers an offline learning setting in which the second dataset, obtained from interactions between the learner's policy and the actual environment, is unavailable. We then propose Offline-MB-ERIL, which introduces the idea of Positive and Unlabeled (PU) data learning: given suboptimal demonstrations, Offline-MB-ERIL efficiently recovers the policy and the model by treating them as unlabeled data. Through a vision-based arm-reaching task, we show that Offline-MB-ERIL makes better use of suboptimal data than Online-MB-ERIL.
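To make the PU-learning idea concrete, the following Python (PyTorch) snippet is a minimal sketch, not the paper's actual objective: it shows a non-negative positive-unlabeled risk (in the style of Kiryo et al., 2017) for a discriminator that treats expert (state, action) pairs as positive samples and suboptimal data as unlabeled samples. The class name, network architecture, and the prior_pi parameter (the assumed fraction of expert-like samples in the unlabeled data) are hypothetical illustrations, not taken from the paper.

  import torch
  import torch.nn as nn

  class Discriminator(nn.Module):
      """Scores (state, action) pairs; higher logits mean 'more expert-like'."""
      def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
              nn.Linear(hidden, 1),
          )

      def forward(self, obs, act):
          return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)  # logits

  def pu_discriminator_loss(logits_expert, logits_unlabeled, prior_pi=0.5):
      """Non-negative PU risk with the logistic loss (illustrative sketch).

      Expert samples are treated as positives; suboptimal samples are treated
      as unlabeled, i.e. a mixture of expert-like and non-expert data with an
      assumed positive-class prior `prior_pi`.
      """
      loss_pos = nn.functional.softplus(-logits_expert).mean()        # loss of positives labeled +1
      loss_pos_as_neg = nn.functional.softplus(logits_expert).mean()  # loss of positives labeled -1
      loss_unl_as_neg = nn.functional.softplus(logits_unlabeled).mean()
      # Negative-class risk estimated from unlabeled data, clipped at zero so
      # the empirical risk cannot go negative (which would encourage overfitting).
      risk_neg = torch.clamp(loss_unl_as_neg - prior_pi * loss_pos_as_neg, min=0.0)
      return prior_pi * loss_pos + risk_neg

The key design point illustrated here is that suboptimal data never has to be labeled as "non-expert": it enters the objective only through the unlabeled term, which is why expert-like samples hidden inside the suboptimal dataset can still help rather than hurt the learned policy and model.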
