Imitation learning based on entropy-regularized reinforcement learning

UCHIBE Eiji

doi:10.11517/pjsai.jsai2019.0_1i3j203

Bibliographic Information

Other Title

エントロピ正則された強化学習を用いた模倣学習

Abstract

This paper proposes Entropy-Regularized Imitation Learning (ERIL), which combines forward and inverse reinforcement learning (RL). ERIL uses the soft Bellman optimality equation, in which the reward function is augmented by the entropy of the learning policy and the Kullback-Leibler (KL) divergence between the learning policy and the baseline policy. We show that inverse RL can be interpreted as estimating the log-ratio between two policies, and that this log-ratio can be estimated efficiently by binary logistic regression. Forward RL is a variant of Dynamic Policy Programming, and the overall algorithm can be interpreted as minimizing the KL divergence between the learning policy and the estimated expert policy. Experimental results in MuJoCo-simulated environments show that ERIL is more sample-efficient than previous methods such as GAIL and AIRL because the forward RL step of ERIL is off-policy.
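
The abstract's claim that a log-ratio can be estimated by binary logistic regression can be illustrated with a toy sketch. This is not the ERIL algorithm: it assumes two 1-D Gaussians stand in for the "expert" and "learner" distributions, and the names `fit_logistic`, `expert`, and `learner` are hypothetical. With equal sample counts per class, the logit of a well-specified classifier approximates log p_expert(x) / p_learner(x), which for N(1, 1) versus N(0, 1) is x - 0.5.

```python
import numpy as np


def fit_logistic(features, labels, steps=2000, lr=0.5):
    """Fit binary logistic regression weights by gradient descent on the mean log loss."""
    weights = np.zeros(features.shape[1])
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-features @ weights))
        grad = features.T @ (probs - labels) / len(labels)
        weights -= lr * grad
    return weights


rng = np.random.default_rng(0)
n = 5000

# Assumption: toy stand-ins for samples from the expert and the learner.
expert = rng.normal(1.0, 1.0, size=n)   # label 1
learner = rng.normal(0.0, 1.0, size=n)  # label 0

x = np.concatenate([expert, learner])
features = np.column_stack([x, np.ones_like(x)])  # linear logit: slope * x + intercept
labels = np.concatenate([np.ones(n), np.zeros(n)])

slope, intercept = fit_logistic(features, labels)


def estimated_log_ratio(a):
    """Classifier logit, used as the estimate of log p_expert(a) / p_learner(a)."""
    return slope * a + intercept


def true_log_ratio(a):
    """Closed-form log-ratio of N(1, 1) to N(0, 1)."""
    return a - 0.5


print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The fitted slope and intercept should land near 1 and -0.5, up to sampling noise. In the setting the abstract describes, the two classes would presumably be expert and learner data rather than Gaussian draws, and the classifier would be a richer function approximator.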

Journal

Proceedings of the Annual Conference of JSAI

Proceedings of the Annual Conference of JSAI JSAI2019 (0), 1I3J203-1I3J203, 2019

The Japanese Society for Artificial Intelligence

Details

CRID: 1390001288143372800

NII Article ID: 130007658282

DOI: 10.11517/pjsai.jsai2019.0_1i3j203

Text Lang: ja

Data Source

JaLC
CiNii Articles

Abstract License Flag: Disallowed
