Satisficing, Speedup, and Generalization in Online Reinforcement Learning

  • 片山 晋 (Susumu Katayama)
    Research Associate, Department of Computational Intelligence and Systems Science, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology


Description

<p>Aiming at the implementation of parameter-free reinforcement learning, the following three pieces of research are presented. (1) Online reinforcement learning to satisfice: in online reinforcement learning in dynamic environments, the appropriate exploration rate depends on the nature of the environment. By relaxing the criterion from "optimization" to "satisficing", the proposed method guarantees convergence for all parameter settings while coping with unexpected environmental changes. (2) Efficient implementation of TD(λ): while a naive implementation of TD(λ) with λ>0 costs time linear in the number of states per step, the proposed method implements TD(λ) exactly and quickly by computing each value lazily, i.e., by need. Its computation time per time step is logarithmic in the number of states, at three times the space cost of the naive implementation. (3) TD(λ) using Haar basis functions: an algorithm that efficiently implements TD(λ) learning with Haar basis functions is proposed. The algorithm maintains and updates the information of the infinite tree of coefficients in a finitely compressed form. The system of Haar basis functions includes both broad features, which have strong generalization and averaging ability, and narrow features, which have high-precision approximation ability. In particular, TD(λ) with Haar basis functions can approximate an arbitrary continuous function on [0, 1) in the limit.</p>
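The per-step cost that the second research item addresses can be illustrated with a minimal tabular TD(λ) sketch. This is a hypothetical example (the chain environment, learning rate, and other parameter values are illustrative, not from the paper), showing the naive baseline whose full trace sweep is linear in the number of states; it is not the paper's lazy, by-need algorithm.

```python
def td_lambda_chain(num_states, episodes, alpha=0.1, gamma=0.9, lam=0.8):
    """Naive tabular TD(lambda) on a simple left-to-right chain.

    Every step decays and applies ALL eligibility traces, so the
    per-step cost is linear in the number of states -- the baseline
    that lazy, by-need evaluation can reduce to logarithmic time.
    """
    V = [0.0] * num_states               # state values; last state is terminal
    for _ in range(episodes):
        e = [0.0] * num_states           # eligibility traces, reset each episode
        s = 0
        while s < num_states - 1:
            s_next = s + 1               # deterministic walk toward the goal
            terminal = s_next == num_states - 1
            r = 1.0 if terminal else 0.0
            delta = r + (0.0 if terminal else gamma * V[s_next]) - V[s]
            e[s] += 1.0                  # accumulating trace for the visited state
            for i in range(num_states):  # O(|S|) trace sweep on every step
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam
            s = s_next
    return V
```

Running this on a short chain shows values rising toward the rewarded terminal state; the inner loop over all states on every step is exactly the cost the lazy implementation avoids.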


Details

  • CRID
    1390848647556169984
  • NII Article ID
    110002808366
  • NII Book ID
    AN10067140
  • DOI
    10.11517/jjsai.15.6_998
  • ISSN
    24358614
    21882266
  • Text Lang
    ja
  • Data Source
    • JaLC
    • CiNii Articles
  • Abstract License Flag
    Disallowed

