Satisficing, Speedup, and Generalization in Online Reinforcement Learning

  • 片山 晋 (Susumu Katayama)
    Research Associate, Department of Computational Intelligence and Systems Science, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology


Description

<p>Aiming at the implementation of parameter-free reinforcement learning, the following three pieces of research are presented. (1) Online reinforcement learning to satisfice: in online reinforcement learning in dynamic environments, the appropriate exploration rate depends on the nature of the environment. By relaxing the criterion from "optimization" to "satisficing", the proposed method guarantees convergence for all parameter settings while coping with unexpected environmental changes. (2) Efficient implementation of TD(λ): while a naive implementation of TD(λ) with λ>0 costs time linear in the number of states per step, the proposed method implements TD(λ) exactly and quickly by computing each value lazily, i.e., by need. Its computation time per time step is logarithmic in the number of states, at three times the space cost of the naive implementation. (3) TD(λ) using Haar basis functions: an algorithm that efficiently implements TD(λ) learning with Haar basis functions is proposed. The algorithm maintains and updates the information of the infinite tree of coefficients in a finitely compressed form. The system of Haar basis functions includes both broad features, which have strong generalization and averaging ability, and narrow features, which have high-precision approximation ability. In particular, TD(λ) with Haar basis functions can approximate an arbitrary continuous function on [0, 1) in the limit.</p>
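The per-step cost that the second research item addresses can be illustrated with a minimal tabular TD(λ) sketch. This is a hypothetical example (the chain environment, learning rate, and other parameter values are illustrative, not from the paper), showing the naive baseline whose full trace sweep is linear in the number of states; it is not the paper's lazy, by-need algorithm.

```python
def td_lambda_chain(num_states, episodes, alpha=0.1, gamma=0.9, lam=0.8):
    """Naive tabular TD(lambda) on a simple left-to-right chain.

    Every step decays and applies ALL eligibility traces, so the
    per-step cost is linear in the number of states -- the baseline
    that lazy, by-need evaluation can reduce to logarithmic time.
    """
    V = [0.0] * num_states               # state values; last state is terminal
    for _ in range(episodes):
        e = [0.0] * num_states           # eligibility traces, reset each episode
        s = 0
        while s < num_states - 1:
            s_next = s + 1               # deterministic walk toward the goal
            terminal = s_next == num_states - 1
            r = 1.0 if terminal else 0.0
            delta = r + (0.0 if terminal else gamma * V[s_next]) - V[s]
            e[s] += 1.0                  # accumulating trace for the visited state
            for i in range(num_states):  # O(|S|) trace sweep on every step
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam
            s = s_next
    return V
```

Running this on a short chain shows values rising toward the rewarded terminal state; the inner loop over all states on every step is exactly the cost the lazy implementation avoids.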


Details

  • CRID
    1390848647556169984
  • NII Article ID
    110002808366
  • NII Book ID
    AN10067140
  • DOI
    10.11517/jjsai.15.6_998
  • ISSN
    24358614
    21882266
  • Text Lang
    ja
  • Data Source
    • JaLC
    • CiNii Articles
  • Abstract License Flag
    Disallowed

