Reinforcement Learning Using Intrinsic Rewards That Consider Future States and Action Values

山田, 智也, 長坂, 保典

type: Paper

This study examines reinforcement learning methods that explore efficiently and achieve the goal in environments where rewards are difficult to obtain. To make exploration efficient, we propose exploration methods based on curiosity, which assign value to states and actions that have not been experienced before so that unknown states are explored preferentially. In Experiment 1, multi-step learning was applied to the Intrinsic Curiosity Module (ICM). Combining ICM with multi-step learning makes it possible to refer to future states, which is expected to make unknown states easier to find during learning. Compared with existing methods, the proposed method achieved the goal in a shorter learning time, and performance improved by 27%. In Experiment 2, we used Random Network Distillation (RND), which addresses weaknesses of ICM. By adding new functions to RND, we devised a method that explores while taking action values into account. The experimental results show that this method explores more efficiently, and performance improved by 24%. From these results, we conclude that considering future states and action values is effective in reinforcement learning.
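
As a rough illustration of the two ideas named in the abstract, the sketch below shows (a) a multi-step (n-step) discounted return computed over rewards that mix an extrinsic term with a curiosity bonus, and (b) an RND-style novelty bonus defined as the prediction error between a fixed random network and a trained predictor. This is not the authors' implementation: the function names, the mixing weight `beta`, the discount `gamma`, and the single-layer `tanh` networks are all assumptions made only for this example.

```python
import numpy as np


def n_step_return(extrinsic, intrinsic, bootstrap_value, gamma=0.99, beta=0.5):
    """Discounted n-step return over mixed rewards (hypothetical helper).

    Each step's reward is extrinsic_t + beta * intrinsic_t, where beta is an
    assumed mixing weight. bootstrap_value is the value estimate of the state
    reached after the last of the n steps.
    """
    rewards = np.asarray(extrinsic, dtype=float) + beta * np.asarray(intrinsic, dtype=float)
    ret = float(bootstrap_value)
    # Accumulate backwards so every earlier step discounts everything after it.
    for r in rewards[::-1]:
        ret = r + gamma * ret
    return ret


def prediction_error_bonus(features, target_w, predictor_w):
    """RND-style novelty bonus (hypothetical helper).

    target_w is a fixed, randomly initialized weight matrix; predictor_w is
    the weight matrix being trained to imitate it. States on which the
    predictor is still inaccurate receive a larger bonus.
    """
    target = np.tanh(features @ target_w)
    pred = np.tanh(features @ predictor_w)
    return np.mean((target - pred) ** 2, axis=-1)
```

With a sparse extrinsic reward, the intrinsic term is what keeps the n-step return informative: a novel state several steps ahead raises the return of the current state, which is the sense in which multi-step learning "refers to future states".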
