A Study on State Grouping and Opportunity Evaluation for Reinforcement Learning (強化学習法のための状態グルーピングとオポチュニチ評価に関する研究)

兪 文偉, 横井 浩史, 嘉数 侑昇

doi:10.1541/ieejeiss1987.117.9_1300

In this paper, we propose the State Grouping scheme to address the problem of scaling up reinforcement learning algorithms to real, large-scale applications. The grouping scheme is based on geographical and trial-and-error information. It consists of state generation, state combination, state splitting, and state forgetting procedures, together with corresponding action selection and learning modules. We also discuss the Labeling Based Evaluation scheme, which evaluates the opportunity of each state-action pair and can therefore use better experience to guide exploration of the state space effectively. By incorporating the Labeling Based Evaluation and State Grouping schemes into a reinforcement learning algorithm, we obtain an approach that both generates an organized state space for reinforcement learning and solves the problem at hand. We argue that this ability is necessary for an autonomous agent: such an agent cannot rely on a predefined map, but must instead explore its environment and find the optimal solution to the problem autonomously and simultaneously. By solving 3-DOF, 4-link manipulator problems with large state spaces, we demonstrate the efficiency of the proposed approach: the agent achieves optimal or suboptimal paths using less memory and less time.
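The abstract names the grouping procedures without giving their rules, so the following is only a minimal sketch of the general idea of learning over grouped states, not the authors' method. It assumes tabular Q-learning, a hypothetical "geographic" grouping rule that quantises each state coordinate (for example a joint angle) into cells of width `cell`, group creation on first visit (standing in for state generation), and removal of groups unvisited for `forget_after` updates (standing in for state forgetting). Combining, splitting, and the Labeling Based Evaluation scheme are not reproduced. The class name `GroupedQLearner` and all of its parameters are hypothetical.

```python
import random


class GroupedQLearner:
    """Illustrative Q-learning over groups of states (hypothetical sketch).

    Assumed grouping rule, not taken from the paper: each state coordinate
    is quantised into cells of width `cell`, so nearby states share one group.
    """

    def __init__(self, n_actions, cell=0.5, alpha=0.1, gamma=0.9,
                 epsilon=0.1, forget_after=1000, seed=0):
        self.n_actions = n_actions
        self.cell = cell
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.epsilon = epsilon            # exploration probability
        self.forget_after = forget_after  # staleness limit, in updates
        self.q = {}                       # group key -> list of action values
        self.last_seen = {}               # group key -> step of last visit
        self.step = 0
        self.rng = random.Random(seed)

    def group_of(self, state):
        # Hypothetical geographic grouping: quantise every coordinate.
        return tuple(int(x // self.cell) for x in state)

    def _values(self, key):
        # Stand-in for state generation: create a group on its first visit.
        if key not in self.q:
            self.q[key] = [0.0] * self.n_actions
        self.last_seen[key] = self.step
        return self.q[key]

    def act(self, state):
        # Epsilon-greedy action selection over the group's action values.
        values = self._values(self.group_of(state))
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return values.index(max(values))

    def update(self, state, action, reward, next_state, done):
        # One-step Q-learning update applied at the group level.
        self.step += 1
        values = self._values(self.group_of(state))
        next_values = self._values(self.group_of(next_state))
        target = reward if done else reward + self.gamma * max(next_values)
        values[action] += self.alpha * (target - values[action])
        self._forget()

    def _forget(self):
        # Stand-in for state forgetting: drop groups not visited recently
        # so that memory stays bounded.
        stale = [k for k, t in self.last_seen.items()
                 if self.step - t > self.forget_after]
        for k in stale:
            del self.q[k]
            del self.last_seen[k]
```

Because many nearby states share one table entry, memory grows with the number of visited groups rather than with the raw state count. This is the general motivation the abstract gives for grouping; how the paper actually forms, merges, and splits groups is not specified here.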
