Simultaneous learning of situation classification based on rewards and behavior selection based on the situation

Description

This paper describes a system with which a cognitive agent simultaneously learns how to abstract its state space and a policy for behavior selection. We call the system the situation transition network system (STNS). The system extracts situations from the continuous state space and maintains them dynamically on the basis of rewards from the environment. In this way, it learns how to abstract in a dynamic environment. At the same time, the system stores the results of transitions between situations and constructs a network of situations. This network is used for partial planning: at each point in the learning process, the system selects a behavior according to the partial plan. Because planning is performed on a network of abstracted situations, an agent with STNS does not have to deliberate over details when planning. Furthermore, the agent can make a plan even at an early stage of learning because the planning is partial. Because learning proceeds simultaneously with task execution, the agent can adapt to the current task. The results of computer simulations are given.
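
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea (situations extracted from a continuous state space, a network of observed transitions, and planning over that network), not the authors' STNS. Everything in it is an assumption made for illustration: the class name `SituationNetwork`, the nearest-prototype rule with a fixed `radius`, the running-mean reward estimate, and the reading of "partial planning" as a breadth-first search limited to `horizon` steps.

```python
from collections import defaultdict, deque
import math


class SituationNetwork:
    """Toy situation transition network; all names here are hypothetical."""

    def __init__(self, radius=1.0):
        self.radius = radius              # assumed: max distance for reusing a situation
        self.prototypes = []              # one representative state per situation
        self.reward = defaultdict(float)  # running mean of reward received on entering a situation
        self.visits = defaultdict(int)
        self.edges = defaultdict(dict)    # edges[s][behavior] -> next situation index

    def classify(self, state):
        """Return the index of the nearest situation, creating one if none is close."""
        best, best_dist = None, math.inf
        for i, proto in enumerate(self.prototypes):
            d = math.dist(state, proto)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > self.radius:
            self.prototypes.append(tuple(state))
            return len(self.prototypes) - 1
        return best

    def observe(self, state, behavior, next_state, reward):
        """Record one observed transition and update the reward estimate."""
        s, t = self.classify(state), self.classify(next_state)
        self.edges[s][behavior] = t
        self.visits[t] += 1
        self.reward[t] += (reward - self.reward[t]) / self.visits[t]
        return s, t

    def plan(self, state, horizon=3):
        """Search at most `horizon` transitions ahead and return the first
        behavior on a path to the best-rewarded reachable situation, or
        None if nothing within reach beats the current situation."""
        start = self.classify(state)
        best_first, best_reward = None, self.reward[start]
        queue = deque([(start, None, 0)])
        seen = {start}
        while queue:
            s, first, depth = queue.popleft()
            if first is not None and self.reward[s] > best_reward:
                best_first, best_reward = first, self.reward[s]
            if depth == horizon:
                continue
            for behavior, t in self.edges[s].items():
                if t not in seen:
                    seen.add(t)
                    next_first = first if first is not None else behavior
                    queue.append((t, next_first, depth + 1))
        return best_first
```

In this sketch the agent would call `observe` after every step and `plan` before the next one, so classification and behavior selection are updated together during task execution. When `plan` returns `None`, a caller would need some fallback such as random exploration, which is again an assumption rather than something the abstract specifies.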
