Application of reinforcement learning algorithm using policy network and value network to the turn-based strategy game

木村, 富宏, Kimura, Tomihiro

Since DeepMind announced AlphaGo, AlphaGo Zero, and AlphaZero, learning algorithms based on deep neural networks have attracted much attention, and game-playing algorithms are advancing rapidly. However, the AlphaZero approach could not be applied directly to turn-based strategy games, because the data such a game requires becomes too large or too complex to represent in a neural network. A further problem was that few game records had been accumulated for training. In this research, we propose a search method that places the information of the acting unit on the input side of the neural network, which reduces the burden on the output side of the network. We also integrate the policy network and the value network into a single network, which reduces the design burden and improves performance. This paper reports on applying AlphaZero's learning algorithm, which learns through self-play and therefore does not depend on accumulated game records. The resulting AI achieved a winning record against existing algorithms even on large maps it had not been trained on.
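The two design ideas in the abstract can be illustrated with a small sketch. It is not the author's implementation: the abstract gives no architecture details, so the board encoding, the layer sizes, and every name here (`encode_state`, `PolicyValueNet`, `acting_unit`) are assumptions made for illustration. The acting unit is marked by an extra input plane, so the policy output only needs one entry per board cell rather than one per unit and cell, and a shared hidden layer feeds both a policy head and a value head.

```python
import math
import random


def encode_state(height, width, unit_positions, acting_unit):
    """Flatten two input planes (hypothetical encoding).

    Plane 1 marks every friendly unit; plane 2 is one-hot on the unit that
    is about to act, so the network is told on its input side which unit
    the policy output refers to.
    """
    cells = height * width
    units_plane = [0.0] * cells
    acting_plane = [0.0] * cells
    for row, col in unit_positions:
        units_plane[row * width + col] = 1.0
    row, col = unit_positions[acting_unit]
    acting_plane[row * width + col] = 1.0
    return units_plane + acting_plane


def init_layer(rng, n_in, n_out):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Apply a dense layer: one dot product plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


class PolicyValueNet:
    """One shared trunk with a policy head and a value head (assumed layout)."""

    def __init__(self, height, width, hidden=16, seed=0):
        rng = random.Random(seed)
        cells = height * width
        self.trunk = init_layer(rng, 2 * cells, hidden)
        self.policy_head = init_layer(rng, hidden, cells)  # one logit per cell
        self.value_head = init_layer(rng, hidden, 1)       # scalar evaluation

    def forward(self, x):
        hidden = [max(0.0, v) for v in dense(x, *self.trunk)]  # ReLU
        logits = dense(hidden, *self.policy_head)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]  # numerically stable softmax
        total = sum(exps)
        policy = [e / total for e in exps]
        value = math.tanh(dense(hidden, *self.value_head)[0])  # in [-1, 1]
        return policy, value


# Usage: a 3x4 map with two units, where the second unit is about to act.
net = PolicyValueNet(height=3, width=4)
state = encode_state(3, 4, [(0, 0), (2, 3)], acting_unit=1)
policy, value = net.forward(state)
```

With this layout the policy vector has one probability per destination cell for the marked unit, and the same forward pass yields the position value, which is the kind of saving on the output side that the abstract describes. The weights are untrained; in an AlphaZero-style setup they would be fitted to self-play search results.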
