Research into Using Reinforcement Learning to Reproduce Human Intuition in a Shogi Mating Problem Solver

千葉 景太 (Keita Chiba), Reijer Grimbergen

In 2013, a professional shogi player lost to a shogi AI in an official match for the first time. In Go as well, programs stronger than the best human players have been developed. For these reasons, research into building strong AI for perfect-information games can be considered largely settled. Research into human-like AI, however, is not finished. It remains difficult for a computer to find moves of the same quality as a human's while examining only as many positions as a human does. This research therefore aimed to build an AI that reproduces human intuition by improving the accuracy of its move choices without searching any positions. The AI selects its moves according to a policy network, which was trained with reinforcement learning to increase the rate at which it selects the correct move. The training positions were mating positions that arose in games between players making random moves. After training, the AI chose the correct move in about 60% of positions generated in this way. This did not carry over to actual shogi mating problems (tsume-shogi), however: there, training failed to improve performance, and the AI chose the correct move less often than a baseline that picks at random among checking moves only.
