Construction and Evaluation of Network Models Suitable for the Policy Function in Mahjong

清水 大志, 田中 哲朗

AlphaGo Zero used almost no game-specific features in its input and learned only through reinforcement learning by self-play, yet it achieved strength far exceeding that of top Go players. Following this success, there have been attempts in other games to create strong players by using reinforcement learning to train neural networks that take as few game-specific input features as possible. Reinforcement learning through self-play, however, requires experiments that consume a large amount of computation. We propose a method that, before applying reinforcement learning to a target game, finds a suitable network model by supervised learning on a small game that retains the characteristics of the target game. Because supervised learning on a small game finishes in a short time, automatic hyperparameter optimization tools can be used to select a suitable model from among various network models. In this study, we evaluated neural network models by supervised learning on mini-games that retain the characteristics of mahjong and for which the theoretically best move can be computed. We call this family of games mini-mahjong. The proposed model achieved higher accuracy than the models proposed in previous studies. We also applied reinforcement learning to the highly rated model, but the accuracy it obtained was low.
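The evaluation described above rests on two steps: scoring a policy against the theoretically best moves of a solved mini-game, and choosing among candidate network configurations by that score. The Python sketch below illustrates both under stated assumptions. It counts a prediction as correct when it is any one of the optimal moves (an assumption, since the abstract does not define accuracy precisely), and it uses plain random search as a stand-in for an automatic hyperparameter optimization tool. `policy_accuracy`, `random_search`, and the `train_and_score` callback are hypothetical names, not the authors' code.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Policy = Callable[[object], int]  # maps an encoded position to a chosen move index
Config = Dict[str, object]        # one candidate network configuration


def policy_accuracy(policy: Policy,
                    positions: Sequence[object],
                    optimal_moves: Sequence[Set[int]]) -> float:
    """Fraction of positions where the policy picks a theoretically best move.

    Assumption: several moves may be equally optimal in a solved mini-game,
    so a prediction counts as correct if it is any member of the optimal set.
    """
    if len(positions) != len(optimal_moves):
        raise ValueError("positions and optimal_moves must have the same length")
    if not positions:
        return 0.0
    hits = sum(policy(pos) in best for pos, best in zip(positions, optimal_moves))
    return hits / len(positions)


def random_search(search_space: Dict[str, List[object]],
                  train_and_score: Callable[[Config], float],
                  n_trials: int,
                  seed: int = 0) -> Tuple[Config, float]:
    """Sample configurations at random and keep the one with the highest score.

    Stand-in for an automatic hyperparameter optimizer. train_and_score is a
    hypothetical callback that would train a network with the given
    configuration by supervised learning and return its validation accuracy.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently and uniformly.
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = train_and_score(config)
        if best_config is None or score > best_score:
            best_config, best_score = config, score
    assert best_config is not None
    return best_config, best_score
```

For example, with optimal move sets `[{3}, {1, 5}, {0}, {2}]`, a policy that always plays move 3 scores 0.25, since only the first position is answered correctly. Random search is used here only because it needs no external library; a dedicated optimizer would typically sample the search space more efficiently.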
