Building a Mahjong Player Using Deep Reinforcement Learning (深層強化学習を用いた麻雀プレイヤの構築)

Taishi Shimizu, Tetsuro Tanaka

This research aims to build a computer mahjong player that surpasses human strength while relying on as little human knowledge as possible. As a first step, we explore ways to make reinforcement learning more efficient using Suzume-Jong, a simplified form of mahjong. Suzume-Jong reduces the hand size and the number of tile types used in ordinary mahjong and simplifies the rules. To apply reinforcement learning to a multiplayer game in the same way as single-agent reinforcement learning, the other players must be supplied as part of the environment. In this research, the opponents were one-player Suzume-Jong players, which consider only their own hand and discard the tile with the highest expected discounted cumulative reward. Training against these opponents produced a player whose strength approaches that of the one-player Suzume-Jong player. We also tried, rather than maximizing the score of each round, to minimize the average rank at the end of all rounds by using as the reward the value predicted by Global Reward Prediction, which was proposed in Super Phoenix. This attempt has not yet improved the average rank.
