対局に基づいた教師データの重要度の学習 (Learning the Importance of Training Data Based on Game Play)

Yoshikuni Sato (佐藤 佳州), Daisuke Takahashi (高橋 大介)

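In recent years, machine learning has attracted considerable attention in game programming and has been applied successfully to learning many kinds of parameters, including evaluation functions, search depth, and playout policies for Monte Carlo tree search. Current machine learning in game programming uses the game records of human experts as training data and adjusts parameters so that the program's moves approach the experts' moves. In games such as shogi, however, computers are already approaching the strength of top human players, so simply reproducing human moves does not necessarily produce a "strong" player. To address this problem, this paper proposes a learning method that introduces importance weights for the training data. The proposed method combines two steps: learning the importance weights by evolutionary computation with win rate as the fitness, and learning the parameters according to those weights. Applied to learning shogi evaluation functions, realization probabilities, and playout policies, the proposed method won significantly more games than the conventional method in match experiments, which demonstrates its effectiveness. The experimental results also show that the importance of training data differs with factors such as game progress and tactics, and that using training data effectively makes it possible to acquire knowledge that yields stronger programs.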

Recently, machine learning has attracted much attention in the field of game programming, and it has succeeded in tuning evaluation functions, search depth, playout policies in Monte-Carlo Tree Search, and other parameters. Existing machine learning methods in game programming tune parameters by using game records of human expert players. However, in some games, such as shogi, computer programs are now almost as strong as human professional players. Thus, learning by simply imitating human records is not necessarily a good way to generate strong computer players. In this paper, we propose a new learning method that estimates the importance of each training record by playing many games and tunes parameters according to that importance. The experimental results show the effectiveness of our method for learning evaluation functions, realization probability search, and playout policies. Moreover, the results show that features of the training data, such as the progress of a game or its tactics, affect their importance.
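
The two-level idea in the abstract (an outer search over importance weights, scored by win rate, wrapped around an inner importance-weighted parameter fit) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-parameter model, the `TRUE_W` constant, the closed-form `win_rate` stand-in (the paper measures win rate by actually playing games), and the simple (1+1) evolution strategy, since the abstract does not say which evolutionary algorithm is used.

```python
import random

TRUE_W = 2.0  # hypothetical "strong" parameter; exists only to define the toy fitness


def fit_parameters(records, importance):
    """Inner step: importance-weighted least squares for the toy model y = w * x."""
    num = sum(m * x * y for (x, y), m in zip(records, importance))
    den = sum(m * x * x for (x, _), m in zip(records, importance))
    return num / den if den > 0 else 0.0


def win_rate(w):
    """Stand-in for match play (assumption): highest when w equals TRUE_W."""
    return 1.0 / (1.0 + (w - TRUE_W) ** 2)


def fitness(records, importance):
    """Fitness of an importance vector = win rate of the parameters it produces."""
    return win_rate(fit_parameters(records, importance))


def learn_importance(records, generations=200, sigma=0.2, seed=0):
    """Outer step: (1+1) evolution strategy over non-negative importance weights."""
    rng = random.Random(seed)
    best = [1.0] * len(records)  # start from uniform importance
    best_fit = fitness(records, best)
    for _ in range(generations):
        # Mutate every weight with Gaussian noise and clip at zero.
        child = [max(0.0, m + rng.gauss(0.0, sigma)) for m in best]
        child_fit = fitness(records, child)
        if child_fit >= best_fit:  # keep the child only if it is no worse
            best, best_fit = child, child_fit
    return best, best_fit


# Toy records (x, y): the first three agree with y = 2x, the last two do not.
records = [(1.0, 2.0), (2.0, 4.1), (1.5, 3.0), (1.0, 0.5), (2.0, 1.0)]
uniform_fit = fitness(records, [1.0] * len(records))
importance, learned_fit = learn_importance(records)
```

Because the search keeps a mutated weight vector only when its fitness is at least as high as the current one, `learned_fit` can never fall below `uniform_fit`. In a real setting the `win_rate` call would be replaced by many games against a reference player, which makes the fitness noisy and far more expensive to evaluate.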
