Applying the Policy Gradient Method and Neural Fitted Q Iteration to Hanafuda Koi-Koi

Bibliographic Information

Alternative title
  • Applying Policy Gradient method and Neural Fitted Q Iteration for Hanafuda Koi-Koi game player

Abstract

Koi-Koi, played with Hanafuda cards, is a traditional Japanese card game classified as a two-player, turn-based, zero-sum, imperfect-information game. Although it is widely played across various media, it has received little research attention, and we know of no report of an artificial player that rivals strong human players. We therefore attempted to build a strong Koi-Koi player using two reinforcement learning methods: the policy gradient method and Neural Fitted Q Iteration. Each method used an artificial neural network, taking 268 low-level board features as input, to estimate state-action values, and learned its parameters through repeated play against a simple rule-based artificial player that we constructed. Over 1,000 games against this opponent, the policy gradient player gained an average of -0.3 points per game and the Neural Fitted Q Iteration player gained an average of 0.5 points per game.

identifier:https://dspace.jaist.ac.jp/dspace/handle/10119/16089
