Learning Shogi Evaluation Functions by Semi-Supervised Learning Using Self-Play Game Records

Bibliographic information

Alternative title
  • Learning Shogi evaluation functions by semi-supervised learning

Description

In recent years, shogi evaluation functions have typically been trained by supervised learning on the game records of expert (professional) players. However, the number of human-versus-human game records is limited, and some positions rarely appear in them, which makes it difficult to learn the features relevant to those positions. This study applies self-training, a semi-supervised learning method, to the training of shogi evaluation functions. Self-play game records serve as the unlabeled data, and in addition to the expert game records the method learns selectively from only those self-play positions judged reliable. Positions are selected based on the difference between the evaluation scores of the best move and the second-best move. The method was evaluated in matches against a program trained by supervised learning on expert game records alone, and it achieved a winning percentage of up to 56.8%.
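The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that the difference between the best and second-best evaluation scores is used, so the direction of the test (a larger margin is treated as more reliable), the threshold value, the `PseudoLabel` type, the function name, and the position and move strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PseudoLabel:
    """A self-play position paired with the move used as its training label."""
    position: str
    best_move: str
    margin: float


def select_reliable_positions(
    scored_positions: Dict[str, Dict[str, float]],
    margin_threshold: float,
) -> List[PseudoLabel]:
    """Keep positions whose best move clearly outscores the second-best move.

    scored_positions maps a position id to {move: evaluation score}.
    ASSUMPTION: a margin at or above margin_threshold means "reliable".
    """
    selected: List[PseudoLabel] = []
    for position, move_scores in scored_positions.items():
        if len(move_scores) < 2:
            continue  # no second-best move, so no margin can be computed
        ranked = sorted(move_scores.items(), key=lambda kv: kv[1], reverse=True)
        (best_move, best_score), (_, second_score) = ranked[0], ranked[1]
        margin = best_score - second_score
        if margin >= margin_threshold:
            selected.append(PseudoLabel(position, best_move, margin))
    return selected


# Hypothetical scores from the current evaluation function on self-play positions.
scored = {
    "pos_a": {"7g7f": 120.0, "2g2f": 115.0},   # margin 5: ambiguous, skipped
    "pos_b": {"8h2b+": 900.0, "7g7f": 100.0},  # margin 800: kept
    "pos_c": {"5i5h": 10.0},                   # only one move: skipped
}
reliable = select_reliable_positions(scored, margin_threshold=200.0)
```

In a self-training loop, the selected positions would be added to the expert game records as extra labeled data before the evaluation function is retrained. How often that cycle repeats is not stated in the abstract.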
