Cognitive Satisficing Exploration in Dueling Bandit Problems

OYO Kuratomo, WADA Takuma, KAMIYA Takumi, TAKAHASHI Tatsuji

doi:10.11517/pjsai.jsai2021.0_1g2gs2a02

Bibliographic Information

Other Title

比較バンディット問題における認知的満足化探索 (English: Cognitive Satisficing Exploration in Dueling Bandit Problems)

Description

Multi-armed bandit problems, among the most fundamental tasks in reinforcement learning, have been widely applied to problems such as online advertisement delivery and game tree search. Whereas these traditional bandit problems require rewards to be quantifiable on an absolute scale, dueling bandit problems (DBP) work with relative rewards obtained from pairwise comparisons. Double Thompson Sampling (D-TS) is one of the most effective solutions to DBP. However, because feedback comes only from pairwise comparisons, solving DBP requires extensive trial and error, which makes D-TS computationally expensive. In this paper, we focus on the fact that “satisficing” action selection enables a quick search for an action that meets a given target (aspiration) level, and we propose an algorithm based on the Risk-sensitive Satisficing (RS) model. The results showed that on some datasets its performance was inferior to that of D-TS. We therefore also propose a method that combines RS and D-TS, which improves performance in terms of weak regret in DBP.
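Two ideas from the abstract, the RS value that drives satisficing selection and the weak-regret measure used for evaluation, can be illustrated with a minimal Python sketch. The RS formula RS_i = (n_i / N)(E_i − aleph), with n_i the number of times arm i was tried, N the total number of trials, E_i its estimated value, and aleph the aspiration level, is assumed here from the wider RS literature rather than taken from this record. The dueling loop, the use of an arm's empirical win rate as E_i, and every function name (rs_values, condorcet_winner, weak_regret, run_rs_duels) are hypothetical. This is not the authors' algorithm, nor their RS/D-TS combination.

```python
import random


def rs_values(counts, means, aleph):
    """Assumed RS form: RS_i = (n_i / N) * (E_i - aleph).

    counts[i] is n_i (times arm i was played), means[i] is E_i (its estimated
    value), and aleph is the aspiration level. Every arm scores 0 before any
    trial has been made.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [(n / total) * (e - aleph) for n, e in zip(counts, means)]


def condorcet_winner(pref):
    """Index of the arm that beats every other arm with probability > 0.5."""
    k = len(pref)
    for i in range(k):
        if all(pref[i][j] > 0.5 for j in range(k) if j != i):
            return i
    raise ValueError("no Condorcet winner")


def weak_regret(pref, a, b):
    """Weak regret of one duel: the smaller of the two arms' gaps to the winner.

    pref[i][j] is P(arm i beats arm j) with pref[i][i] == 0.5, so the regret
    is zero whenever either dueled arm is the Condorcet winner.
    """
    w = condorcet_winner(pref)
    return min(pref[w][a] - 0.5, pref[w][b] - 0.5)


def run_rs_duels(pref, horizon, aleph, seed=0):
    """Hypothetical toy loop: duel the two arms with the highest RS values,
    where E_i is arm i's empirical win rate over all of its duels so far."""
    rng = random.Random(seed)
    k = len(pref)
    wins = [0] * k
    duels = [0] * k
    total_regret = 0.0
    for _ in range(horizon):
        means = [wins[i] / duels[i] if duels[i] else 0.0 for i in range(k)]
        rs = rs_values(duels, means, aleph)
        # sorted() is stable, so ties are broken by the lower arm index
        a, b = sorted(range(k), key=lambda i: -rs[i])[:2]
        if rng.random() < pref[a][b]:
            wins[a] += 1
        else:
            wins[b] += 1
        duels[a] += 1
        duels[b] += 1
        total_regret += weak_regret(pref, a, b)
    return total_regret, wins, duels


# Example: arm 0 is the Condorcet winner of this made-up preference matrix.
pref = [[0.5, 0.7, 0.8],
        [0.3, 0.5, 0.6],
        [0.2, 0.4, 0.5]]
regret, wins, duels = run_rs_duels(pref, horizon=1000, aleph=0.6)
```

In this sketch an untried arm has an RS value of 0, while an arm whose estimate falls below the aspiration level has a negative one, so new arms are explored only while the current ones are unsatisfying; once an arm's estimate exceeds aleph, it keeps being selected. That is one way to picture the satisficing behavior the abstract describes.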

Journal

Proceedings of the Annual Conference of JSAI

Proceedings of the Annual Conference of JSAI, JSAI2021 (0), 1G2GS2a02, 2021

The Japanese Society for Artificial Intelligence

Details

CRID: 1390851320456355200

NII Article ID: 130008051547

DOI: 10.11517/pjsai.jsai2021.0_1g2gs2a02

ISSN: 27587347

Text Lang: ja

Data Source

JaLC
CiNii Articles

Abstract License Flag: Disallowed