THE OPTIMAL SOLUTION OF THE LOB-PASS PROBLEM WITH KNOWN REACTION CURVES

Hiraoka Kazuyuki, Yoshizawa Shuji

doi:10.15807/jorsj.41.509

The "lob-pass problem" is a model used in psychology. It describes phenomena in which repeating the same choice reduces its effect, as with experience or weariness. Abe and Takeuchi formulated it as an on-line learning problem and pointed out that it is an extension of the multi-armed bandit problem. In the lob-pass problem, the player's choices change the environment itself; this is what distinguishes it from multi-armed bandit problems. All strategies proposed so far for the lob-pass problem repeat the following procedure: (i) observe the reaction of the unknown environment, (ii) estimate the environment, (iii) find the optimal "stationary" strategy for the estimated environment, and (iv) determine the choice according to that strategy. Moreover, the criterion used to evaluate strategies in these studies is the loss due to uncertainty about the environment, measured against the optimal "stationary" strategy for the known-environment case. To judge whether such policies are appropriate, we have to know the optimal strategy for the known-environment case, which need not be "stationary". The present paper calculates it. It is also shown that the "matching condition" assumed in past studies is the necessary and sufficient condition for the optimal strategy to be independent of the stopping time of the game. The meaning and the appropriateness of the matching condition are discussed. Finally, asymptotic optimality is defined. We prove that a stationary strategy can be asymptotically optimal against an opponent with a forgetting factor, but that no strategy is asymptotically optimal against an opponent without a forgetting factor.
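
Steps (iii) and (iv) of the procedure above can be illustrated with a minimal sketch for the known-environment case. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear reaction curves `f_lob` and `f_pass`, the choice to let the success probability depend on the fraction of lobs among all past choices (that is, an opponent without a forgetting factor), and the grid search. The sketch only finds the best "stationary" strategy, which the abstract notes need not be the optimal one.

```python
import random

# Hypothetical reaction curves (assumed, not from the paper):
# r is the fraction of past choices that were lobs.
def f_lob(r):
    return 0.9 - 0.6 * r   # lobs work less well the more often they are used

def f_pass(r):
    return 0.3 + 0.4 * r   # passes work better when the opponent expects lobs

def stationary_value(r):
    """Expected per-step success rate when lobbing with fixed probability r."""
    return r * f_lob(r) + (1 - r) * f_pass(r)

def best_stationary_rate(steps=1000):
    """Step (iii): grid search for the best stationary lob rate."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=stationary_value)

def play_stationary(rate, horizon, seed=0):
    """Step (iv): play the stationary strategy and return the empirical success rate."""
    rng = random.Random(seed)
    lobs = 0
    wins = 0
    for t in range(horizon):
        past_rate = lobs / t if t > 0 else 0.0   # no forgetting factor (assumed)
        lob = rng.random() < rate
        p = f_lob(past_rate) if lob else f_pass(past_rate)
        wins += rng.random() < p
        lobs += lob
    return wins / horizon

if __name__ == "__main__":
    r_star = best_stationary_rate()
    print(r_star, stationary_value(r_star), play_stationary(r_star, 10_000))
```

With these assumed curves the stationary value simplifies to 0.3 + r - r^2, so the grid search settles on a lob rate of 0.5. A strategy that is allowed to vary its rate over time, or to exploit the stopping time, is outside what this sketch covers.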
