Reinforcement Learning in Multi-Party Trading Dialog

Elnaz Nouri, David Traum, Satoshi Nakamura, Kallirroi Georgila, Takuya Hiraoka

doi:10.18653/v1/w15-4605

Description

In this paper, we apply reinforcement learning (RL) to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner varies depending on the RL algorithm used and the number of traders. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is a very hard problem; and (2) the use of neural fitted Q iteration combined with an incremental reward function produces negotiation policies that are as effective as, or even better than, the policies of two strong hand-crafted baselines.

Published in

Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 32-41, 2015-01-01

Association for Computational Linguistics (ACL)

Details

CRID: 1871146592959463552

DOI: 10.18653/v1/w15-4605

Data source type

OpenAIRE
