Reinforcement Learning in Multi-Party Trading Dialog
Description
In this paper, we apply reinforcement learning (RL) to a multi-party trading scenario in which the dialog system (the learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. The learner's negotiation strategy is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the learner's performance varies with the RL algorithm used and the number of traders. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is a very hard problem; and (2) neural fitted Q iteration combined with an incremental reward function produces negotiation policies that are as effective as, or even more effective than, the policies of two strong hand-crafted baselines.
Published in
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue 32-41, 2015-01-01
Association for Computational Linguistics (ACL)