Fast Reinforcement Learning of Dialogue Policies Using Stable Function Approximation
Abstract
We propose a method to speed up reinforcement learning of policies for spoken dialogue systems. This is achieved by combining a coarse-grained abstract representation of states and actions with learning restricted to frequently visited states. The value of unsampled states is approximated by linear interpolation over known states. Experiments show that the proposed method effectively optimizes dialogue strategies for frequently visited dialogue states.
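The interpolation step can be made concrete with a minimal sketch. The Python below is an illustration, not the paper's implementation: it assumes a one-dimensional numeric encoding of abstract dialogue states, a hypothetical visit-count threshold `MIN_VISITS` for treating a state as "known", and assumed constants `ALPHA` and `GAMMA`; the paper's actual state abstraction and interpolation scheme may differ.

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.95  # learning rate and discount factor (assumed values)
MIN_VISITS = 5            # visits needed before a state counts as "known" (assumed)

class InterpolatedQ:
    """Tabular Q over abstract states; linear interpolation fills the gaps."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q = {}       # abstract state (a scalar feature here) -> Q-vector
        self.visits = {}  # visit counts per abstract state

    def value(self, s, a):
        # Frequently visited states use their learned Q-value directly.
        if s in self.q and self.visits.get(s, 0) >= MIN_VISITS:
            return self.q[s][a]
        # Unsampled (or rarely sampled) states: interpolate linearly between
        # the values of known states along the abstract-state axis.
        known = sorted(k for k, v in self.visits.items() if v >= MIN_VISITS)
        if not known:
            return 0.0
        xp = np.array(known, dtype=float)
        fp = np.array([self.q[k][a] for k in known])
        return float(np.interp(s, xp, fp))  # clamps outside the known range

    def update(self, s, a, r, s_next):
        # Standard Q-learning backup; only visited states get table entries.
        self.visits[s] = self.visits.get(s, 0) + 1
        self.q.setdefault(s, np.zeros(self.n_actions))
        best_next = max(self.value(s_next, b) for b in range(self.n_actions))
        self.q[s][a] += ALPHA * (r + GAMMA * best_next - self.q[s][a])
```

Under these assumptions, learning effort concentrates on frequently visited dialogue states, while the interpolation still yields value estimates for the rest of the state space, matching the speed-up rationale described in the abstract.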