LSTM-based Turn-taking Estimation Model using Lexical/Prosodic Contents and Dialog History
-
- Liu Chaoran
- ATR, HIL
-
- Ishi Carlos
- ATR, HIL
-
- Ishiguro Hiroshi
- Graduate School of Engineering Science, Osaka University
Bibliographic Information
- Other Title
-
- 言語・韻律情報及び対話履歴を用いたLSTMベースのターンテイキング推定
Abstract
<p>Natural conversation involves rapid exchanges of turns. Taking turns at appropriate times and intervals is a requisite feature for a dialog system that acts as a conversation partner. We propose a Recurrent Neural Network (RNN) based model that takes the current utterance and the dialog history as input, classifies utterances into turn-taking related classes, and estimates turn-taking timing. The dialog history is represented by a sequence of speaker-specified joint embeddings of lexical and prosodic contents. To this end, we trained a neural network to embed the lexical and prosodic contents into a joint embedding space. To learn a meaningful embedding space, the prosodic feature sequence of each utterance is mapped into a fixed-dimensional space by an RNN and combined with the utterance's lexical embedding. These joint embeddings are then shifted to different parts of the embedding space according to the speaker. Finally, the speaker-specified joint embeddings are used as the input to the proposed model. We tested the model on a spontaneous conversation dataset and confirmed that it outperformed conventional models that use lexical/prosodic features and dialog history without speaker information.</p>
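The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain tanh RNN in place of an LSTM, random untrained weights, and it assumes that "shifting" an embedding by speaker means adding a speaker-specific offset vector. The class labels, dimensions, and every name in the code (`rnn_encode`, `joint_embedding`, `classify`, `PARAMS`) are hypothetical, and the timing-estimation part of the model is omitted.

```python
import math
import random

random.seed(0)

# Hypothetical sizes and labels; the abstract does not specify any of these.
LEX_DIM, FRAME_DIM, PROS_DIM, HIST_DIM = 4, 3, 2, 5
JOINT_DIM = LEX_DIM + PROS_DIM
CLASSES = ["hold", "shift", "backchannel"]


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_encode(sequence, w_in, w_rec):
    """Run a plain tanh RNN over a sequence and return the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w * v for w, v in zip(w_in[i], x))
                + sum(w * h for w, h in zip(w_rec[i], hidden))
            )
            for i in range(len(w_rec))
        ]
    return hidden


PARAMS = {
    "pros_in": rand_matrix(PROS_DIM, FRAME_DIM),
    "pros_rec": rand_matrix(PROS_DIM, PROS_DIM),
    "hist_in": rand_matrix(HIST_DIM, JOINT_DIM),
    "hist_rec": rand_matrix(HIST_DIM, HIST_DIM),
    "out": rand_matrix(len(CLASSES), HIST_DIM),
    # Assumption: one additive offset vector per speaker.
    "speaker_offset": {"A": [1.0] * JOINT_DIM, "B": [-1.0] * JOINT_DIM},
}


def joint_embedding(lexical_vec, prosody_frames, speaker):
    """Encode prosody to a fixed size, join it with the lexical vector, shift by speaker."""
    prosody_vec = rnn_encode(prosody_frames, PARAMS["pros_in"], PARAMS["pros_rec"])
    joint = list(lexical_vec) + prosody_vec
    return [j + o for j, o in zip(joint, PARAMS["speaker_offset"][speaker])]


def classify(dialog):
    """Encode the utterance history with a second RNN and apply a softmax over classes."""
    history = [joint_embedding(*utterance) for utterance in dialog]
    state = rnn_encode(history, PARAMS["hist_in"], PARAMS["hist_rec"])
    logits = [sum(w * s for w, s in zip(row, state)) for row in PARAMS["out"]]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(CLASSES, exps)}


# Each utterance: (lexical embedding, variable-length prosodic frames, speaker id).
dialog = [
    ([0.1, 0.2, 0.3, 0.4], [[0.5, 0.1, 0.0], [0.4, 0.2, 0.1]], "A"),
    ([0.0, 0.3, 0.1, 0.2], [[0.2, 0.0, 0.3]], "B"),
]
probs = classify(dialog)
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how variable-length prosodic sequences, lexical embeddings, and speaker identity could be combined into one input sequence for a recurrent classifier.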
Journal
-
- Transactions of the Japanese Society for Artificial Intelligence
-
Transactions of the Japanese Society for Artificial Intelligence 34 (2), C-I65_1-9, 2019-03-01
The Japanese Society for Artificial Intelligence
Details
-
- CRID
- 1390845713055337984
-
- NII Article ID
- 130007606513
-
- ISSN
- 13468030
- 13460714
-
- Text Lang
- ja
-
- Data Source
-
- JaLC
- Crossref
- CiNii Articles
-
- Abstract License Flag
- Disallowed