A robust/fast spoken term detection method based on a syllable n-gram index with a distance metric
Description
For spoken document retrieval, it is crucial to handle out-of-vocabulary (OOV) words and the misrecognition of spoken words. Consequently, sub-word-unit-based recognition and retrieval methods have been proposed. This paper describes a Japanese spoken term detection method for spoken documents that is robust to OOV words and misrecognition. To solve the problem of OOV keywords, we use individual syllables as the sub-word unit in continuous speech recognition. To handle OOV words and recognition errors while retrieving at high speed, we propose a distant n-gram indexing/retrieval method that incorporates a distance metric in a syllable lattice. When applied to syllable sequences, our proposed method outperformed a conventional DTW (dynamic time warping) method between syllable sequences and was about 100 times faster. The retrieval results show that we can detect OOV words in a database containing 44 h of audio in less than 10 ms per query with an F-measure of 0.54.
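The general idea of retrieving through a syllable n-gram index and then checking candidates with a distance can be illustrated with a small sketch. This is not the authors' distant n-gram method: it assumes 1-best syllable sequences instead of a syllable lattice, uses plain Levenshtein distance as a stand-in for the paper's distance metric, and all names (`build_index`, `edit_distance`, `search`, the toy syllable strings) are hypothetical.

```python
from collections import defaultdict


def build_index(docs, n=2):
    """Map each syllable n-gram to the (doc_id, position) pairs where it starts."""
    index = defaultdict(list)
    for doc_id, syllables in docs.items():
        for i in range(len(syllables) - n + 1):
            index[tuple(syllables[i:i + n])].append((doc_id, i))
    return index


def edit_distance(a, b):
    """Plain Levenshtein distance between two syllable sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def search(index, docs, query, n=2, max_dist=1):
    """Return (doc_id, start, distance) for windows within max_dist of the query."""
    # Step 1: every query n-gram found in the index proposes a start position.
    candidates = set()
    for k in range(len(query) - n + 1):
        for doc_id, pos in index.get(tuple(query[k:k + n]), ()):
            if pos - k >= 0:
                candidates.add((doc_id, pos - k))
    # Step 2: verify each candidate window (same length as the query).
    hits = []
    for doc_id, start in sorted(candidates):
        window = docs[doc_id][start:start + len(query)]
        dist = edit_distance(query, window)
        if dist <= max_dist:
            hits.append((doc_id, start, dist))
    return hits


# Toy data (assumed): "d1" contains a misrecognized final syllable ("e" for "i").
docs = {
    "d1": ["o", "N", "se", "e", "ni", "N", "shi", "ki"],
    "d2": ["ka", "i", "gi"],
}
index = build_index(docs, n=2)
print(search(index, docs, ["o", "N", "se", "i"], n=2))  # [('d1', 0, 1)]
```

Because the index lookup only touches positions that share at least one n-gram with the query, the distance computation runs on a handful of windows rather than on every position in the collection, which is the kind of saving an index-based approach offers over exhaustive matching.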
Published in
Speech Communication 55 (3), 470-485, 2013-03
Elsevier BV
Details
- CRID: 1360285707486228608
- ISSN: 01676393
- Resource type: journal article
- Data source type: Crossref, KAKEN, OpenAIRE