Re-ranking of spoken term detections using CRF-based triphone detection models
説明
Conventional spoken term detection (STD) techniques, which use a text-based matching approach based on automatic speech recognition (ASR) systems, are not robust for speech recognition errors. This paper proposes a conditional random fields (CRF)-based re-ranking approach, which recomputes detection scores produced by a phoneme-based dynamic time warping (DTW) STD approach. In the re-ranking approach, we tackle STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. They train recognition error patterns such as phoneme-to-phoneme confusions on the CRF framework. Therefore, the models can detect a triphone, which is one of triphones composing a query term, with detection probability. In the experimental evaluation on the Japanese OOV test collection, the CRF-based approach alone could not outperform the conventional DTW-based approach we have already proposed; however, it worked well in the re-ranking (second-pass) process for the detections from the DTW-based approach. The CRF-based re-ranking approach made a 2.4% improvement of F-measure in the STD performance.
収録刊行物
-
- Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific
-
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific 1-4, 2014-12
IEEE
- Tweet
詳細情報 詳細情報について
-
- CRID
- 1360567185107710336
-
- 資料種別
- journal article
-
- データソース種別
-
- Crossref
- KAKEN
- OpenAIRE