Minimum Classification Error for Large Scale Speech Recognition Tasks using Weighted Finite State Transducers

Description

This article describes recent results obtained for two challenging large-vocabulary speech recognition tasks using the minimum classification error (MCE) approach to discriminative training. Weighted finite-state transducers (WFSTs) are used throughout to represent correct and competing string candidates. The primary task examined is a 22 K-word, real-world, telephone-based name recognition task. Lattice-derived WFSTs were used successfully to speed up the MCE training procedure. The results on this difficult task follow the classic picture of discriminative training: small acoustic models trained with MCE outperform much larger baseline models trained with maximum likelihood, and MCE training substantially improves the performance of the larger models as well. We also present preliminary results on the 30 K-word Corpus of Spontaneous Japanese (CSJ) lecture speech transcription task, with a training set of 190 hours of audio.
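For background, the MCE criterion referenced above is commonly written in the standard Juang–Katagiri form, sketched here; the smoothing parameters η, γ, and θ are generic placeholders, not values taken from this paper:

```latex
% Discriminant g_{S_j}(X;\Lambda): log-likelihood of string candidate S_j
% for utterance X (candidates here are paths through a WFST).
% Misclassification measure against the correct string S_0 and
% N competing strings S_1, ..., S_N:
d(X;\Lambda) = -\,g_{S_0}(X;\Lambda)
  + \log\!\left[\frac{1}{N}\sum_{j=1}^{N} e^{\,\eta\, g_{S_j}(X;\Lambda)}\right]^{1/\eta}

% Smoothed, differentiable error count (sigmoid loss), minimized by
% gradient-based updates of the acoustic model parameters \Lambda:
\ell(d) = \frac{1}{1 + e^{-\gamma d + \theta}}
```

As η → ∞ the soft-max over competitors approaches the single best competing string, and minimizing ℓ directly targets the string-level classification error rather than likelihood.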
