Minimum Classification Error for Large Scale Speech Recognition Tasks using Weighted Finite State Transducers

Description

This article describes recent results obtained for two challenging large-vocabulary speech recognition tasks using the minimum classification error (MCE) approach to discriminative training. Weighted finite-state transducers (WFSTs) are used throughout to represent correct and competing string candidates. The primary task examined is a 22 K-word, real-world, telephone-based name recognition task. Lattice-derived WFSTs were used successfully to speed up the MCE training procedure. The results on this difficult task follow the classic picture of discriminative training: small acoustic models trained with MCE outperform much larger baseline models trained with maximum likelihood, and MCE training substantially improves the performance of the larger models as well. We also present preliminary results on the 30 K-word Corpus of Spontaneous Japanese (CSJ) lecture speech transcription task, with a training set of 190 hours of audio.
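For background, the MCE criterion referenced above is commonly written in the standard Juang–Katagiri form, sketched here; the smoothing parameters η, γ, and θ are generic placeholders, not values taken from this paper:

```latex
% Discriminant g_{S_j}(X;\Lambda): log-likelihood of string candidate S_j
% for utterance X (candidates here are paths through a WFST).
% Misclassification measure against the correct string S_0 and
% N competing strings S_1, ..., S_N:
d(X;\Lambda) = -\,g_{S_0}(X;\Lambda)
  + \log\!\left[\frac{1}{N}\sum_{j=1}^{N} e^{\,\eta\, g_{S_j}(X;\Lambda)}\right]^{1/\eta}

% Smoothed, differentiable error count (sigmoid loss), minimized by
% gradient-based updates of the acoustic model parameters \Lambda:
\ell(d) = \frac{1}{1 + e^{-\gamma d + \theta}}
```

As η → ∞ the soft-max over competitors approaches the single best competing string, and minimizing ℓ directly targets the string-level classification error rather than likelihood.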
