Rapid Speaker Adaptation of Neural Network Based Filterbank Layer for Automatic Speech Recognition

Description

Deep neural networks (DNNs) have achieved significant success in automatic speech recognition. We previously proposed a filterbank-incorporated DNN that takes power spectra as input features. This approach can perform the roles of vocal tract length normalization (VTLN) and feature-space maximum likelihood linear regression (fMLLR). The filterbank layer can be implemented with a small number of parameters and is optimized within the backpropagation framework, which makes it advantageous for adaptation when only limited data are available. In this paper, we apply speaker adaptation to the filterbank-incorporated DNN. With speaker adaptation on 15 utterances, the adapted model achieved a 7.4% relative reduction in word error rate (WER) over the baseline DNN on the Corpus of Spontaneous Japanese (CSJ) task, significant at the 0.005 level. Adapting the filterbank layer also outperformed two other adaptation methods: singular value decomposition (SVD) based adaptation and learning hidden unit contributions (LHUC).
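The abstract does not give the filterbank layer's exact parameterization, so the following is only a minimal sketch under stated assumptions. It assumes the layer computes log(W p + eps) from a power spectrum p, that W is initialized from standard triangular mel filters, and that only the weights inside each filter's initial band are trainable. Adaptation updates W alone by gradient descent. In the real system the gradient would be backpropagated through the frozen DNN from the adaptation utterances; here a squared-error loss on the layer's outputs stands in for it. All names (`mel_filterbank`, `FilterbankLayer`, `adaptation_loss`, `adapt`) are hypothetical.

```python
import numpy as np


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular mel filters, shape (n_filters, n_bins). Used here only as an initialisation."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fb = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        left, center, right = hz_points[m:m + 3]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


class FilterbankLayer:
    """Hypothetical learnable filterbank layer: y = log(p @ W.T + eps)."""

    def __init__(self, init_weights, eps=1e-6):
        # Assumption: only bins inside each filter's initial band are trainable,
        # which keeps the number of adapted parameters small.
        self.mask = init_weights > 0
        self.W = init_weights.copy()
        self.eps = eps

    def forward(self, power_spec):
        # power_spec: (n_frames, n_bins) -> log filterbank outputs (n_frames, n_filters)
        self._p = power_spec
        self._z = power_spec @ self.W.T + self.eps
        return np.log(self._z)

    def backward(self, grad_out):
        # dy/dz = 1/z and dz[t, f]/dW[f, b] = p[t, b]; gradients outside the mask are zeroed.
        grad_z = grad_out / self._z
        return (grad_z.T @ self._p) * self.mask


def adaptation_loss(layer, frames, targets):
    # Stand-in for the frozen DNN's loss: squared error on the log filterbank outputs.
    diff = layer.forward(frames) - targets
    loss = 0.5 * np.mean(np.sum(diff ** 2, axis=1))
    return loss, diff / frames.shape[0]


def adapt(layer, frames, targets, lr=0.05, steps=30):
    """Gradient descent on the filterbank weights only; returns the loss before each step."""
    losses = []
    for _ in range(steps):
        loss, grad_out = adaptation_loss(layer, frames, targets)
        layer.W -= lr * layer.backward(grad_out)
        np.maximum(layer.W, 0.0, out=layer.W)  # keep filter gains non-negative
        losses.append(loss)
    return losses
```

The masking assumption is what keeps such a layer small enough to adapt from a handful of utterances. With 40 filters over 257 FFT bins, a dense matrix would have 10,280 weights. The masked layer has at most 2 × 257 = 514 trainable weights, because each bin falls inside at most two overlapping triangles.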

