Rapid Speaker Adaptation of Neural Network Based Filterbank Layer for Automatic Speech Recognition
Description
Deep neural networks (DNNs) have achieved significant success in automatic speech recognition. We previously proposed a filterbank-incorporated DNN that takes power spectra as input features. This method performs the functions of VTLN (vocal tract length normalization) and fMLLR (feature-space maximum likelihood linear regression). The filterbank layer can be implemented with a small number of parameters and is optimized by backpropagation, which makes it advantageous for adaptation when little data is available. In this paper, speaker adaptation is applied to the filterbank-incorporated DNN. With speaker adaptation using 15 utterances, the adapted model gave a 7.4% relative improvement in word error rate (WER) over the baseline DNN at a significance level of 0.005 on the CSJ task. Adaptation of the filterbank layer also outperformed the other adaptation methods tested: singular value decomposition (SVD) based adaptation and learning hidden unit contributions (LHUC).
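To illustrate why a filterbank layer can be described by only a few parameters, here is a minimal NumPy sketch of the forward pass. It is not the authors' implementation. The abstract does not state the filter shape, so the Gaussian shape, the three parameters per filter (`centers`, `bandwidths`, `gains`), the function names, and the sizes (257 frequency bins, 40 filters) are all assumptions made for illustration.

```python
import numpy as np

def gaussian_filterbank(centers, bandwidths, gains, n_bins):
    """Build an (n_filters, n_bins) weight matrix from three values per filter.

    Assumption: each filter is a Gaussian over frequency-bin index.
    """
    bins = np.arange(n_bins, dtype=np.float64)[None, :]   # shape (1, n_bins)
    c = centers[:, None]                                   # shape (n_filters, 1)
    b = bandwidths[:, None]
    g = gains[:, None]
    return g * np.exp(-0.5 * ((bins - c) / b) ** 2)

def filterbank_layer(power_spectrum, centers, bandwidths, gains):
    """Map power spectra (n_frames, n_bins) to log filterbank outputs."""
    weights = gaussian_filterbank(
        centers, bandwidths, gains, power_spectrum.shape[1]
    )
    # Small constant avoids log(0); its value is an arbitrary choice.
    return np.log(power_spectrum @ weights.T + 1e-10)

# Hypothetical sizes, chosen only for the example.
n_bins, n_filters = 257, 40
centers = np.linspace(0, n_bins - 1, n_filters)
bandwidths = np.full(n_filters, 4.0)
gains = np.ones(n_filters)

rng = np.random.default_rng(0)
spectra = rng.random((5, n_bins))          # stand-in for real power spectra
features = filterbank_layer(spectra, centers, bandwidths, gains)

# Parameters that speaker adaptation would update in this sketch.
n_adapted_params = centers.size + bandwidths.size + gains.size
```

Under these assumptions the layer has 3 parameters per filter, so adapting it to a speaker touches far fewer values than adapting a full hidden-layer weight matrix. Shifting `centers` moves the filters along the frequency axis, which is one way such a layer could act like a frequency warping.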
Journal
- 2018 IEEE Spoken Language Technology Workshop (SLT)
2018 IEEE Spoken Language Technology Workshop (SLT) 574-580, 2018-12
IEEE
Details
- CRID: 1360567185333122944
- Article Type: journal article
- Data Source: Crossref, KAKEN, OpenAIRE