Speech enhancement based on deep denoising autoencoder

Description

We have previously applied a deep autoencoder (DAE) to noise reduction and speech enhancement. However, that DAE was trained using only clean speech. In this study, we introduce an explicit denoising process into the learning of the DAE. To train the DAE, we still adopt a greedy layer-wise pretraining plus fine-tuning strategy. In pretraining, each layer is trained as a single-hidden-layer neural autoencoder (AE) using noisy-clean speech pairs as input and output (or noisy-clean speech pairs transformed by the preceding AEs). Fine-tuning is done by stacking all the AEs, with the pretrained parameters used for initialization. The trained DAE is then used as a filter to estimate clean speech from given noisy speech. Speech enhancement experiments were conducted to examine the performance of the trained denoising DAE. Noise reduction, speech distortion, and perceptual evaluation of speech quality (PESQ) criteria were used in the performance evaluation. Experimental results show that increasing the depth of the DAE consistently improves performance when a large training data set is available. In addition, compared with a minimum mean square error based speech enhancement algorithm, the proposed denoising DAE provided superior performance on all three objective evaluations.
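
A minimal sketch of this training scheme is given below. It is not the authors' implementation: PyTorch, the hidden-layer sizes, the optimizer (Adam), and the epoch counts are illustrative assumptions, and it presumes the noisy and clean speech have already been converted into paired feature matrices (frames x feature dimension). Stacking the pretrained encoders followed by the decoders in reverse order is one plausible reading of "stacking all AEs".

import torch
import torch.nn as nn

def pretrain_layer(x_noisy, x_clean, hidden_dim, epochs=50, lr=1e-3):
    # Train one single-hidden-layer AE mapping (transformed) noisy features
    # to the corresponding (transformed) clean features.
    enc = nn.Sequential(nn.Linear(x_noisy.shape[1], hidden_dim), nn.Sigmoid())
    dec = nn.Linear(hidden_dim, x_clean.shape[1])
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(dec(enc(x_noisy)), x_clean).backward()
        opt.step()
    return enc, dec

def train_denoising_dae(x_noisy, x_clean, hidden_dims=(500, 500), epochs=100, lr=1e-4):
    # Greedy layer-wise pretraining, then fine-tuning of the stacked DAE.
    encoders, decoders = [], []
    noisy_h, clean_h = x_noisy, x_clean
    for h in hidden_dims:
        enc, dec = pretrain_layer(noisy_h, clean_h, h)
        encoders.append(enc)
        decoders.append(dec)
        with torch.no_grad():
            # Transform the noisy-clean pair with the new AE for the next layer.
            noisy_h, clean_h = enc(noisy_h), enc(clean_h)
    # Stack all AEs (encoders, then decoders in reverse) as initialization,
    # then fine-tune the whole network on the original noisy-clean pairs.
    dae = nn.Sequential(*encoders, *reversed(decoders))
    opt = torch.optim.Adam(dae.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(dae(x_noisy), x_clean).backward()
        opt.step()
    return dae

# Hypothetical usage: the trained DAE acts as a filter on noisy features.
# dae = train_denoising_dae(noisy_feats, clean_feats)
# enhanced = dae(noisy_feats)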
