Adversarial autoencoder for reducing nonlinear distortion

Tetsunori Kobayashi, Tetsuji Ogawa, Kazuhiro Katagiri, Takashi Yazu, Naohiro Tawara, Masaru Fujieda

doi:10.23919/apsipa.2018.8659540

A novel post-filtering method using generative adversarial networks (GANs) is proposed to correct the nonlinear distortion caused by time-frequency (TF) masking. TF masking is a powerful framework for attenuating interfering sounds, but it can introduce an unpleasant distortion of speech (e.g., musical noise). A GAN-based autoencoder was recently shown to be effective for single-channel speech enhancement; however, using this technique for the post-processing of TF masking does not help reduce nonlinear distortion, because some TF components are missing after TF masking. Furthermore, the missing information is difficult to embed using an autoencoder. To recover these missing components, an auxiliary reference signal that includes the target source components is concatenated with the enhanced signal, and the result is used as the input to the GAN-based autoencoder. Experimental comparisons show that the proposed post-filtering improves speech quality over TF masking.
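
As a rough illustration of the input construction described in the abstract, the following sketch applies a toy binary TF mask to a magnitude spectrogram and concatenates the masked (enhanced) signal with a reference signal. The function names, the concatenation axis, the random data, and the choice of the unmasked observation as the reference are all assumptions made for this example; the abstract does not specify them, and the GAN-based autoencoder itself is not shown.

```python
import numpy as np

def apply_tf_mask(mixture_spec, mask):
    """Element-wise TF masking: bins where the mask is 0 are removed."""
    return mixture_spec * mask

def build_postfilter_input(enhanced_spec, reference_spec):
    """Stack the enhanced signal and the auxiliary reference along the
    feature axis (hypothetical layout, chosen for illustration)."""
    if enhanced_spec.shape != reference_spec.shape:
        raise ValueError("enhanced and reference spectrograms must share a shape")
    return np.concatenate([enhanced_spec, reference_spec], axis=-1)

rng = np.random.default_rng(0)
frames, bins = 4, 6

# Toy magnitude spectrogram and toy binary mask (random, not estimated).
mixture = np.abs(rng.normal(size=(frames, bins)))
mask = (rng.random((frames, bins)) > 0.5).astype(mixture.dtype)

enhanced = apply_tf_mask(mixture, mask)

# Assumption: the unmasked observation stands in for the reference signal
# that still contains the target source components.
reference = mixture

net_input = build_postfilter_input(enhanced, reference)
print(net_input.shape)  # (4, 12)
```

The masked bins in `enhanced` are exactly zero, which is the missing information the abstract refers to; the second half of `net_input` still carries those bins, so a post-filter that sees both halves has something to recover them from.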
