Description
In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful binaural decoder that is capable of accurate speech binauralization while faithfully reconstructing environmental factors like ambient noise or reverb. The network is a modified vector-quantized variational autoencoder, trained with several carefully designed objectives, including an adversarial loss. We evaluate the proposed system on an internal binaural dataset with objective metrics and a perceptual study. Results show that the proposed approach matches the ground truth data more closely than previous methods. In particular, we demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
Accepted to INTERSPEECH 2022. Demo link: https://unilight.github.io/Publication-Demos/publications/e2e-binaural-synthesis
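The abstract describes the core of the system as a modified vector-quantized variational autoencoder, whose discrete bottleneck is what enables the low-bitrate codec. As a rough illustration only (the actual model, codebook size, and latent dimensions are not specified here; all names and sizes below are hypothetical), the nearest-codeword lookup at the heart of a VQ bottleneck can be sketched as:

```python
import numpy as np

def vector_quantize(z, codebook):
    """Nearest-codeword lookup, the core operation of a VQ-VAE bottleneck.
    z: (T, D) encoder outputs; codebook: (K, D) learned codewords.
    Returns quantized latents and their discrete indices (the low-bitrate codes)."""
    # Squared Euclidean distance between every frame and every codeword.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    idx = d.argmin(axis=1)            # (T,) discrete code per frame
    return codebook[idx], idx         # quantized latents fed to the decoder

# Toy sizes for illustration; real codecs use far larger codebooks.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))    # K=8 codewords of dimension D=4
z = rng.normal(size=(5, 4))           # 5 encoder frames
zq, idx = vector_quantize(z, codebook)
```

Only the integer indices need to be transmitted, which is what makes the representation low-bitrate; the binaural decoder then reconstructs the two-channel signal from the quantized latents.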
Published in
- Interspeech 2022, pp. 1218-1222, 2022-09-18
- ISCA
Keywords
- FOS: Computer and information sciences
- FOS: Electrical engineering, electronic engineering, information engineering
- Machine Learning (cs.LG)
- Artificial Intelligence (cs.AI)
- Sound (cs.SD)
- Audio and Speech Processing (eess.AS)
Details
- CRID: 1871146592447272192
- Data source: OpenAIRE