Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR
Description
Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations it is typically unknown. To cope with this, we extend an iterative speech extraction system with mechanisms to count the number of sources and combine it with a single-talker speech recognizer to form the first end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers. Our experiments show very promising performance in counting accuracy, source separation and speech recognition on simulated clean mixtures from WSJ0-2mix and WSJ0-3mix. Among other results, we set a new state-of-the-art word error rate on the WSJ0-2mix database. Furthermore, our system generalizes well to a larger number of speakers than it ever saw during training, as shown in experiments with the WSJ0-4mix database.
5 pages, INTERSPEECH 2020
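The iterative extraction idea in the description can be illustrated with a toy sketch. This is not the authors' system: the abstract does not specify the counting mechanism, so the residual-energy stopping rule, the `extract_one` callable, the `stop_threshold` and `max_sources` parameters, and the toy extractor below are all assumptions made for illustration. The sketch only shows the control flow: pull out one source at a time, stop when nothing meaningful is left, and report the number of iterations as the source count.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
# Hypothetical interface: takes the current residual, returns (one source, new residual).
Extractor = Callable[[Signal], Tuple[Signal, Signal]]


def energy(x: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / max(len(x), 1)


def iterative_extract(
    mixture: Signal,
    extract_one: Extractor,
    stop_threshold: float = 1e-3,  # assumed stopping rule, not taken from the paper
    max_sources: int = 10,         # safety cap on the number of iterations
) -> List[Signal]:
    """Extract sources one by one until the residual is (almost) silent."""
    sources: List[Signal] = []
    residual = list(mixture)
    while len(sources) < max_sources and energy(residual) > stop_threshold:
        source, residual = extract_one(residual)
        sources.append(source)
    return sources


def make_toy_extractor(true_sources: Sequence[Signal]) -> Extractor:
    """Stand-in for a trained separation network: an oracle that hands back
    the known sources in order and subtracts each from the residual."""
    remaining = [list(s) for s in true_sources]

    def extract_one(residual: Signal) -> Tuple[Signal, Signal]:
        source = remaining.pop(0)
        new_residual = [r - s for r, s in zip(residual, source)]
        return source, new_residual

    return extract_one


def mix(sources: Sequence[Signal]) -> Signal:
    """Sum the sources sample by sample to form a mixture."""
    return [sum(samples) for samples in zip(*sources)]


# Three toy "speakers", each active in a different sample.
speakers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
extracted = iterative_extract(mix(speakers), make_toy_extractor(speakers))
estimated_count = len(extracted)
```

In a full pipeline each extracted signal would then be passed to a single-talker recognizer. Because the loop has no fixed speaker count built in, the same control flow applies to mixtures with more speakers than were used to configure it, up to `max_sources`.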
Journal
- Interspeech 2020, pp. 3097-3101, 2020-10-25
- ISCA
Keywords
- FOS: Computer and information sciences
- Sound (cs.SD)
- Computer Science - Computation and Language
- Computer Science - Sound
- Audio and Speech Processing (eess.AS)
- FOS: Electrical engineering, electronic engineering, information engineering
- Computation and Language (cs.CL)
- Electrical Engineering and Systems Science - Audio and Speech Processing
Details
- CRID: 1360584344422877696
- Data Source: Crossref, OpenAIRE