Fast segment search for corpus-based speech enhancement based on speech recognition technology

Takaaki Hori, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani, Atsushi Nakamura

doi:10.1109/icassp.2014.6853859

Corpus-based speech enhancement has recently received increasing attention because it achieves high enhancement performance in highly non-stationary noisy environments by precisely modeling the long-term temporal dynamics of speech. Its disadvantage is that searching a multi-condition parallel speech corpus for the longest matching clean speech segments is computationally very expensive. This paper proposes a fast segment search method for corpus-based speech enhancement. The method rests on two techniques derived from speech recognition technology. The first is an A*-search-like segment evaluation function for accurately finding the longest matching segments. The second is a search space that connects tree and linear structures so that segment likelihood calculations can be shared efficiently. In experiments on non-stationary noisy observations using the 26 multi-condition TIMIT parallel speech corpus, the proposed search method found the segments in almost real time without degrading the quality of the enhanced speech. The method was about 7 to 13 times faster than the conventional segment search method.
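The benefit of sharing likelihood calculations across segments can be illustrated with a small sketch. This is not the authors' search space: it is a toy prefix tree over integer-valued "frames", and the names `build_trie`, `score_linear`, `score_trie`, and the `local_score` function (a negative absolute difference standing in for a frame likelihood) are all assumptions made for illustration. It only shows that candidates with a common prefix need each shared frame scored once.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One node per distinct prefix of the candidate sequences."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.seq_ids: List[int] = []  # candidates whose path passes through here


def build_trie(sequences: List[List[int]]) -> TrieNode:
    root = TrieNode()
    for sid, seq in enumerate(sequences):
        node = root
        for sym in seq:
            node = node.children.setdefault(sym, TrieNode())
            node.seq_ids.append(sid)
    return root


def local_score(obs: int, sym: int) -> float:
    # Hypothetical stand-in for a frame-level log-likelihood.
    return -float(abs(obs - sym))


def score_linear(sequences: List[List[int]],
                 observation: List[int]) -> Tuple[List[float], int]:
    """Score every candidate independently; also count score evaluations."""
    scores, evals = [], 0
    for seq in sequences:
        total = 0.0
        for t, sym in enumerate(seq[:len(observation)]):
            total += local_score(observation[t], sym)
            evals += 1
        scores.append(total)
    return scores, evals


def score_trie(root: TrieNode, n_sequences: int,
               observation: List[int]) -> Tuple[List[float], int]:
    """Score all candidates with one evaluation per trie node."""
    scores, evals = [0.0] * n_sequences, 0
    stack = [(root, 0, 0.0)]  # (node, depth, accumulated score)
    while stack:
        node, depth, acc = stack.pop()
        for sid in node.seq_ids:
            scores[sid] = acc  # deeper nodes on the same path overwrite this
        if depth == len(observation):
            continue
        for sym, child in node.children.items():
            evals += 1
            stack.append((child, depth + 1,
                          acc + local_score(observation[depth], sym)))
    return scores, evals


candidates = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6, 7]]
observed = [1, 2, 3, 4]
linear_scores, linear_evals = score_linear(candidates, observed)
trie_scores, trie_evals = score_trie(build_trie(candidates), len(candidates), observed)
print(linear_scores, linear_evals)  # [0.0, -1.0, -6.0] 12
print(trie_scores, trie_evals)      # [0.0, -1.0, -6.0] 7
```

In this toy case both strategies return identical scores, but the tree needs 7 score evaluations instead of 12 because the shared prefix `1, 2` (and `1, 2, 3`) is evaluated once. The savings on a real corpus depend on how much the candidate segments overlap, which this sketch does not attempt to model.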
