Near-/Far-Field Source Separation by Time-Frequency Mask Estimation Using Near-Field Source Extraction Based on Spherical Harmonic Expansion

Abstract

We propose combining physical-model-based and deep-learning (DL)-based source separation to separate near- and far-field sources. The DL-based near- and far-field source separation method uses acoustic features based on spherical harmonic analysis. Deep learning is a state-of-the-art technique for source separation. In this approach, a bidirectional long short-term memory (BLSTM) network is used to predict a time-frequency (T-F) mask. To predict a T-F mask accurately, it is necessary to use acoustic features that have high mutual information with the oracle T-F mask. In this study, low-frequency-band near- and far-field sources are estimated based on spherical harmonic analysis and used as acoustic features. Subsequently, a DNN predicts a T-F mask to separate all frequency bands. Our experimental results show that the proposed method improved the signal-to-distortion ratio (SDR) by 8-10 dB compared to the harmonic-analysis-based method. In addition, the proposed method improved the PESQ and STOI scores compared to the conventional DL-based T-F mask estimation method.
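As a rough illustration of what an oracle T-F mask and its application look like, the following is a minimal sketch only. The abstract does not say which mask type the authors use, so a ratio mask is assumed here; the function names `oracle_ratio_mask` and `apply_mask` and the toy magnitudes are hypothetical. In the method described above, a network would predict the mask from acoustic features rather than compute it from the true sources.

```python
import numpy as np

def oracle_ratio_mask(near_mag, far_mag, eps=1e-8):
    """Oracle ratio mask for the near-field source (assumed mask type)."""
    return near_mag / (near_mag + far_mag + eps)

def apply_mask(mixture_spec, mask):
    """Element-wise masking of a complex T-F mixture spectrogram."""
    return mask * mixture_spec

# Toy magnitudes: 3 frequency bins x 2 time frames (made-up values).
near = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])
far = np.array([[1.0, 2.0], [0.0, 1.0], [4.0, 1.0]])

# Simplifying assumption: both sources are in phase, so magnitudes add.
mixture = (near + far).astype(complex)

mask = oracle_ratio_mask(near, far)
near_est = apply_mask(mixture, mask)        # near-field estimate
far_est = apply_mask(mixture, 1.0 - mask)   # far-field estimate
```

Under the in-phase assumption, the masked mixture recovers each source almost exactly. With real recordings the phases differ, and the mask gives only an approximation.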

Details

  • CRID
    1390290699808131072
  • NII Article ID
    120006897135
  • NII Book ID
    AA12746425
  • DOI
    10.15002/00022730
  • HANDLE
    10114/00022730
  • ISSN
    24321192
  • Text language code
    ja
  • Data source type
    • JaLC
    • IRDB
    • CiNii Articles
  • Abstract license flag
    Use permitted
