Interpretable Deep Clustering using Learnable Spectrogram Templates

Bibliographic Information

Other Title
  • スペクトログラムテンプレートの学習に基づく解釈可能な深層クラスタリング法

Abstract

Deep clustering (DC) has been shown to perform impressively in various speech separation tasks. The idea is to model and train the process of obtaining an embedding for each time-frequency (TF) bin so that the embeddings of TF bins dominated by the same source are pulled close to each other. To further enhance the ability of DC, it is important to make the embedding process interpretable, so that its limitations become easier to analyze and overcome. Motivated by this, we propose modeling the embedding process in DC with a network architecture that can be interpreted as fitting learnable spectrogram templates with non-negative entries to an input spectrogram. The proposed model lets us visualize and understand the clues the model uses to determine the embeddings when performing separation, while maintaining performance comparable to that of the original DC.
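The training idea described in the abstract, pulling together the embeddings of TF bins dominated by the same source, can be illustrated with a minimal sketch. It assumes the affinity-based objective commonly used for deep clustering, ||VVᵀ − YYᵀ||²_F, which the abstract itself does not spell out. The function name `deep_clustering_loss`, the array shapes, and the use of NumPy are assumptions made for illustration only. This is not the proposed template-based architecture.

```python
import numpy as np

def deep_clustering_loss(V, Y):
    """Affinity loss ||V V^T - Y Y^T||_F^2 (assumed objective, see lead-in).

    V: (N, D) array, one D-dimensional embedding per TF bin.
    Y: (N, C) array, one-hot indicator of the dominant source per TF bin.
    """
    # Expanding the Frobenius norm avoids building the N x N affinity matrices:
    # ||VV^T - YY^T||^2 = ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2
    VtV = V.T @ V
    VtY = V.T @ Y
    YtY = Y.T @ Y
    return np.sum(VtV ** 2) - 2.0 * np.sum(VtY ** 2) + np.sum(YtY ** 2)
```

The loss is zero when bins dominated by the same source share an identical unit-norm embedding and bins from different sources have orthogonal embeddings, which is the "get close to each other" behavior the abstract describes.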
