Interpretable Deep Clustering using Learnable Spectrogram Templates

WATANABE Chihiro, KAMEOKA Hirokazu

doi:10.11517/pjsai.jsai2020.0_2q1gs1001

Bibliographic Information

Other Title

スペクトログラムテンプレートの学習に基づく解釈可能な深層クラスタリング法

Abstract

Deep clustering (DC) has been shown to perform impressively in various speech separation tasks. The idea is to model and train the process of obtaining an embedding for each time-frequency (TF) bin so that the embeddings of TF bins dominated by the same source are forced close to each other. To further enhance the ability of DC, it is important to make the embedding process interpretable, so that its limitations are easier to analyze and overcome. Motivated by this, we propose modeling the embedding process in DC with a network architecture that can be interpreted as fitting learnable spectrogram templates with non-negative entries to an input spectrogram. The proposed model lets us visualize and understand the cues by which the model determines the embeddings when performing separation, while maintaining performance comparable to that of the original DC.
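
The abstract does not state the training objective. As a minimal sketch of what "embeddings of TF bins dominated by the same source are forced close to each other" can mean, the following assumes the affinity loss commonly used for deep clustering, ||V V^T - Y Y^T||_F^2, where V holds one unit-norm embedding per TF bin and Y holds one-hot dominant-source labels. The function name, shapes, and toy data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def deep_clustering_loss(V, Y):
    """Affinity loss ||V V^T - Y Y^T||_F^2 (assumed objective, not from the abstract).

    V: (N, D) array, one embedding per TF bin (rows assumed unit-norm).
    Y: (N, C) array, one-hot indicator of the dominant source per TF bin.
    The expansion below avoids forming the N x N affinity matrices.
    """
    vtv = V.T @ V  # (D, D)
    vty = V.T @ Y  # (D, C)
    yty = Y.T @ Y  # (C, C)
    return np.sum(vtv ** 2) - 2.0 * np.sum(vty ** 2) + np.sum(yty ** 2)

# Toy case: 4 TF bins, 2 sources (hypothetical data).
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

# Embeddings that coincide with the labels reproduce the ideal affinity matrix.
loss_ideal = deep_clustering_loss(Y, Y)

# Random unit-norm embeddings generally do not.
rng = np.random.default_rng(0)
V = rng.normal(size=(4, 3))
V /= np.linalg.norm(V, axis=1, keepdims=True)
loss_random = deep_clustering_loss(V, Y)
```

Under this assumed objective, the loss is zero exactly when the embedding affinities V V^T match the label affinities Y Y^T, which is the sense in which bins from the same source are pulled together and bins from different sources are pushed apart.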

Journal

Proceedings of the Annual Conference of JSAI

Proceedings of the Annual Conference of JSAI JSAI2020 (0), 2Q1GS1001-2Q1GS1001, 2020

The Japanese Society for Artificial Intelligence

Details

CRID: 1390566775142851840

NII Article ID: 130007856990

DOI: 10.11517/pjsai.jsai2020.0_2q1gs1001

Text Lang: ja

Data Source

JaLC
CiNii Articles

Abstract License Flag: Disallowed
