Interpretable Deep Clustering using Learnable Spectrogram Templates
-
- WATANABE Chihiro
- NTT Communication Science Laboratories
-
- KAMEOKA Hirokazu
- NTT Communication Science Laboratories
Bibliographic Information
- Other Title
-
- スペクトログラムテンプレートの学習に基づく解釈可能な深層クラスタリング法
Abstract
<p>Deep clustering (DC) has been shown to perform impressively in various speech separation tasks. The idea is to train a model that produces an embedding for each time-frequency (TF) bin so that the embeddings of TF bins dominated by the same source are drawn close to each other. To further enhance DC, it is important to make the embedding process interpretable, which makes its limitations easier to analyze and overcome. Motivated by this, we propose modeling the embedding process in DC with a network architecture that can be interpreted as fitting learnable spectrogram templates with non-negative entries to an input spectrogram. The proposed model enables us to visualize and understand the cues the model relies on to determine the embeddings when performing separation, while maintaining performance comparable to that of the original DC.</p>
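As a rough illustration of the training idea in the abstract (embeddings of TF bins dominated by the same source should end up close together), the sketch below computes the affinity-based objective commonly associated with deep clustering, ||VV^T - YY^T||_F^2, where V holds one embedding per TF bin and Y holds one-hot source labels. This record does not state which loss the authors used, so the objective, the function names (`gram`, `sq_frobenius`, `deep_clustering_loss`), and the toy data are all assumptions made for illustration, not the proposed template-based model.

```python
# Minimal, dependency-free sketch of a deep-clustering-style objective.
# ASSUMPTION: the loss ||V V^T - Y Y^T||_F^2 and all names here are illustrative;
# the record above does not specify the authors' exact formulation.

def gram(A, B):
    """Return A^T B, where A and B are lists of rows with the same row count."""
    return [
        [sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
        for i in range(len(A[0]))
    ]

def sq_frobenius(M):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(x * x for row in M for x in row)

def deep_clustering_loss(V, Y):
    """Compute ||V V^T - Y Y^T||_F^2 without forming the N x N affinity matrices.

    V: N x D embeddings (one row per TF bin).
    Y: N x C one-hot labels marking the dominant source of each TF bin.
    Uses the expansion ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2.
    """
    return (
        sq_frobenius(gram(V, V))
        - 2 * sq_frobenius(gram(V, Y))
        + sq_frobenius(gram(Y, Y))
    )

# Toy data (hypothetical): four TF bins, two sources.
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
V_ideal = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]      # matches the labels
V_collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # all bins identical

print(deep_clustering_loss(V_ideal, Y))
print(deep_clustering_loss(V_collapsed, Y))
```

The loss is zero when the embedding affinities reproduce the label affinities exactly and grows when bins from different sources receive similar embeddings. At separation time, embeddings of this kind are typically grouped with a clustering method such as k-means to obtain one TF mask per source.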
Journal
-
- Proceedings of the Annual Conference of JSAI
-
Proceedings of the Annual Conference of JSAI JSAI2020 (0), 2Q1GS1001-2Q1GS1001, 2020
The Japanese Society for Artificial Intelligence
Details
-
- CRID
- 1390566775142851840
-
- NII Article ID
- 130007856990
-
- Text Lang
- ja
-
- Data Source
-
- JaLC
- CiNii Articles
-
- Abstract License Flag
- Disallowed