Quantifying a Multi-person Meeting based on Multi-modal Micro-behavior Analysis

Other Title
  • マルチモーダルなマイクロ行動分析に基づく複数人会議の定量化 (Quantifying a Multi-person Meeting based on Multi-modal Micro-behavior Analysis)

Description

In this paper, we present an end-to-end online meeting quantifying system that accurately detects and quantifies three micro-behavior indicators for online meeting evaluation: speaking, nodding, and smiling. For active speaker detection (ASD), we build a multi-modal neural network framework consisting of audio and video temporal encoders, an audio-visual cross-attention mechanism for inter-modality interaction, and a self-attention mechanism that captures long-term speaking evidence. For nodding detection, we adopt the WHENet framework from the field of head pose estimation (HPE) to estimate head pitch angles as the nodding feature, and then build a gated recurrent unit (GRU) network with a squeeze-and-excitation (SE) module to recognize nodding movements in video. Finally, we use a Haar cascade classifier for smile detection. Experimental results using K-fold cross-validation show that the three detection modules achieve F1-scores of 94.9%, 79.67%, and 71.19%, respectively.
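
As a rough illustration of the ASD fusion described above, the following PyTorch sketch shows how audio and video feature sequences can exchange information through cross-attention before a self-attention layer aggregates long-term speaking evidence. The module name, feature dimensions, and two-class frame-level head are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of audio-visual cross-attention fusion, assuming PyTorch.
# Names and dimensions are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class AVCrossAttention(nn.Module):
    """Each modality attends to the other, then self-attention over the fusion."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 2)  # speaking / not speaking per frame

    def forward(self, audio, video):
        # audio, video: (batch, time, dim) outputs of the temporal encoders
        a, _ = self.a2v(audio, video, video)   # audio queries video evidence
        v, _ = self.v2a(video, audio, audio)   # video queries audio evidence
        fused = self.fuse(torch.cat([a, v], dim=-1))
        # self-attention over the fused sequence captures long-term context
        ctx, _ = self.self_attn(fused, fused, fused)
        return self.head(ctx)                  # (batch, time, 2) logits

audio = torch.randn(2, 100, 128)  # e.g. encoded spectrogram frames
video = torch.randn(2, 100, 128)  # e.g. encoded face-crop frames
logits = AVCrossAttention()(audio, video)     # (2, 100, 2)
```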
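The nodding detector can be pictured along the same lines: per-frame head pitch angles (as estimated by WHENet) feed a GRU whose hidden states are recalibrated by an SE block. The hidden size, SE placement, and window-level classification head below are assumed details, shown only to make the pipeline concrete.

```python
# A sketch of a GRU with a squeeze-and-excitation (SE) module for nodding
# detection on pitch-angle sequences, assuming PyTorch. Hyperparameters
# and the exact SE placement are assumptions, not the paper's design.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation over the feature channels of a (B, T, C) sequence."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):              # x: (batch, time, channels)
        w = self.fc(x.mean(dim=1))     # squeeze over time -> (batch, channels)
        return x * w.unsqueeze(1)      # excite: reweight channels per sequence

class NoddingGRU(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.se = SEBlock(hidden)
        self.head = nn.Linear(hidden, 2)  # nodding / not nodding

    def forward(self, pitch):          # pitch: (batch, time, 1) head pitch angles
        h, _ = self.gru(pitch)         # (batch, time, hidden)
        h = self.se(h)
        return self.head(h[:, -1])     # classify the whole window
```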
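For smile detection, OpenCV ships pretrained Haar cascades that follow the usual face-then-smile pattern. The cascade files below are real OpenCV assets; the thresholds and lower-face heuristic are illustrative choices, not the paper's settings.

```python
# A minimal smile-detection sketch using OpenCV's bundled Haar cascades.
# Detect faces first, then run the smile cascade inside each face region.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

def detect_smile(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi = gray[y + h // 2: y + h, x: x + w]  # smiles sit in the lower face
        # a high minNeighbors suppresses the smile cascade's false positives
        if len(smile_cascade.detectMultiScale(roi, 1.7, 20)) > 0:
            return True
    return False
```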
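The reported F1-scores come from K-fold cross-validation; a sketch of that protocol with scikit-learn follows. K, the estimator interface, and the mean aggregation are placeholder assumptions, since the abstract does not specify them.

```python
# A sketch of K-fold F1 evaluation, assuming scikit-learn and numpy arrays.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import f1_score

def kfold_f1(model_factory, X, y, k=5):
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                     random_state=0).split(X):
        model = model_factory()            # fresh model per fold
        model.fit(X[train_idx], y[train_idx])
        scores.append(f1_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores))          # mean F1 across the K folds
```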
