Conversation Scene Analysis with Dynamic Bayesian Network Basedon Visual Head Tracking

Kazuhiro Otsuka, Junji Yamato, Yoshinao Takemae, Hiroshi Murase

doi:10.1109/icme.2006.262677

A novel method based on a probabilistic model for conversation scene analysis is proposed that can infer conversation structure from video sequences of face-to-face communication. Conversation structure represents the type of conversation such as monologue or dialogue, and can indicate who is talking / listening to whom. This study assumes that the gaze directions of participants provide cues for discerning the conversation structure, and can be identified from head directions. For measuring head directions, the proposed method newly employs a visual head tracker based on Sparse-Template Condensation. The conversation model is built on a dynamic Bayesian network and is used to estimate the conversation structure and gaze directions from observed head directions and utterances. Visual tracking is conventionally thought to be less reliable than contact sensors, but experiments confirm that the proposed method achieves almost comparable performance in estimating gaze directions and conversation structure to a conventional sensor-based method.

Conversation Scene Analysis with Dynamic Bayesian Network Basedon Visual Head Tracking

説明

収録刊行物

被引用文献 (1)*注記

詳細情報詳細情報について

書き出し

問題の指摘

Conversation Scene Analysis with Dynamic Bayesian Network Basedon Visual Head Tracking

説明

収録刊行物

被引用文献 (1)*注記

詳細情報 詳細情報について

書き出し

問題の指摘

詳細情報詳細情報について