Generating natural language description of human behavior from video images

Takeshi Tamura, Kunio Fukunaga, Atsuhiro Kojima, Masao Izumi

doi:10.1109/icpr.2000.903020

In visual surveillance applications, it is becoming common to analyze video images and interpret them in terms of natural language concepts. We propose an approach to generating a natural language description of human behavior appearing in real video images. First, the head region of a person, used as a proxy for the whole body, is extracted from each frame. Using a model-based method, the three-dimensional pose and position of the head are estimated. Next, the trajectory of these parameters is divided into segments of monotonic motion. For each segment, we evaluate conceptual features such as the degree of change in pose and position, the degree of change in relative distance to objects in the surroundings, and so on. By calculating the product of these feature values, the most suitable verb is selected and the other syntactic elements are supplied. Finally, natural language text is generated using a machine translation technique.
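The segmentation and verb-selection steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_monotonic` and `select_verb`, the tolerance `eps`, the candidate verbs, and all feature values are hypothetical and chosen only for illustration. The sketch assumes a one-dimensional parameter trajectory and feature values in the range 0 to 1.

```python
from math import prod


def split_monotonic(values, eps=1e-6):
    """Split a 1-D trajectory into runs that keep rising, falling, or staying flat.

    Returns a list of (start_index, end_index, direction) tuples, where
    direction is 1 (rising), -1 (falling), or 0 (flat within eps).
    Hypothetical helper, not taken from the paper.
    """
    def sign(delta):
        if abs(delta) <= eps:
            return 0
        return 1 if delta > 0 else -1

    if len(values) < 2:
        return []

    segments = []
    start = 0
    current = sign(values[1] - values[0])
    for i in range(1, len(values) - 1):
        step = sign(values[i + 1] - values[i])
        if step != current:
            # The direction changes at sample i, so close the current run here.
            segments.append((start, i, current))
            start, current = i, step
    segments.append((start, len(values) - 1, current))
    return segments


def select_verb(candidates):
    """Pick the verb whose feature values have the largest product.

    `candidates` maps each verb to a list of feature values in [0, 1].
    Returns (best_verb, scores). The scoring table is an assumption.
    """
    scores = {verb: prod(values) for verb, values in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Illustrative data only: distance from the head to some object over time.
distance = [0, 1, 2, 2, 2, 1, 0]
segments = split_monotonic(distance)

# Illustrative feature values for one segment (invented numbers).
candidates = {
    "approach": [0.9, 0.8],
    "leave": [0.9, 0.1],
    "stand": [0.1, 0.5],
}
best_verb, scores = select_verb(candidates)
```

Because the score is a product, a single feature value near zero is enough to rule a verb out, whereas a sum would let one strong feature compensate for a weak one.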
