Bounding-Box Channels for Visual Relationship Detection

Sho Inayoshi, Keita Otani, Antonio Tejero-de-Pablos, Tatsuya Harada

doi:10.1007/978-3-030-58558-7_40

Recognizing the relationships between multiple objects in an image is essential for a deeper understanding of the image's meaning. However, current visual recognition methods are still far from human-level accuracy. Recent approaches have tackled this task by combining image features with semantic and spatial features, but the way they relate these features to each other is weak, mostly because the spatial context in the image features is lost. In this paper, we propose bounding-box channels, a novel architecture capable of strongly relating the semantic, spatial, and image features. Our network learns bounding-box channels, which are initialized according to the position and the label of the objects and concatenated to the image features extracted from those objects. The combined features are then input together to the relationship estimator. This retains the spatial information in the image features and strongly associates it with the semantic and spatial features. In this way, our method effectively emphasizes the features in the object areas for better modeling of the relationships between objects. Our evaluation results show that our architecture outperforms previous works in visual relationship detection. In addition, we show experimentally that our bounding-box channels have a high generalization ability.
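The abstract does not specify exactly how the channels are built, so the following is only a minimal sketch of one plausible reading. It assumes that each object label maps to a learned embedding vector (the values here are made up), that each embedding dimension becomes one channel which takes that value inside the object's bounding box and 0.0 outside, and that box coordinates are already expressed in feature-map cells. The helper names `make_bbox_channels` and `concat_channels` are hypothetical, and plain nested lists stand in for tensors so the example has no dependencies.

```python
from typing import List, Sequence, Tuple

# A feature map stored as nested lists: channels x height x width.
FeatureMap = List[List[List[float]]]


def make_bbox_channels(
    box: Tuple[int, int, int, int],
    label_embedding: Sequence[float],
    height: int,
    width: int,
) -> FeatureMap:
    """Build one channel per embedding dimension for a single object.

    Assumption (hypothetical, not from the paper): `box` is (x1, y1, x2, y2)
    in feature-map cells with x2/y2 exclusive; cells inside the box take the
    embedding value and cells outside are 0.0.
    """
    x1, y1, x2, y2 = box
    return [
        [
            [value if (x1 <= x < x2 and y1 <= y < y2) else 0.0 for x in range(width)]
            for y in range(height)
        ]
        for value in label_embedding
    ]


def concat_channels(*maps: FeatureMap) -> FeatureMap:
    """Concatenate feature maps along the channel axis; spatial sizes must match."""
    height, width = len(maps[0][0]), len(maps[0][0][0])
    out: FeatureMap = []
    for fmap in maps:
        for channel in fmap:
            if len(channel) != height or any(len(row) != width for row in channel):
                raise ValueError("spatial sizes differ")
            out.append(channel)
    return out


# Usage: a 3-channel 4x4 image feature map plus subject and object channels.
H, W = 4, 4
image_features = [[[0.1] * W for _ in range(H)] for _ in range(3)]
subject_embedding = [1.0, 0.5]  # hypothetical learned vector for the subject label
object_embedding = [2.0, -1.0]  # hypothetical learned vector for the object label

subject_channels = make_bbox_channels((0, 0, 2, 2), subject_embedding, H, W)
object_channels = make_bbox_channels((1, 1, 4, 4), object_embedding, H, W)
estimator_input = concat_channels(image_features, subject_channels, object_channels)
print(len(estimator_input))  # 7 channels: 3 image + 2 subject + 2 object
```

Because the label information is painted onto the same spatial grid as the image features, a convolutional relationship estimator applied to `estimator_input` can see both what each object is and where it lies, which is the kind of coupling the abstract describes. In a real network the embeddings would be trainable parameters rather than constants.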
