Semantic Consistency Assessment of Visual and Text Content using Multimodal Deep Neural Networks
-
- SUZUKI Riko
- Ochanomizu University
-
- KONISHI Mikito
- Osaka University
-
- IKEDA Junya
- University of Fukui
-
- HAYASHI Daichi
- Doshisha University
-
- FUKAI So
- Tokyo Institute of Technology
-
- SUGAWARA Yu
- Hokkaido University
-
- MACHII Yusuke
- Fuji Xerox Co., Ltd.
-
- YAMAURA Yusuke
- Fuji Xerox Co., Ltd.
Bibliographic Information
- Other Title
-
- マルチモーダル深層学習を用いた画像とテキストの意味理解に基づく整合性判定
Description
<p>Semantic consistency assessment of an image and text inside a document is an important task because readers refer to the image to deepen their understanding of the text content. In this study, we develop a multimodal deep neural network for assessing the semantic consistency of an image and text. We propose a novel approach that combines binary classification and angular margin loss to acquire discriminative features. We also clarify contradictions between the image and the text by visualizing cross-attention between objects in the image and words in the text. To show the effectiveness of our approach, we evaluate the accuracy of several models on the Flickr30k dataset, which contains images and their captions. The results show that our proposed model outperforms the existing joint embedding model, with an improvement of 0.9 in F-measure.</p>
Journal
-
- Proceedings of the Annual Conference of JSAI
-
Proceedings of the Annual Conference of JSAI JSAI2020 (0), 3Q5GS901-3Q5GS901, 2020
The Japanese Society for Artificial Intelligence
Details
-
- CRID
- 1390566775142931712
-
- NII Article ID
- 130007857121
-
- ISSN
- 27587347
-
- Text Lang
- ja
-
- Data Source
-
- JaLC
- CiNii Articles
-
- Abstract License Flag
- Disallowed