Semantic Consistency Assessment of Visual and Text Content using Multimodal Deep Neural Networks

SUZUKI Riko, KONISHI Mikito, IKEDA Junya, HAYASHI Daichi, FUKAI So, SUGAWARA Yu, MACHII Yusuke, YAMAURA Yusuke

doi:10.11517/pjsai.jsai2020.0_3q5gs901

Bibliographic Information

Other Title

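マルチモーダル深層学習を用いた画像とテキストの意味理解に基づく整合性判定 (Consistency Assessment Based on Semantic Understanding of Images and Text Using Multimodal Deep Learning)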

Description

<p>Assessing the semantic consistency of an image and the text within a document is an important task because readers refer to the image to deepen their understanding of the text. In this study, we develop a multimodal deep neural network for assessing the semantic consistency of an image and text. We propose a novel approach that combines binary classification with an angular margin loss to acquire discriminative features. We also clarify contradictions between the image and the text by visualizing the cross-attention between objects in the image and words in the text. To show the effectiveness of our approach, we evaluate the accuracy of several models on the Flickr30k dataset, which contains images and their captions. The results show that our proposed model outperforms the existing joint embedding model, with an improvement of 0.9 in F-measure.</p>
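
The abstract does not spell out how binary classification and the angular margin loss are combined, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes an ArcFace-style additive angular margin applied to the angle between an image embedding and a text embedding, followed by a sigmoid binary cross-entropy on the scaled cosine. The function names, the toy two-dimensional vectors, and the `margin` and `scale` values are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors, clamped to [-1, 1] for acos."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def margin_logit(image_vec, text_vec, label, margin=0.3, scale=16.0):
    """Scaled cosine logit with an additive angular margin (assumed form).

    For a consistent pair (label == 1) the angle is enlarged by `margin`,
    so the pair must be closer than the plain decision boundary to reach
    the same logit. Inconsistent pairs (label == 0) are left unchanged.
    """
    theta = math.acos(cosine(image_vec, text_vec))
    if label == 1:
        theta = min(theta + margin, math.pi)  # keep the angle in [0, pi]
    return scale * math.cos(theta)


def bce_loss(logit, label):
    """Binary cross-entropy on a logit: softplus(logit) - label * logit."""
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return softplus - label * logit


# Toy embeddings (hypothetical): a consistent image/caption pair.
image_vec = [1.0, 0.0]
text_vec = [0.8, 0.6]

loss_plain = bce_loss(margin_logit(image_vec, text_vec, 1, margin=0.0), 1)
loss_margin = bce_loss(margin_logit(image_vec, text_vec, 1, margin=0.3), 1)
```

In this sketch the margin raises the loss of a consistent pair at a given angle, which pushes matched image and text embeddings closer together during training. That is one way an angular margin can yield more discriminative features than a plain binary classifier.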

Journal

Proceedings of the Annual Conference of JSAI

Proceedings of the Annual Conference of JSAI JSAI2020 (0), 3Q5GS901-3Q5GS901, 2020

The Japanese Society for Artificial Intelligence

Details

CRID: 1390566775142931712

NII Article ID: 130007857121

DOI: 10.11517/pjsai.jsai2020.0_3q5gs901

ISSN: 2758-7347

Text Lang: ja

Data Source

JaLC
CiNii Articles

Abstract License Flag: Disallowed
