Semantic Consistency Assessment of Visual and Text Content using Multimodal Deep Neural Networks

Bibliographic Information

Other Title
  • マルチモーダル深層学習を用いた画像とテキストの意味理解に基づく整合性判定

Description

<p>Assessing the semantic consistency between an image and the text inside a document is an important task, because readers refer to the image to deepen their understanding of the text content. In this study, we develop a multimodal deep neural network for assessing the semantic consistency of an image and its accompanying text. We propose a novel approach that combines binary classification with an angular margin loss to acquire discriminative features. We also clarify contradictions between the image and the text by visualizing the cross-attention between objects in the image and words in the text. To show the effectiveness of our approach, we evaluate the accuracy of several models on the Flickr30k dataset, which contains images and their captions. The results show that our proposed model outperforms the existing joint embedding model, with an improvement of 0.9 in F-measure.</p>
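
The abstract does not give the exact loss formulation, so the following is only a minimal sketch of how a binary classification loss and an additive angular margin loss might be combined for one image-text pair. Everything here is an assumption made for illustration: the function names, the two hypothetical class-weight vectors (inconsistent, consistent), the `scale`, `margin`, and `weight` values, and the choice of an ArcFace-style margin. None of it is taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def binary_cross_entropy(logit, label):
    """Numerically stable BCE on a raw logit; label is 0 or 1."""
    z = logit if label == 1 else -logit
    # softplus(-z) = log(1 + exp(-z))
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def angular_margin_loss(feature, class_weights, label, scale=16.0, margin=0.3):
    """Softmax cross-entropy with an additive angular margin on the true class.

    Assumed ArcFace-style form: the angle between the fused feature and the
    true class vector is enlarged by `margin` before the scaled cosine is used
    as a logit, which pushes features of the two classes further apart.
    """
    logits = []
    for c, w in enumerate(class_weights):
        cos_t = max(-1.0, min(1.0, cosine(feature, w)))
        theta = math.acos(cos_t)
        if c == label:
            theta = min(theta + margin, math.pi)  # keep the angle in [0, pi]
        logits.append(scale * math.cos(theta))
    # log-sum-exp with max subtraction for stability
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def combined_loss(feature, class_weights, bce_logit, label, weight=1.0):
    """Hypothetical combination: BCE plus a weighted angular margin term."""
    return binary_cross_entropy(bce_logit, label) + weight * angular_margin_loss(
        feature, class_weights, label
    )


if __name__ == "__main__":
    fused = [1.0, 0.2]                    # toy fused image-text feature
    weights = [[0.0, 1.0], [1.0, 0.0]]    # toy class vectors: inconsistent, consistent
    print(combined_loss(fused, weights, bce_logit=1.5, label=1))
```

With a positive margin, the angular term is larger than it would be with no margin for the same feature, so the model is only rewarded once the feature sits well inside the correct class's angular region. How the two terms are weighted in the actual model is not stated in the abstract.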

Journal

Details

  • CRID
    1390566775142931712
  • NII Article ID
    130007857121
  • DOI
    10.11517/pjsai.jsai2020.0_3q5gs901
  • ISSN
    2758-7347
  • Text Lang
    ja
  • Data Source
    • JaLC
    • CiNii Articles
  • Abstract License Flag
    Disallowed
