Logical Inference with Phrasal Knowledge Injection using Vision-and-Language Model

Other Title
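  • 論理推論におけるVision-and-Languageモデルを用いたフレーズ間知識の補完 (Japanese title; roughly "Complementing Inter-Phrase Knowledge in Logical Inference Using a Vision-and-Language Model")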

Abstract

<p>Recognizing Textual Entailment (RTE) is an important task with applications in question answering and machine translation. One of the main challenges for logic-based approaches to this task is the lack of background knowledge. This study proposes a logical inference system that injects phrasal knowledge by comparing the visual representations of phrases, based on the intuition that visual representations help humans judge entailment relations. First, candidate phrase pairs for phrasal knowledge are obtained from the logical inference process. Second, a Vision-and-Language model is used to acquire visual representations of these phrases in the form of images or embedding vectors. Finally, the visual representations are compared to decide whether to inject the knowledge corresponding to each candidate. In addition to simple similarity between phrases, asymmetric relations are considered when comparing visual representations. On the SICK dataset, our logical inference system achieved higher accuracy than a previous logical inference system, SPSA.</p>
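
The comparison step in the abstract can be illustrated with a small sketch. The abstract does not say how similarity or asymmetry is computed, so everything here is an assumption made for illustration: cosine similarity stands in for the "simple similarity", an order-embedding-style penalty stands in for the "asymmetric relation", and the function names, thresholds, and toy vectors are hypothetical. It is not the system's actual scoring method.

```python
import math


def cosine_similarity(u, v):
    """Symmetric similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def order_violation(specific, general):
    """Asymmetric penalty (assumed, order-embedding style).

    It is zero when no coordinate of `general` exceeds the matching
    coordinate of `specific`, and grows as that ordering is violated.
    Swapping the arguments generally changes the result.
    """
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


def should_inject(premise_vec, hypothesis_vec,
                  sim_threshold=0.8, violation_threshold=0.01):
    """Decide whether to inject 'premise phrase entails hypothesis phrase'.

    Both thresholds are hypothetical values chosen for this toy example.
    """
    similar = cosine_similarity(premise_vec, hypothesis_vec) >= sim_threshold
    ordered = order_violation(premise_vec, hypothesis_vec) <= violation_threshold
    return similar and ordered


# Toy vectors standing in for the visual embeddings of two phrases.
premise_phrase = [0.9, 0.8, 0.7]
hypothesis_phrase = [0.8, 0.7, 0.6]

forward = should_inject(premise_phrase, hypothesis_phrase)
backward = should_inject(hypothesis_phrase, premise_phrase)
```

With these toy vectors the two phrases are nearly identical under cosine similarity, yet the asymmetric penalty accepts only one direction, which is the kind of distinction a purely symmetric score cannot make.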
