Logical Inference with Phrasal Knowledge Injection using Vision-and-Language Model

TOMIHARI Akiyoshi, YANAKA Hitomi

doi:10.11517/pjsai.jsai2023.0_1e4gs605

Bibliographic Information

Other Title

論理推論におけるVision-and-Languageモデルを用いたフレーズ間知識の補完

Description

Recognizing Textual Entailment (RTE) is an important task with applications in question answering and machine translation. One of the main challenges for logic-based approaches to this task is the lack of background knowledge. This study proposes a logical inference system that injects phrasal knowledge by comparing the visual representations of phrases, based on the intuition that visual representations help humans judge entailment relations. First, we obtain candidate phrase pairs for phrasal knowledge from the process of logical inference. Second, using a Vision-and-Language model, we acquire visual representations of these phrases in the form of images or embedding vectors. Finally, we compare the obtained visual representations to determine whether to inject the knowledge corresponding to each candidate. In addition to simple similarity between phrases, asymmetric relations are considered when comparing visual representations. Our logical inference system improved accuracy on the SICK dataset compared with a previous logical inference system, SPSA.
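The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the model, the similarity measure, the asymmetric measure, or any thresholds, so everything below is an assumption made for illustration: the toy vectors stand in for embeddings that a Vision-and-Language model might produce, `inclusion_score` is one hypothetical way to capture an asymmetric relation, and the function names and threshold values are invented here rather than taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Symmetric similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inclusion_score(premise_vec: Sequence[float],
                    hypothesis_vec: Sequence[float]) -> float:
    """Hypothetical asymmetric measure (assumes non-negative vectors):
    the fraction of the hypothesis vector's mass that the premise covers."""
    total = sum(hypothesis_vec)
    if total == 0.0:
        return 0.0
    covered = sum(min(p, h) for p, h in zip(premise_vec, hypothesis_vec))
    return covered / total


def should_inject(premise_vec: Sequence[float],
                  hypothesis_vec: Sequence[float],
                  sim_threshold: float = 0.8,    # assumed value
                  incl_threshold: float = 0.9    # assumed value
                  ) -> bool:
    """Accept the candidate 'premise phrase entails hypothesis phrase'
    only if the phrases are similar AND the relation holds in this direction."""
    similar = cosine_similarity(premise_vec, hypothesis_vec) >= sim_threshold
    directed = inclusion_score(premise_vec, hypothesis_vec) >= incl_threshold
    return similar and directed


# Toy stand-ins for visual embeddings of a specific and a more general phrase.
specific_phrase = [0.9, 0.8, 0.7]
general_phrase = [0.8, 0.7, 0.0]

print(should_inject(specific_phrase, general_phrase))  # specific -> general
print(should_inject(general_phrase, specific_phrase))  # general -> specific
```

The point of the two checks is that a symmetric similarity alone cannot distinguish the direction of entailment between a specific phrase and a more general one, so a directed score is combined with it before a candidate is accepted.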

Journal

Proceedings of the Annual Conference of JSAI

Proceedings of the Annual Conference of JSAI JSAI2023 (0), 1E4GS605-1E4GS605, 2023

The Japanese Society for Artificial Intelligence

Details

CRID: 1390296808221036416

DOI: 10.11517/pjsai.jsai2023.0_1e4gs605

ISSN: 2758-7347

Text Lang: ja

Data Source

JaLC

Abstract License Flag: Disallowed
