Anticipation Captioning with Commonsense Knowledge

VO Duc Minh, LUONG Quoc-An, SUGIMOTO Akihiro, NAKAYAMA Hideki

doi:10.11370/isj.62.588

この論文をさがす

説明

<p>In this review, we introduce a novel image captioning task, called Anticipation Captioning, which generates a caption for an unseen image given a sparsely temporally-ordered set of images. Our task emulates the human capacity to reason about the future based on a sparse collection of visual cues acquired over time. To address this novel challenge, we introduce a model, namely A-CAP, that predicts the caption by incorporating commonsense knowledge into a pre-trained vision-language model. Our method outperforms image captioning methods and provides a solid baseline for anticipation captioning task, as shown in both qualitative and quantitative evaluations on a customized visual storytelling dataset. We also discuss the potential applications, challenges, and future directions of this novel task.</p>

収録刊行物

日本画像学会誌

日本画像学会誌 62 (6), 588-598, 2023-12-10

一般社団法人日本画像学会

キーワード

詳細情報詳細情報について

CRID: 1390298433281722880

NII書誌ID: AA1137305X

DOI: 10.11370/isj.62.588

ISSN: 18804675; 13444425

NDL書誌ID: 033225816

Web Site: http://id.ndl.go.jp/bib/033225816; https://ndlsearch.ndl.go.jp/books/R000000004-I033225816

本文言語コード: en

データソース種別

JaLC
NDL

抄録ライセンスフラグ: 使用不可

Anticipation Captioning with Commonsense Knowledge

この論文をさがす

説明

収録刊行物

キーワード

詳細情報詳細情報について

書き出し

問題の指摘

Anticipation Captioning with Commonsense Knowledge

この論文をさがす

説明

収録刊行物

キーワード

詳細情報 詳細情報について

書き出し

問題の指摘

詳細情報詳細情報について