Anticipation Captioning with Commonsense Knowledge

Abstract

<p>In this paper, we introduce a novel image captioning task, called Anticipation Captioning, which generates a caption for an unseen image given a sparse, temporally ordered set of images. Our task emulates the human capacity to reason about the future from a sparse collection of visual cues acquired over time. To address this challenge, we introduce a model, A-CAP, that predicts the caption by incorporating commonsense knowledge into a pre-trained vision-language model. Our method outperforms existing image captioning methods and provides a solid baseline for the anticipation captioning task, as shown in both qualitative and quantitative evaluations on a customized visual storytelling dataset. We also discuss the potential applications, challenges, and future directions of this novel task.</p>
