Authors
- VO Duc Minh (The University of Tokyo)
- LUONG Quoc-An (The Graduate University for Advanced Studies)
- SUGIMOTO Akihiro (National Institute of Informatics)
- NAKAYAMA Hideki (The University of Tokyo)
Abstract
In this review, we introduce a novel image captioning task, called Anticipation Captioning, which generates a caption for an unseen image given a sparse, temporally ordered set of images. The task emulates the human capacity to reason about the future from a sparse collection of visual cues acquired over time. To address this challenge, we introduce a model, named A-CAP, that predicts the caption by incorporating commonsense knowledge into a pre-trained vision-language model. Our method outperforms existing image captioning methods and provides a solid baseline for the anticipation captioning task, as shown in both qualitative and quantitative evaluations on a customized visual storytelling dataset. We also discuss the potential applications, challenges, and future directions of this novel task.
Journal
- NIHON GAZO GAKKAISHI (Journal of the Imaging Society of Japan), 62 (6), 588-598, 2023-12-10
- The Imaging Society of Japan
Details
- CRID: 1390298433281722880
- NII Book ID: AA1137305X
- ISSN: 18804675, 13444425
- NDL BIB ID: 033225816
- Text Lang: en
- Data Source: JaLC, NDL
- Abstract License Flag: Disallowed