Image Captioners Tell More Than Images Given to Them

UDO Honori, KOSHINAKA Takafumi

doi:10.11517/pjsai.jsai2023.0_4a3gs604

UDO Honori
Yokohama City University

KOSHINAKA Takafumi
Yokohama City University

Bibliographic Information

Other Title

画像キャプショニングは画像そのものよりも多くを語る

Description

Image captioning, a.k.a. image-to-text, which generates descriptive text from given images, has been rapidly developing through the era of deep learning. To what extent is the information of the original image preserved in the descriptive text generated by an image captioner? To answer that question, we perform an experiment to classify images only from the descriptive text without looking at the images at all, and compare it with a standard CNN-based image classifier. We evaluate several image captioning models on a disaster image classification task, CrisisNLP, and show that descriptive text classifiers can sometimes achieve higher accuracy than the CNN-based classifier. Furthermore, we show that fusing the CNN-based classifier and the descriptive text classifier can provide further accuracy improvement.
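
The abstract does not say how the two classifiers are fused. As a rough illustration only, one common option is late fusion: take the class probabilities from each classifier and average them with a weight. The sketch below assumes that approach. The function names, the class labels, the probability values, and the `weight` parameter are all hypothetical and are not taken from the paper.

```python
def fuse_probabilities(p_image, p_text, weight=0.5):
    """Weighted average of two class-probability vectors (late fusion).

    Assumption: both classifiers output probabilities over the same
    ordered set of classes. `weight` is the share given to the
    image-based classifier; the rest goes to the caption-based one.
    """
    if len(p_image) != len(p_text):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_image, p_text)]


def predict(probs, labels):
    """Return the label with the highest probability."""
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return labels[best_index]


# Hypothetical class names and scores, for illustration only.
labels = ["flood", "fire", "earthquake"]
p_cnn = [0.50, 0.40, 0.10]      # made-up output of an image classifier
p_caption = [0.20, 0.70, 0.10]  # made-up output of a caption-text classifier

fused = fuse_probabilities(p_cnn, p_caption)
print(predict(p_cnn, labels), predict(p_caption, labels), predict(fused, labels))
```

In this toy case the two classifiers disagree, and the fused scores follow whichever one is more confident. In practice the weight would be tuned on held-out data, and other fusion schemes (for example, concatenating features before a shared classifier) are equally plausible readings of the abstract.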

Journal

Proceedings of the Annual Conference of JSAI

Proceedings of the Annual Conference of JSAI JSAI2023 (0), 4A3GS604-4A3GS604, 2023

The Japanese Society for Artificial Intelligence

Details

CRID

1390296808221485952
DOI

10.11517/pjsai.jsai2023.0_4a3gs604
ISSN

27587347
Text Lang

ja
Data Source
- JaLC
Abstract License Flag
Disallowed
