Image Captioners Tell More Than Images Given to Them

Bibliographic Information

Other Title
  • 画像キャプショニングは画像そのものよりも多くを語る

Description

<p>Image captioning, a.k.a. image-to-text, which generates descriptive text from given images, has been rapidly developing through the era of deep learning. To what extent is the information of the original image preserved in the descriptive text generated by an image captioner? To answer that question, we perform an experiment to classify images only from the descriptive text without looking at the images at all, and compare it with a standard CNN-based image classifier. We evaluate several image captioning models on a disaster image classification task, CrisisNLP, and show that descriptive text classifiers can sometimes achieve higher accuracy than the CNN-based classifier. Furthermore, we show that fusing the CNN-based classifier and the descriptive text classifier can provide further accuracy improvement.</p>

Journal

Keywords

Details 詳細情報について

Report a problem

Back to top