Research on Image Captioning for Low-quality Images

Description

With the rapid development of image captioning baselines, their methods for processing input images and the subsequent stages have become increasingly sophisticated, which in turn places ever higher demands on image quality. In the prevailing transformer architectures and their derivatives, the attention mechanism allocates attention to the regions it deems most important, and this allocation is driven by the extracted image features. For the convolutional neural networks currently used for feature extraction, even slight interference in the image can cause the extracted features to deviate greatly from those of the original image, affecting all later stages. Our experiments confirm that captions generated from low-quality images are either inconsistent with the context or mistake the objects entirely. There are two feasible solutions: the first is to train the captioning model on features extracted from various kinds of low-quality images; the second is to restore these images before they are fed into the captioning architecture. Because the first solution tends to leave the captioning model unable to handle regular images, we aim to bridge this gap by building a framework that restores images from different types of degradation. In this work, we propose to design a fast noise estimator and combine it with an existing state-of-the-art denoising architecture to restore noisy images, and to leverage other effective techniques to deblur and derain images, thereby generating accurate and concise captions.
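The abstract describes a restore-then-caption pipeline: estimate the noise level, denoise when needed, optionally deblur and derain, then run a standard captioning model on the restored image. The sketch below (PyTorch) illustrates that flow only; FastNoiseEstimator, denoiser, deblurrer, derainer, and captioner are hypothetical placeholders, not the models actually used in the paper.

import torch
import torch.nn as nn

class FastNoiseEstimator(nn.Module):
    """Illustrative lightweight CNN that predicts a scalar noise level per image.

    This is a placeholder stand-in, not the estimator proposed in the paper.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, 1)  # scalar noise-level estimate

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(img).flatten(1)).squeeze(1)


def restore_then_caption(img, noise_estimator, denoiser, deblurrer, derainer,
                         captioner, noise_threshold=0.05):
    """Restore a degraded image before captioning it (single-image batch assumed).

    denoiser, deblurrer, derainer, and captioner are placeholders for whatever
    existing restoration and captioning models the user plugs in.
    """
    sigma = noise_estimator(img)
    if sigma.item() > noise_threshold:  # only denoise when noise is detected
        img = denoiser(img, sigma)
    img = deblurrer(img)                # optional deblurring stage
    img = derainer(img)                 # optional deraining stage
    return captioner(img)               # caption the restored image

Estimating the noise level first lets the pipeline skip denoising for clean inputs, so regular images are not needlessly altered, which is the drawback the abstract attributes to retraining the captioner on degraded features.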

Detailed information

  • CRID
    1390011540582395392
  • DOI
    10.15002/00025365
  • HANDLE
    10114/00025365
  • ISSN
    24368083
  • Language code
    en
  • Material type
    departmental bulletin paper
  • Data source type
    • JaLC
    • IRDB
  • Abstract license flag
    Allowed
