Research on Image Captioning for Low-quality Images

Description

With the rapid development of image captioning baselines, their methods for processing input images and the subsequent stages have become increasingly sophisticated, which in turn places ever higher demands on image quality. In the prevailing transformer architectures and their derivatives, the attention mechanism allocates attention to the regions it deems most important, and this allocation is driven by the extracted image features. For the convolutional neural networks currently used for feature extraction, even slight interference in the image can cause the extracted features to deviate greatly from those of the original image, affecting all later stages. Our experiments confirm that captions generated from low-quality images are either inconsistent with the context or mistake the objects entirely. There are two feasible solutions: the first is to train the captioning model on features extracted from various kinds of low-quality images; the second is to restore these images before they are fed into the captioning architecture. Because the first solution tends to leave the captioning model unable to handle regular images, we aim to bridge this gap by building a framework that restores images from different types of degradation. In this work, we propose to design a fast noise estimator and combine it with an existing state-of-the-art denoising architecture to restore noisy images, and to leverage other effective techniques to deblur and derain images, thereby generating accurate and concise captions.
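The abstract describes a restore-then-caption pipeline: estimate the noise level, denoise when needed, optionally deblur and derain, then run a standard captioning model on the restored image. The sketch below (PyTorch) illustrates that flow only; FastNoiseEstimator, denoiser, deblurrer, derainer, and captioner are hypothetical placeholders, not the models actually used in the paper.

import torch
import torch.nn as nn

class FastNoiseEstimator(nn.Module):
    """Illustrative lightweight CNN that predicts a scalar noise level per image.

    This is a placeholder stand-in, not the estimator proposed in the paper.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, 1)  # scalar noise-level estimate

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(img).flatten(1)).squeeze(1)


def restore_then_caption(img, noise_estimator, denoiser, deblurrer, derainer,
                         captioner, noise_threshold=0.05):
    """Restore a degraded image before captioning it (single-image batch assumed).

    denoiser, deblurrer, derainer, and captioner are placeholders for whatever
    existing restoration and captioning models the user plugs in.
    """
    sigma = noise_estimator(img)
    if sigma.item() > noise_threshold:  # only denoise when noise is detected
        img = denoiser(img, sigma)
    img = deblurrer(img)                # optional deblurring stage
    img = derainer(img)                 # optional deraining stage
    return captioner(img)               # caption the restored image

Estimating the noise level first lets the pipeline skip denoising for clean inputs, so regular images are not needlessly altered, which is the drawback the abstract attributes to retraining the captioner on degraded features.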

Detailed information

  • CRID
    1390011540582395392
  • DOI
    10.15002/00025365
  • HANDLE
    10114/00025365
  • ISSN
    24368083
  • Language code
    en
  • Material type
    departmental bulletin paper
  • Data source type
    • JaLC
    • IRDB
  • Abstract license flag
    Allowed
