Removing Partial Mismatches in Pseudo-Supervised Caption Generation

Bibliographic Information

Alternative Title
  • Removing Partial Mismatches in Unsupervised Image Captioning

Abstract

<p>Unsupervised image captioning is the task of describing images without the supervision of image–sentence pairs. With the support of pre-trained object detectors, previous work assigned pseudo-captions, i.e., sentences that contain the detected object labels, to a given image, and focused on aligning these pseudo-captions with input images at the sentence level. However, pseudo-captions contain many words that are irrelevant to the given image. To shed light on this problem of partial mismatches between images and pseudo-captions, we focus on removing mismatched words from image–sentence alignment. We propose a simple gating mechanism that is trained to align image features with only the most reliable words in pseudo-captions: the detected object labels. The superior performance of our method empirically demonstrates the importance of removing the partial mismatches. Detailed analysis shows that our method improves the prediction of words that are likely to be mismatched during training. Furthermore, we show that using our method as an initialization significantly boosts the performance of the previous sentence-level alignment method. These results confirm the importance of careful word-level alignment.</p>
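
The word-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes a gate that is trained, whereas the sketch below substitutes a fixed binary mask as a stand-in, and the function names (`object_word_mask`, `masked_alignment_loss`), the cosine-based loss, and the toy vectors are all assumptions introduced for illustration only. It shows how restricting alignment to detected object labels keeps irrelevant pseudo-caption words from contributing to the loss.

```python
import math


def object_word_mask(caption_tokens, detected_labels):
    """Hypothetical gate: 1.0 for tokens that are detected object labels, else 0.0."""
    labels = {label.lower() for label in detected_labels}
    return [1.0 if token.lower() in labels else 0.0 for token in caption_tokens]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def masked_alignment_loss(image_feature, word_embeddings, mask):
    """Mean of (1 - cosine) over the words the mask keeps; masked-out words are ignored."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    total = sum(
        m * (1.0 - cosine(image_feature, w))
        for w, m in zip(word_embeddings, mask)
    )
    return total / kept


# Toy data (assumed): only "dog" was detected in the image.
tokens = ["a", "dog", "sits", "on", "a", "sofa"]
mask = object_word_mask(tokens, ["dog"])

# Toy 2-d embeddings (assumed): "dog" matches the image feature, other words do not.
image_feature = [1.0, 0.0]
embeddings = [[1.0, 0.0] if t == "dog" else [0.0, 1.0] for t in tokens]

word_level_loss = masked_alignment_loss(image_feature, embeddings, mask)
sentence_level_loss = masked_alignment_loss(image_feature, embeddings, [1.0] * len(tokens))
```

In this toy setting the masked loss only sees "dog", while the unmasked (sentence-level) variant is also penalized for words such as "sofa" that the image may not contain. A learned gate, as the abstract describes, would replace the fixed mask with values produced by the model.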
