Interactive Image Search System Based on Multimodal Analogy

Description

We propose an image search system based on multimodal analogy, enabled by a visual-semantic embedding model. It allows us to perform analogical reasoning over images by specifying properties to be added or subtracted using words, e.g. [an image of a blue car] - ‘blue’ + ‘red’. The system consists of two main parts: (i) an encoder that learns image-text embeddings and (ii) a similarity measure between embeddings in a multimodal vector space. For the encoder, we adopt the CNN-LSTM encoder proposed in [1], which has been reported to capture multimodal linguistic regularities. We also introduce a new similarity measure based on the difference between the additive and subtractive parts of a query, which gives better results than the previous approach on qualitative analogical reasoning tasks.
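As a rough illustration of this kind of retrieval, the sketch below performs an analogy query in a shared image-text embedding space: the query image embedding, the word to subtract, and the word to add are combined, and gallery images are ranked by a score that contrasts the additive and subtractive parts. All function names, the scoring formula, and the pre-computed embeddings are assumptions for illustration, not the authors' implementation or the exact measure proposed in the system.

```python
# Minimal sketch of analogy-based image retrieval in a joint image-text
# embedding space. Assumes embeddings have already been produced by some
# visual-semantic encoder (e.g. a CNN for images and an LSTM for text);
# the scoring rule here is only one plausible reading of "difference
# between additive and subtractive query", not the paper's exact measure.
import numpy as np


def l2_normalize(v):
    # Unit-normalize so dot products behave like cosine similarities.
    return v / (np.linalg.norm(v) + 1e-8)


def analogy_search(image_emb, subtract_word_emb, add_word_emb,
                   gallery_embs, top_k=5):
    """Retrieve gallery images for: [query image] - word_sub + word_add.

    image_emb, subtract_word_emb, add_word_emb: (D,) embedding vectors.
    gallery_embs: (N, D) array of unit-normalized image embeddings.
    Returns indices of the top_k highest-scoring gallery images.
    """
    additive = l2_normalize(image_emb + add_word_emb)      # what we want
    subtractive = l2_normalize(subtract_word_emb)           # what to remove
    # Assumed score: similarity to the additive query minus similarity
    # to the subtractive query.
    scores = gallery_embs @ additive - gallery_embs @ subtractive
    return np.argsort(-scores)[:top_k]


# Example usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
dim, n_gallery = 128, 1000
gallery = np.stack([l2_normalize(v) for v in rng.normal(size=(n_gallery, dim))])
blue_car = l2_normalize(rng.normal(size=dim))
blue = l2_normalize(rng.normal(size=dim))
red = l2_normalize(rng.normal(size=dim))
print(analogy_search(blue_car, blue, red, gallery))
```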
