Interactive Image Search System Based on Multimodal Analogy
Description
We propose an image search system based on multimodal analogy, enabled by a visual-semantic embedding model. The system lets users perform analogical reasoning over images by specifying properties to add or subtract with words, e.g., [an image of a blue car] - 'blue' + 'red'. It consists of two main parts: (i) an encoder that learns joint image-text embeddings and (ii) a similarity measure between embeddings in the multimodal vector space. As the encoder, we adopt the CNN-LSTM encoder proposed in [1], which was reported to learn multimodal linguistic regularities. We also introduce a new similarity measure based on the difference between the additive and subtractive queries, which gives noticeably better results than the previous approach on qualitative analogical reasoning tasks.
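To make the retrieval step concrete, the sketch below shows how such an analogy query can be evaluated once images and words are embedded in a shared space. This is a minimal NumPy illustration, not the authors' implementation: the encoder is stubbed out with random unit vectors standing in for CNN-LSTM outputs, and `rank_by_difference` is only one plausible reading of the difference-based similarity measure described above (reward similarity to the additive query, penalize similarity to the subtracted property); the paper's actual formula may differ.

```python
import numpy as np

def normalize(v):
    # Project vectors onto the unit sphere so cosine similarity
    # reduces to a dot product.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def analogy_query(img_emb, sub_word, add_word):
    # Compose an analogical query in the joint embedding space,
    # e.g. [image of a blue car] - 'blue' + 'red'.
    return normalize(img_emb - sub_word + add_word)

def rank_by_cosine(query, gallery):
    # Baseline retrieval: rank gallery images by cosine similarity
    # to the composed query vector.
    return np.argsort(-(gallery @ query))

def rank_by_difference(img_emb, sub_word, add_word, gallery):
    # Hypothetical difference-based measure: score each candidate by
    # its similarity to the additive query minus its similarity to the
    # subtracted property. Illustrative only; not the paper's formula.
    q_add = normalize(img_emb + add_word)
    scores = gallery @ q_add - gallery @ normalize(sub_word)
    return np.argsort(-scores)

if __name__ == "__main__":
    # Stand-ins for encoder outputs: random unit vectors in a shared
    # 300-dimensional space (a real system would embed images and words
    # with the trained CNN-LSTM encoder instead).
    rng = np.random.default_rng(0)
    dim, n = 300, 1000
    gallery = normalize(rng.standard_normal((n, dim)))
    img, blue, red = (normalize(rng.standard_normal(dim)) for _ in range(3))

    q = analogy_query(img, blue, red)
    print("cosine top-5:    ", rank_by_cosine(q, gallery)[:5])
    print("difference top-5:", rank_by_difference(img, blue, red, gallery)[:5])
```

In a real deployment the gallery embeddings would be precomputed with the image side of the encoder, so each interactive query costs only one vector composition and one matrix-vector product over the gallery.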