- Yuhao Cheng
  - Shanghai Jiao Tong University, Shanghai, China
- Xiaoguang Zhu
  - Shanghai Jiao Tong University, Shanghai, China
- Jiuchao Qian
  - Shanghai Jiao Tong University, Shanghai, China
- Fei Wen
  - Shanghai Jiao Tong University, Shanghai, China
- Peilin Liu
  - Shanghai Jiao Tong University, Shanghai, China
Description
Image-text retrieval is a fundamental cross-modal task whose main idea is to learn image-text matching. Depending on whether the two modalities interact during retrieval, existing image-text retrieval methods can be classified into independent representation matching methods and cross-interaction matching methods. Independent representation matching methods generate the embeddings of images and sentences independently and are therefore convenient for retrieval with hand-crafted matching measures (e.g., cosine or Euclidean distance). Cross-interaction matching methods improve accuracy by introducing interaction-based networks for inter-relation reasoning, yet suffer from low retrieval efficiency. This article aims to develop a method that takes advantage of the cross-modal inter-relation reasoning of cross-interaction methods while being as efficient as the independent methods. To this end, we propose the Cross-modal Graph Matching Network (CGMN), a graph-based method that explores both intra- and inter-relations without introducing network interaction. In CGMN, graphs are used for both visual and textual representation to achieve intra-relation reasoning across regions and words, respectively. Furthermore, we propose a novel graph node matching loss to learn fine-grained cross-modal correspondence and to achieve inter-relation reasoning. Experiments on the benchmark datasets MS-COCO, Flickr8K, and Flickr30K show that CGMN outperforms state-of-the-art methods in image retrieval. Moreover, CGMN is much more efficient than state-of-the-art methods that use interactive matching. The code is available at https://github.com/cyh-sj/CGMN.
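The independent representation matching described in the abstract can be illustrated with a minimal sketch: embeddings are computed separately for each modality, and retrieval reduces to ranking by a hand-crafted measure such as cosine similarity. This is not the CGMN implementation. The function names `cosine_similarity` and `rank_images` and the toy embedding values are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_images(text_embedding, image_embeddings):
    """Return image indices ordered from most to least similar to the text query."""
    scores = [cosine_similarity(text_embedding, img) for img in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical, hand-made embeddings (not taken from the paper or its code).
# In practice these would come from separately trained image and text encoders.
image_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
text_embedding = [0.9, 0.1, 0.0]

ranking = rank_images(text_embedding, image_embeddings)
print(ranking)
```

Because the image embeddings do not depend on the query, they can be computed once and cached, which is the efficiency advantage the abstract attributes to independent methods over cross-interaction methods.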
Published in
- ACM Transactions on Multimedia Computing, Communications, and Applications 18 (4), 1-23, 2022-03-04
Association for Computing Machinery (ACM)
Details
- CRID: 1360861711911804800
- DOI: 10.1145/3499027
- ISSN: 15516865, 15516857
- Data source type: Crossref