Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment

EasyChair Preprint 4974

13 pages • Date: February 3, 2021

Po-Yao Huang, Guoliang Kang, Wenhe Liu, Xiaojun Chang and Alexander Hauptmann

Abstract

Visual-semantic embeddings are central to many multimedia applications, such as cross-modal retrieval between visual data and natural language descriptions. Conventionally, learning a joint embedding space relies on large parallel multimodal corpora. Since massive human annotation is expensive to obtain, there is strong motivation to develop versatile algorithms that learn from large corpora with fewer annotations. In this paper, we propose a novel framework that leverages automatically extracted regional semantics from un-annotated images as additional weak supervision to learn visual-semantic embeddings. The proposed model employs adversarial attentive alignments to close the inherent heterogeneous gaps between the annotated and un-annotated portions of the visual and textual domains. To demonstrate its superiority, we conduct extensive experiments on sparsely annotated multimodal corpora. The experimental results show that the proposed model outperforms state-of-the-art visual-semantic embedding models by a significant margin on cross-modal retrieval tasks on the sparse Flickr30k and MS-COCO datasets. Notably, despite using only 20% of the annotations, the proposed model achieves competitive performance (Recall at 10 > 80.0% for 1K and > 70.0% for 5K text-to-image retrieval) compared to benchmarks trained with the complete annotations.

Keyphrases: Adversarial Attentive Alignment, Annotation Efficient, cross-modal retrieval

Links:

https://easychair.org/publications/preprint/B7t4

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:4974,
  author    = {Po-Yao Huang and Guoliang Kang and Wenhe Liu and Xiaojun Chang and Alexander Hauptmann},
  title     = {Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment},
  howpublished = {EasyChair Preprint 4974},
  year      = {EasyChair, 2021}}
