VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

23 Scopus Citations

Author(s)

  • Wenjia Xu
  • Yongqin Xian
  • Jiuniu Wang
  • Bernt Schiele
  • Zeynep Akata

Detail(s)

Original language: English
Title of host publication: Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Subtitle of host publication: CVPR 2022
Publisher: Institute of Electrical and Electronics Engineers, Inc.
Pages: 9306-9315
ISBN (electronic): 9781665469463
ISBN (print): 978-1-6654-6947-0
Publication status: Published - 2022

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (print): 1063-6919
ISSN (electronic): 2575-7075

Conference

Title: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022)
Location: Hybrid
Place: United States
City: New Orleans
Period: 19 - 24 June 2022

Abstract

Human-annotated attributes serve as powerful semantic embeddings in zero-shot learning (ZSL). However, their annotation process is labor-intensive and requires expert supervision. Current unsupervised semantic embeddings, i.e., word embeddings, enable knowledge transfer between classes. However, word embeddings do not always reflect visual similarities and result in inferior zero-shot performance. We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation. Our model divides a set of images from seen classes into clusters of local image regions according to their visual similarity, and further encourages class discrimination and semantic relatedness among the clusters. To associate these clusters with previously unseen classes, we use external knowledge, e.g., word embeddings, and propose a novel class relation discovery module. Through quantitative and qualitative evaluation, we demonstrate that our model discovers semantic embeddings that capture the visual properties of both seen and unseen classes. Furthermore, we demonstrate on three benchmarks that our visually-grounded semantic embeddings improve performance over word embeddings across various ZSL models by a large margin. Code is available at https://github.com/wenjiaXu/VGSE.
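
The pipeline the abstract describes has two stages: (1) cluster local image regions from seen classes by visual similarity while encouraging class discrimination and semantic relatedness, and (2) transfer the resulting cluster-based class embeddings to unseen classes using word-embedding relations. As a minimal sketch of stage (2) only, and not the authors' actual class relation discovery module, the NumPy snippet below predicts an unseen class's cluster scores as a similarity-weighted average over its most related seen classes; all names, shapes, and the top-k weighting scheme are illustrative assumptions.

```python
# Hypothetical sketch (not the VGSE implementation): transfer cluster-based
# class embeddings from seen to unseen classes via word-embedding similarity.
import numpy as np

def transfer_cluster_embeddings(seen_scores, seen_wordvecs, unseen_wordvecs, top_k=5):
    """Estimate cluster embeddings for unseen classes.

    seen_scores:     (n_seen, n_clusters)   affinity of each seen class to each region cluster
    seen_wordvecs:   (n_seen, d)            word embeddings of seen class names
    unseen_wordvecs: (n_unseen, d)          word embeddings of unseen class names
    Returns:         (n_unseen, n_clusters) predicted cluster embeddings
    """
    # Cosine similarity between every unseen and every seen class in word-vector space.
    s = seen_wordvecs / np.linalg.norm(seen_wordvecs, axis=1, keepdims=True)
    u = unseen_wordvecs / np.linalg.norm(unseen_wordvecs, axis=1, keepdims=True)
    sim = u @ s.T                                    # (n_unseen, n_seen)

    # Keep the top-k most related seen classes per unseen class.
    idx = np.argsort(-sim, axis=1)[:, :top_k]        # (n_unseen, top_k)
    w = np.take_along_axis(sim, idx, axis=1)
    w = np.clip(w, 1e-6, None)                       # guard against negative similarities
    w = w / w.sum(axis=1, keepdims=True)             # normalize to convex weights

    # Weighted average of the selected seen-class cluster scores.
    return np.einsum('uk,ukc->uc', w, seen_scores[idx])
```

A fixed weighted average is only one plausible reading of the class relation discovery idea; whatever form the transfer takes, the resulting visually-grounded embeddings then stand in for human-annotated attributes as class embeddings in standard ZSL models.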

Research Area(s)

  • Transfer/low-shot/long-tail learning

Citation Format(s)

VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning. / Xu, Wenjia; Xian, Yongqin; Wang, Jiuniu et al.
Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition: CVPR 2022. Institute of Electrical and Electronics Engineers, Inc., 2022. p. 9306-9315 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).
