Visual Structure Constraint for Transductive Zero-Shot Learning in the Wild

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

6 Scopus Citations

Author(s)

Related Research Unit(s)

Detail(s)

Original language: English
Pages (from-to): 1893–1909
Number of pages: 17
Journal / Publication: International Journal of Computer Vision
Volume: 129
Issue number: 6
Online published: 19 Apr 2021
Publication status: Published - Jun 2021

Abstract

To recognize objects of unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a compatible projection function between the common semantic space and the visual space from data of the source seen classes, and then apply it directly to the target unseen classes. For data in the wild, however, the source and target distributions might not match well, causing the well-known domain shift problem. Based on the observation that visual features of test instances can be separated into different clusters, we propose a new visual structure constraint on class centers for transductive ZSL to improve the generality of the projection function (i.e., to alleviate the domain shift problem). Specifically, three different strategies (symmetric Chamfer distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers with the visual cluster centers of the test instances. We also propose two new training strategies to handle data in the wild, where the test dataset may contain many unrelated images; this realistic setting has not been considered in previous methods. Extensive experiments demonstrate that the proposed visual structure constraint consistently brings substantial performance gains and that the new training strategies make it generalize well to data in the wild. The source code is available at https://github.com/raywzy/VSC.
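To make the center-alignment idea concrete, below is a minimal PyTorch sketch of one of the three strategies named in the abstract, the symmetric Chamfer distance between projected semantic centers and visual cluster centers. This is an illustrative sketch under assumed inputs, not the authors' implementation (see the linked repository for that); the function name and the random stand-in tensors are hypothetical.

```python
import torch

def symmetric_chamfer_loss(proj_centers, cluster_centers):
    """Symmetric Chamfer distance between two sets of class centers.

    proj_centers:    (K, d) unseen-class semantic centers projected
                     into the visual space by the learned function.
    cluster_centers: (K, d) cluster centers of the unlabeled test
                     features (e.g. computed by k-means).
    """
    # Pairwise squared Euclidean distances, shape (K, K).
    dists = torch.cdist(proj_centers, cluster_centers) ** 2
    # Each projected center is pulled toward its nearest cluster center,
    # and vice versa; summing both directions makes the loss symmetric.
    forward = dists.min(dim=1).values.mean()
    backward = dists.min(dim=0).values.mean()
    return forward + backward

# Toy usage with random stand-ins for the real features:
K, d = 10, 2048
proj = torch.randn(K, d, requires_grad=True)  # output of the projection function
clusters = torch.randn(K, d)                  # fixed, precomputed cluster centers
loss = symmetric_chamfer_loss(proj, clusters)
loss.backward()  # gradients flow only into the projected centers
```

The other two strategies from the abstract replace the nearest-neighbor assignment above with, respectively, a one-to-one bipartite matching between the two center sets and an optimal-transport (Wasserstein) coupling.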

Research Area(s)

  • Computer vision, Visual structure constraint, Zero-shot learning

Bibliographic Note

Research Unit(s) information for this publication is provided by the author(s) concerned.