Cross-domain Cross-modal Food Transfer

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

9 Scopus Citations

Detail(s)

Original language: English
Title of host publication: MM '20 - Proceedings of the 28th ACM International Conference on Multimedia
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 3762-3770
ISBN (print): 9781450379885
Publication status: Published - Oct 2020

Publication series

Name: MM - Proceedings of the ACM International Conference on Multimedia

Conference

Title: 28th ACM International Conference on Multimedia (MM 2020)
Location: Virtual
Place: United States
City: Seattle
Period: 12 - 16 October 2020

Abstract

Recent work on cross-modal image-to-recipe retrieval paves a new way to scale up food recognition. By learning a joint space between food images and recipes, food recognition reduces to a retrieval problem in which candidates are ranked by the similarity of their embedded features. The major drawback, nevertheless, is the difficulty of applying an already-trained model to recognize dishes from cuisines unknown to the model. In general, adapting a model to the new cooking styles of a cuisine requires updating it with new training examples in the form of image-recipe pairs. In practice, however, acquiring a sufficient number of image-recipe pairs for model transfer can be time-consuming. This paper addresses the challenge of resource scarcity in the scenario where only partial data, rather than a complete view of the data, is accessible for model transfer. Partial data refers to pairs with missing information, such as an image-recipe pair that lacks the image modality or the cooking instructions. To cope with partial data, a novel generic model for cross-domain transfer learning is proposed, equipped with various loss functions including cross-modal metric learning, recipe residual loss, semantic regularization and adversarial learning. Experiments are conducted on three different cuisines (Chuan, Yue and Washoku) to provide insights on scaling up food recognition across domains with limited training resources.
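
As a rough illustration of the retrieval framing in the abstract, the sketch below ranks recipe embeddings by cosine similarity to an image embedding and computes a generic triplet-style metric-learning loss. It is not the authors' model: the function names, the toy two-dimensional vectors, and the margin of 0.3 are all assumptions made for this example, and the recipe residual, semantic regularization and adversarial losses named in the abstract are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_recipes(image_vec, recipe_vecs):
    """Return recipe indices ordered from most to least similar to the image."""
    scores = [cosine(image_vec, r) for r in recipe_vecs]
    return sorted(range(len(recipe_vecs)), key=lambda i: scores[i], reverse=True)


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss that is zero once the matching pair beats the
    non-matching pair by at least `margin` (an assumed value)."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)


# Toy embeddings (hypothetical): one image and three candidate recipes.
image = [1.0, 0.0]
recipes = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]

ranking = rank_recipes(image, recipes)
loss_matched = triplet_loss(image, recipes[1], recipes[0])
loss_mismatched = triplet_loss(image, recipes[0], recipes[1])
```

Here the second recipe ranks first because its embedding points in nearly the same direction as the image embedding. The loss is zero when that recipe is treated as the positive example and becomes positive when the roles are swapped, which is the signal a metric-learning objective of this kind would use to pull matching pairs together.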

Research Area(s)

  • Food recognition, Cross-modal food retrieval, Cross-domain transfer

Citation Format(s)

Cross-domain Cross-modal Food Transfer. / Zhu, Bin; Ngo, Chong-Wah; Chen, Jing-jing.
MM '20 - Proceedings of the 28th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2020. p. 3762-3770 (MM - Proceedings of the ACM International Conference on Multimedia).
