
R2GAN: Cross-modal Recipe Retrieval with Generative Adversarial Network

Research output: Conference Papers › RGC 33 - Other conference paper › peer-review

Abstract

Representing procedural text, such as a recipe, for cross-modal retrieval is inherently difficult, let alone generating an image from a recipe for visualization. This paper studies a new variant of GAN, named Recipe Retrieval Generative Adversarial Network (R2GAN), to explore the feasibility of generating images from procedural text for the retrieval problem. The motivation for using a GAN is twofold: learning compatible cross-modal features in an adversarial way, and explaining search results by showing the images generated from recipes. The novelty of R2GAN lies in its architecture: a GAN with one generator and dual discriminators, which makes generating images from recipes feasible. Furthermore, empowered by the generated images, a two-level ranking loss is applied in both the embedding space and the image space. These add-ons not only yield excellent retrieval performance but also generate close-to-realistic food images that help explain the ranking of recipes. On the Recipe1M dataset, R2GAN demonstrates high scalability to data size, outperforms all existing approaches, and generates images that make the search results intuitive for humans to interpret.
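The abstract describes the two-level ranking loss only at a high level. The Python sketch below illustrates one plausible reading: two hinge-based triplet ranking terms, one on recipe/image embeddings and one on features derived from generated images, summed with a weight. The function names, the cosine similarity, the margin of 0.3, the weight, and in particular what the image-space term compares are all assumptions for illustration, not the paper's formulation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.3) -> float:
    """Hinge loss: the matching item must beat the non-matching one by `margin`."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def two_level_ranking_loss(recipe_emb: Vector, image_emb: Vector,
                           negative_image_emb: Vector,
                           generated_feat: Vector, real_feat: Vector,
                           negative_real_feat: Vector,
                           margin: float = 0.3, weight: float = 1.0) -> float:
    # Hypothetical helper; the split into two levels follows the abstract,
    # but the exact inputs of each level are assumptions.
    # Level 1 (assumed, embedding space): a recipe should rank its matching
    # image above a non-matching one.
    level_one = triplet_loss(recipe_emb, image_emb, negative_image_emb, margin)
    # Level 2 (assumed, image space): features of the image generated from the
    # recipe should be closer to its own real image than to a non-matching one.
    level_two = triplet_loss(generated_feat, real_feat, negative_real_feat, margin)
    return level_one + weight * level_two
```

For example, with recipe embedding `[1, 0]`, matching image `[1, 0]`, and negative `[0, 1]`, the first term is zero. If the generated-image feature `[1, 1]` is equally similar to its real image `[1, 0]` and to a negative `[0, 1]`, the second term equals the margin, so the total is approximately 0.3.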
Original language: English
Pages: 11469-11478
Publication status: Published - Jun 2019
Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) - Long Beach, United States
Duration: 16 Jun 2019 to 20 Jun 2019
http://cvpr2019.thecvf.com/

Conference

Conference: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Place: United States
City: Long Beach
Period: 16/06/19 to 20/06/19

Bibliographical note

Research Unit(s) information for this publication is provided by the author(s) concerned.

Research Keywords

  • Categorization
  • Image and Video Synthesis
  • Recognition: Detection
  • Representation Learning
  • Retrieval
  • Vision + Language
