CookGAN: Causality Based Text-to-Image Synthesis

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

51 Scopus Citations

Detail(s)

Original language: English
Title of host publication: Proceedings - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020)
Publisher: Institute of Electrical and Electronics Engineers, Inc.
Pages: 5518-5526
ISBN (electronic): 978-1-7281-7168-5
Publication status: Published - Jun 2020

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher: IEEE Computer Society
ISSN (Print): 1063-6919

Conference

Title: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020)
Location: Virtual
Place: United States
City: Seattle
Period: 13 - 19 June 2020

Abstract

This paper addresses the problem of text-to-image synthesis from a new perspective, i.e., the cause-and-effect chain in image generation. Causality is a common phenomenon in cooking: the appearance of a dish changes depending on the cooking actions and ingredients. The challenge of synthesis is that a generated image should depict the visual result of action-on-object. This paper presents a new network architecture, CookGAN, that mimics the visual effect in the causality chain, preserves fine-grained details, and progressively upsamples the image. In particular, a cooking simulator sub-network is proposed to incrementally make changes to food images based on the interaction between ingredients and cooking methods over a series of steps. Experiments on Recipe1M verify that CookGAN manages to generate food images with a reasonably impressive inception score. Furthermore, the images are semantically interpretable and manipulable.

Research Area(s)

  • causality effect, text-to-image synthesis, recipe manipulation, image-to-recipe retrieval

Bibliographic Note

Research Unit(s) information for this publication is provided by the author(s) concerned.

Citation Format(s)

CookGAN: Causality Based Text-to-Image Synthesis. / Zhu, Bin; Ngo, Chong-Wah.
Proceedings - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020). Institute of Electrical and Electronics Engineers, Inc., 2020. p. 5518-5526 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).
