Cross-modal recipe retrieval: How to cook this dish?

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

45 Scopus Citations

Author(s)

Chen, Jingjing; Pang, Lei; Ngo, Chong-Wah

Related Research Unit(s)

Detail(s)

Original language: English
Title of host publication: MultiMedia Modeling
Subtitle of host publication: 23rd International Conference, MMM 2017, Proceedings
Editors: Gylfi Thór Gudmundsson, Shin’ichi Satoh, Laurent Amsaleg, Björn Thór Jónsson, Cathal Gurrin
Publisher: Springer Verlag
Pages: 588-600
Volume: 10132 LNCS
ISBN (print): 9783319518107
Publication status: Published - 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10132 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Title: 23rd International Conference on MultiMedia Modeling, MMM 2017
Place: Iceland
City: Reykjavik
Period: 4 - 6 January 2017

Abstract

On social media, users like to share food pictures. One intelligent feature, potentially attractive to amateur chefs, is recommending a recipe along with a food picture. Offering this feature, unfortunately, is still technically challenging. First, current food recognition technology scales only to a few hundred categories, far short of what is needed to recognize the tens of thousands of food categories found in practice. Second, even a single food category can have recipe variants that differ in ingredient composition, and finding the best-match recipe requires knowledge of ingredients, which is a fine-grained recognition problem. In this paper, we consider the problem from the viewpoint of cross-modality analysis. Given a large number of image-recipe pairs acquired from the Internet, a joint space is learnt to locally capture the ingredient correspondence between images and recipes. Because learning happens at the region level for images and at the ingredient level for recipes, the model is able to generalize recognition to unseen food categories. Furthermore, the embedded multi-modal ingredient feature sheds light on the retrieval of best-match recipes. On an in-house dataset, our model doubles the retrieval performance of DeViSE, a popular cross-modality model that does not consider region information during learning.
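The abstract contrasts the proposed region-level joint embedding with DeViSE, which learns a global cross-modal space. As a rough illustration of the shared idea, the sketch below (PyTorch) trains a joint image-recipe space with a max-margin ranking loss over in-batch negatives; the encoder designs, dimensions, and loss details are illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch of a joint image-recipe embedding trained with a
# DeViSE-style max-margin ranking loss. All module designs and sizes
# here are assumptions for illustration, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Projects region-level CNN features into the joint space (assumed design)."""
    def __init__(self, region_dim=2048, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(region_dim, embed_dim)

    def forward(self, regions):          # regions: (batch, n_regions, region_dim)
        x = self.proj(regions)           # embed each image region
        x = x.mean(dim=1)                # pool regions into one image vector
        return F.normalize(x, dim=-1)

class RecipeEncoder(nn.Module):
    """Averages ingredient embeddings into a recipe vector (assumed design)."""
    def __init__(self, vocab_size=10000, embed_dim=512):
        super().__init__()
        self.ingredient_emb = nn.Embedding(vocab_size, embed_dim, padding_idx=0)

    def forward(self, ingredient_ids):   # ingredient_ids: (batch, n_ingredients)
        x = self.ingredient_emb(ingredient_ids).mean(dim=1)
        return F.normalize(x, dim=-1)

def ranking_loss(img_vec, rec_vec, margin=0.3):
    """Bidirectional max-margin ranking loss over in-batch negatives."""
    scores = img_vec @ rec_vec.t()                       # cosine similarities
    pos = scores.diag().unsqueeze(1)                     # matching-pair scores
    cost_i2r = (margin + scores - pos).clamp(min=0)      # image -> recipe
    cost_r2i = (margin + scores - pos.t()).clamp(min=0)  # recipe -> image
    mask = torch.eye(scores.size(0), dtype=torch.bool)   # zero out positives
    cost_i2r = cost_i2r.masked_fill(mask, 0)
    cost_r2i = cost_r2i.masked_fill(mask, 0)
    return cost_i2r.mean() + cost_r2i.mean()

# Toy usage with random tensors standing in for real features.
img_enc, rec_enc = ImageEncoder(), RecipeEncoder()
regions = torch.randn(8, 10, 2048)                 # 8 images, 10 regions each
ingredients = torch.randint(1, 10000, (8, 15))     # 8 recipes, 15 ingredients each
loss = ranking_loss(img_enc(regions), rec_enc(ingredients))
loss.backward()
```

At retrieval time, a query image is embedded once and candidate recipes are ranked by cosine similarity in the joint space; per the abstract, the paper's contribution is making this correspondence local (image regions against individual ingredients) rather than global as in DeViSE.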

Research Area(s)

  • Cross-modal retrieval, Multi-modality embedding, Recipe retrieval

Citation Format(s)

Cross-modal recipe retrieval: How to cook this dish? / Chen, Jingjing; Pang, Lei; Ngo, Chong-Wah.
MultiMedia Modeling: 23rd International Conference, MMM 2017, Proceedings. ed. / Gylfi Thór Gudmundsson; Shin’ichi Satoh; Laurent Amsaleg; Björn Thór Jónsson; Cathal Gurrin. Vol. 10132 LNCS. Springer Verlag, 2017. p. 588-600 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10132 LNCS).
