Chinese Food Scanner: Modeling Ingredient Composition from Multimedia Perspective

Project: Research

The ambition of this project is to scale up food recognition through ingredient label prediction, which is helpful for applications such as nutrition estimation and dietary assessment. While more research has recently been carried out on the recognition of Western and Japanese food, little effort has been dedicated to the domain of Chinese food. Chinese dishes are rich in color, texture and shape. Unlike in other food domains, ingredients are often fuzzily and wildly composed to create novel dish presentations. The difficulties of recognition originate not just from the richness in visual appearance, but also from the “recreations” of recipes as a result of evolution along the spatial (e.g., geographic region) and temporal (e.g., season) dimensions. Recognizing food categories without understanding ingredient composition is limited to standardized cooked food such as that served in restaurants and, furthermore, hardly scales up and out to dishes prepared “in the wild” in the free-living environment.

This project addresses two problems: fine-grained ingredient recognition and cooking recipe retrieval. The general idea is to model ingredient composition for the search of recipes, which can provide nutritional clues such as non-visible ingredients (e.g., salt, sugar) and quantities of ingredients. The challenges, nevertheless, are threefold. First, the appearance of an ingredient varies depending on the ways of cutting and cooking. Second, there is no direct mapping between ingredient composition and dish. Specifically, the same set of ingredients may be mixed and end up in different dish presentations and names. Third, the preparation of a dish depends on taste preferences and the availability of ingredients, which vary across occasions and result in the creation of different recipe versions. Finding the exact or best-match recipe for a query food picture, analogous to visual-to-text translation or vice versa, is a non-trivial problem.

The project considers these problems from the perspective of multimedia computing, more concretely, by integrating a variety of dish-relevant cues, ranging from ingredients and cooking procedures to food context, to disambiguate food recognition. Not only ingredients, but also the process and consequence of composition are computationally modeled. In addition, recipe search is posed as a zero-shot retrieval problem to scale up and out recognition by searching recipes for “out of vocabulary” dishes or dishes not previously known. The key contributions of this project are the modeling of ingredient composition and the extension of recognition beyond a pre-defined set of food categories; both are new problems not seriously explored in the literature and essential for Chinese dish recognition.


Project number: 9042481
Grant type: GRF
Effective start/end date: 1/01/18 → …