A Hybrid Approach for Detecting Prerequisite Relations in Multi-modal Food Recipes

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › Publication in refereed journal › peer-review

Author(s)

  • Liangming Pan
  • Jingjing Chen
  • Shaoteng Liu
  • Min-Yen Kan
  • Tat-Seng Chua

Detail(s)

Original language: English
Pages (from-to): 4491-4501
Number of pages: 11
Journal / Publication: IEEE Transactions on Multimedia
Volume: 23
Online published: 9 Dec 2020
Publication status: Published - 2021

Abstract

Modeling the structure of culinary recipes is the core of recipe representation learning. Current approaches mostly focus on extracting the workflow graph from recipes based on text descriptions. Process images, which constitute an important part of cooking recipes, have rarely been investigated in recipe structure modeling. We study this recipe structure problem from a multi-modal learning perspective, proposing a prerequisite tree to represent recipes with cooking images at a step-level granularity. We propose a simple-yet-effective two-stage framework to automatically construct the prerequisite tree for a recipe by (1) utilizing a trained classifier that fuses multi-modal features as input to detect pairwise prerequisite relations; then (2) applying different strategies (greedy method, maximum weight, and beam search) to build the tree structure. Experiments on the MM-ReS dataset demonstrate the advantages of introducing process images for recipe structure modeling. Also, compared with neural methods which require large amounts of training data, we show that our two-stage pipeline can achieve promising results using only 400 labeled prerequisite trees as training data.
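To illustrate the second stage described in the abstract, here is a minimal sketch of the greedy tree-building strategy, assuming the stage-1 classifier outputs a matrix of pairwise prerequisite scores. The function name, score format, and attachment rule are illustrative assumptions, not details from the paper.

```python
def greedy_prerequisite_tree(scores):
    """Hypothetical sketch: scores[i][j] is the classifier's predicted
    probability that step i is a prerequisite of step j (for i < j).
    Greedily attaches each step to its highest-scoring earlier step,
    returning parent[j] = i for each step after the first."""
    n = len(scores)
    parent = {}
    for j in range(1, n):
        # Pick the earlier step with the highest prerequisite score.
        parent[j] = max(range(j), key=lambda i: scores[i][j])
    return parent

# Toy example with 4 recipe steps.
scores = [
    [0.0, 0.9, 0.2, 0.1],
    [0.0, 0.0, 0.8, 0.3],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0],
]
print(greedy_prerequisite_tree(scores))  # {1: 0, 2: 1, 3: 2}
```

The maximum-weight and beam-search strategies mentioned in the abstract would replace the per-step greedy choice with a global or approximate search over candidate trees.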

Research Area(s)

  • Cause-and-Effect Reasoning, Cooking Workflow, Deep Learning, Food Recipes, Multi-modal Fusion, Prerequisite Trees

Citation Format(s)

A Hybrid Approach for Detecting Prerequisite Relations in Multi-modal Food Recipes. / Pan, Liangming; Chen, Jingjing; Liu, Shaoteng et al.

In: IEEE Transactions on Multimedia, Vol. 23, 2021, p. 4491-4501.
