Multi-modal Cooking Workflow Construction for Food Recipes
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Author(s)
Related Research Unit(s)
Detail(s)
Original language | English |
---|---|
Title of host publication | MM ’20 |
Subtitle of host publication | Proceedings of the 28th ACM International Conference on Multimedia |
Publisher | Association for Computing Machinery |
Pages | 1132-1141 |
ISBN (print) | 9781450379885 |
Publication status | Published - Oct 2020 |
Publication series
Name | MM - Proceedings of the ACM International Conference on Multimedia |
---|
Conference
Title | 28th ACM International Conference on Multimedia (MM 2020) |
---|---|
Location | Virtual |
Place | United States |
City | Seattle |
Period | 12 - 16 October 2020 |
Link(s)
Abstract
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes because large-scale labeled datasets are lacking. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow, which achieves a performance gain of more than 20% over existing hand-crafted baselines.
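As a rough illustration of what a cooking workflow graph might look like, the sketch below represents steps as nodes and causal dependencies as directed edges, then derives one valid execution order. The toy recipe, the `CookingStep` fields, and the `execution_order` helper are all hypothetical; this is neither the MM-ReS annotation schema nor the encoder-decoder model described in the paper.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Optional


@dataclass(frozen=True)
class CookingStep:
    """One multi-modal step: instruction text plus an optional image path."""
    step_id: int
    text: str
    image_path: Optional[str] = None  # hypothetical field, not the dataset schema


def execution_order(dependencies: dict[int, set[int]]) -> list[int]:
    """Return one valid step order given {step: steps it depends on}."""
    return list(TopologicalSorter(dependencies).static_order())


# Invented toy recipe, for illustration only.
steps = [
    CookingStep(1, "Boil the pasta.", "step1.jpg"),
    CookingStep(2, "Fry the garlic in olive oil."),
    CookingStep(3, "Toss the pasta with the garlic oil.", "step3.jpg"),
]

# Step 3 uses the results of steps 1 and 2; steps 1 and 2 are independent.
dependencies = {1: set(), 2: set(), 3: {1, 2}}

order = execution_order(dependencies)
for step_id in order:
    print(step_id, steps[step_id - 1].text)
```

In this toy graph, steps 1 and 2 can run in either order or in parallel, while step 3 must come last. That branching is what distinguishes a workflow graph from the linear order in which a recipe is written.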
Research Area(s)
- cause-and-effect reasoning, cooking workflow, deep learning, food recipes, MM-ReS dataset, multi-modal fusion
Bibliographic Note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Citation Format(s)
Multi-modal Cooking Workflow Construction for Food Recipes. / Pan, Liang-Ming; Chen, Jingjing; Wu, Jianlong et al.
MM ’20: Proceedings of the 28th ACM International Conference on Multimedia. Association for Computing Machinery, 2020. p. 1132-1141 (MM - Proceedings of the ACM International Conference on Multimedia).