Layout Modeling for Image and Graphic Design Understanding


Student thesis: Doctoral Thesis

Award date: 4 Sep 2020


Given media such as images or graphic designs, humans are remarkably capable of understanding the content behind them by applying extensive common-sense knowledge of how objects are arranged and how they interact with each other. Similarly, machines should go beyond pixel-level reasoning to understand object positions, sizes, and relationships in the media, i.e., to perform layout modeling. A robust layout model can provide priors for inferring scene context semantically, planning actions early, improving visual attractiveness, and communicating messages more efficiently. In this thesis, we investigate several layout modeling approaches for advancing the understanding of images and graphic designs, including scene layout completion, future scene layout forecasting, and graphic design layout generation.

Scene layout completion aims to generate a complete scene layout from an input layout image containing a few standalone objects by hallucinating the missing contextual information. This problem is difficult because it requires extensive knowledge of the complex and diverse relationships among objects in natural scenes. We propose a novel neural network that takes as input the properties (i.e., category, shape, and position) of a few objects and predicts an object-level scene layout, which compactly encodes the semantics and structure of the scene context surrounding the given objects. We show that our model generates more plausible scene contexts than the baseline approach. We also demonstrate that our model allows for the synthesis of realistic scene images from just partial scene layouts and internally learns useful features for scene recognition.
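As a rough illustration of what an object-level layout can look like as data, the following minimal Python sketch stores each object's category and position and rasterizes a partial layout onto a coarse grid. The names `LayoutObject`, `rasterize`, and `coverage` are hypothetical, and the axis-aligned box is a simplifying stand-in for the object shape mentioned above; this is not the thesis's representation or network.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayoutObject:
    """One object in a scene layout (hypothetical structure for illustration)."""
    category: str  # semantic class label, e.g. "person"
    x: int         # left edge, in grid cells
    y: int         # top edge, in grid cells
    width: int     # box width, in grid cells (a box stands in for the shape)
    height: int    # box height, in grid cells


def rasterize(objects: List[LayoutObject], grid_w: int, grid_h: int,
              background: str = "unknown") -> List[List[str]]:
    """Paint each object's box onto a label grid; later objects overwrite earlier ones."""
    grid = [[background] * grid_w for _ in range(grid_h)]
    for obj in objects:
        # Clip the box to the grid so out-of-range objects do not raise errors.
        for row in range(max(0, obj.y), min(grid_h, obj.y + obj.height)):
            for col in range(max(0, obj.x), min(grid_w, obj.x + obj.width)):
                grid[row][col] = obj.category
    return grid


def coverage(grid: List[List[str]], background: str = "unknown") -> float:
    """Fraction of grid cells that carry a known label."""
    total = sum(len(row) for row in grid)
    filled = sum(cell != background for row in grid for cell in row)
    return filled / total
```

In this picture, a partial layout leaves most cells labeled `"unknown"`, and a completion model would be asked to fill those cells with plausible context.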

Future scene layout forecasting aims to forecast future scene layouts given the observed past frames of an input video. It is of vital importance in many vision applications, e.g., enabling autonomous vehicles to plan actions early. Our key observation is that, to anticipate what a scene will look like in the future, humans typically first recognize and localize individual instances in the scene, and then reason about their spatial and semantic interactions to make the prediction. Rather than performing direct pixel-level prediction as in existing works, we address the problem from an instance-aware perspective. Specifically, our framework explicitly models the dynamics of individual instances and captures their interactions in a scene. Under this formulation, we are able to enforce instance-level constraints when forecasting the scene layout by effectively reasoning about the spatial and semantic relations among instances. We show that learning with an instance-wise formulation produces more accurate scene layouts, yielding state-of-the-art performance.
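To make the instance-level view concrete, the sketch below keeps a separate box history for each instance and extrapolates each one forward. It is a deliberately naive constant-velocity baseline chosen for illustration, not the learned dynamics or interaction modeling described above; the `Box` layout and the function name `forecast_instances` are assumptions.

```python
from typing import Dict, List, Tuple

# Assumed box format: (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]


def forecast_instances(history: Dict[int, List[Box]],
                       steps_ahead: int = 1) -> Dict[int, Box]:
    """Extrapolate each instance's box with a constant-velocity assumption.

    `history` maps an instance id to its observed boxes, oldest first.
    Instances with a single observation are assumed to stay in place.
    """
    forecast: Dict[int, Box] = {}
    for instance_id, boxes in history.items():
        if not boxes:
            continue  # nothing observed for this instance
        last = boxes[-1]
        if len(boxes) < 2:
            forecast[instance_id] = last
            continue
        prev = boxes[-2]
        # Per-coordinate change over the last step, repeated `steps_ahead` times.
        forecast[instance_id] = tuple(
            l + steps_ahead * (l - p) for l, p in zip(last, prev)
        )
    return forecast
```

Because each instance is forecast independently here, the sketch ignores exactly the spatial and semantic interactions that the instance-aware formulation is meant to capture.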

Graphic design layout generation aims to synthesize diverse design layouts that benefit information presentation, guide reader attention, and enhance visual attractiveness. We find that generating an effective graphic design layout requires understanding the visual content of the image elements and the meaning of the text elements in the design. We propose a deep generative model that captures the effect of visual and textual content on layouts and implicitly learns complex layout structure variations from data, without the use of any heuristic rules. To train our model, we build a large-scale magazine layout dataset with fine-grained layout annotations and keyword labeling. We demonstrate that the proposed model can synthesize plausible layouts based on the visual semantics of the input images and a keyword-based summary of the input text. We also show, through a design retrieval task, that our model learns features that capture the interaction between content and layout.