Sketch-Based Image Synthesis with Deep Generative Networks

Student thesis: Doctoral Thesis

Award date: 11 Aug 2021

Abstract

Sketching provides an intuitive medium for depicting design ideas and is widely adopted in the design process. People use sketches to represent complex shapes because sketches are diverse, flexible, concise, and efficient. However, inferring the desired content from an input sketch is difficult due to the ill-posed nature of the problem. With the advancement of deep learning techniques, image-to-image translation provides a general solution for sketch-based image synthesis tasks. This thesis explores sketch-based image synthesis with image-to-image translation frameworks from three aspects: corrective control, imperfect input, and generation quality.

We propose three deep learning-based sketch-to-image techniques to assist users in creating a desired image from a freehand sketch. The first technique automatically synthesizes normal maps from sketch inputs. High-quality normal maps are important intermediates for representing complex shapes. We present Sketch2Normal, an interactive system for generating normal maps from freehand sketches with the help of deep learning techniques. Built on the Generative Adversarial Network (GAN) framework, our method produces high-quality normal maps from sketch inputs in real time. We further enhance the interactivity of the system by incorporating user-specified normals at selected points. Comprehensive experiments demonstrate the effectiveness and robustness of our system, and a thorough perceptual study indicates that the normal maps generated by Sketch2Normal achieve a lower perceptual difference from the ground truth than those of alternative methods.
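To make the sketch-to-normal-map setting concrete, the snippet below is a minimal pix2pix-style conditional GAN sketch: the generator maps a sketch (plus optional user-specified normal hints) to a normal map, and a patch discriminator judges (condition, normal map) pairs. The network sizes, the channel layout for the hints, and the L1 weight are illustrative assumptions, not the actual Sketch2Normal implementation.

```python
# Minimal conditional-GAN sketch for sketch-to-normal-map translation.
# Assumptions: user-specified normals enter as a sparse 3-channel map plus
# a 1-channel hint mask; architectures are heavily simplified.
import torch
import torch.nn as nn

class SketchToNormalGenerator(nn.Module):
    """Small encoder-decoder: sketch (+ optional point hints) -> normal map."""
    def __init__(self, in_ch=5, out_ch=3, base=64):
        # in_ch = 1 (sketch) + 3 (sparse user normals) + 1 (hint mask)
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base, base * 2, 4, 2, 1), nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2, True),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1), nn.BatchNorm2d(base), nn.ReLU(True),
            nn.ConvTranspose2d(base, out_ch, 4, 2, 1), nn.Tanh(),  # normals in [-1, 1]
        )

    def forward(self, x):
        return self.dec(self.enc(x))

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator on (condition, normal map) pairs."""
    def __init__(self, in_ch=5 + 3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base, 1, 4, 1, 1),  # per-patch real/fake logits
        )

    def forward(self, cond, normal):
        return self.net(torch.cat([cond, normal], dim=1))

def train_step(G, D, opt_g, opt_d, cond, gt_normal):
    """One adversarial + L1 training step (loss weight 100 is an assumption)."""
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
    fake = G(cond)
    # Discriminator update: real pairs -> 1, fake pairs -> 0.
    opt_d.zero_grad()
    real_score = D(cond, gt_normal)
    fake_score = D(cond, fake.detach())
    d_loss = bce(real_score, torch.ones_like(real_score)) + \
             bce(fake_score, torch.zeros_like(fake_score))
    d_loss.backward()
    opt_d.step()
    # Generator update: fool D while staying close to the ground-truth normals.
    opt_g.zero_grad()
    fake_score = D(cond, fake)
    g_loss = bce(fake_score, torch.ones_like(fake_score)) + 100.0 * l1(fake, gt_normal)
    g_loss.backward()
    opt_g.step()
    return g_loss.item(), d_loss.item()
```

In a setup like this, the hint mask lets the generator distinguish user-specified normal directions from empty regions, so the same network can serve both unconstrained and point-constrained synthesis.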

The human face is a frequently studied subject because of the great demand for face images in various applications. We present DeepFaceDrawing, which implicitly models the shape space of plausible face images and synthesizes a face image in this space to approximate an input sketch. DeepFaceDrawing takes a local-to-global approach: it first learns feature embeddings of key face components and pushes the corresponding parts of an input sketch towards the underlying component manifolds defined by the feature vectors of face component samples. A second deep neural network then learns the mapping from the embedded component features to realistic images, using multi-channel feature maps as intermediate results to improve the information flow. DeepFaceDrawing essentially uses input sketches as soft constraints and is thus able to produce high-quality face images even from rough and/or incomplete sketches. Our method is easy to use even for non-artists, while still supporting fine-grained control of shape details. Both qualitative and quantitative evaluations show the superior generation ability of DeepFaceDrawing over existing and alternative solutions. The usability and expressiveness of our system are confirmed by a user study.
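To illustrate the soft-constraint idea, the snippet below sketches a locally linear manifold projection: a component feature extracted from the sketch is replaced by a weighted combination of its nearest samples in a component feature bank. The number of neighbours, the regularisation term, and the placeholder encoder names are assumptions for illustration rather than the exact DeepFaceDrawing formulation.

```python
# Minimal sketch of projecting a face-component feature onto its local
# component manifold (locally linear, nearest-neighbour based).
import numpy as np

def project_to_component_manifold(f, samples, K=10):
    """Project feature f (D,) onto the locally linear patch spanned by its
    K nearest neighbours in `samples` (N x D array of sample features)."""
    d = np.linalg.norm(samples - f, axis=1)
    idx = np.argsort(d)[:K]
    nbrs = samples[idx]                          # K x D nearest samples
    # Solve for weights (summing to 1) that best reconstruct f from the
    # neighbours, as in locally linear embedding.
    G = (nbrs - f) @ (nbrs - f).T                # K x K local Gram matrix
    G += 1e-3 * np.trace(G) * np.eye(K)          # regularise for stability (assumed value)
    w = np.linalg.solve(G, np.ones(K))
    w /= w.sum()
    return w @ nbrs                              # projected feature on the manifold

# Hypothetical usage: encode each component of the input sketch (e.g. eyes,
# nose, mouth) with a trained encoder, project its feature, then decode all
# projected features into a face image. `eye_encoder` and `eye_feature_bank`
# are placeholders, not names from the thesis.
# f_eye = eye_encoder(sketch_patch)
# f_eye_refined = project_to_component_manifold(f_eye, eye_feature_bank)
```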

The recently proposed StyleGAN architectures achieve state-of-the-art generation performance, but the original StyleGAN is not well suited to sketch-based creation because it is an unconditional generative model. To address this issue, we propose a direct conditioning strategy that better preserves the spatial information of an input sketch during sketch-to-image synthesis under the StyleGAN framework. Specifically, we introduce Spatially-Conditioned-StyleGAN (SC-StyleGAN for short), which explicitly injects spatial constraints into the original StyleGAN generation process. We explore two input modalities, sketches and semantic maps, which together allow users to express desired generation results more precisely. Based on SC-StyleGAN, we present DrawingInStyles, a novel drawing interface that enables non-professional users to easily produce high-quality, photo-realistic face images with precise control, either from scratch or by editing existing ones. Qualitative and quantitative evaluations show the superior generation ability of our method over existing and alternative solutions. The usability and expressiveness of our system are confirmed by a user study.
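The snippet below sketches one plausible form of such direct spatial conditioning: the sketch and semantic map are encoded into a spatial feature map that replaces StyleGAN's learned constant input, while a style vector still modulates every synthesis block. Module names, resolutions, channel counts, and the simplified modulation are assumptions for illustration and not the actual SC-StyleGAN architecture.

```python
# Minimal sketch of spatially conditioning a StyleGAN-like generator.
# Assumptions: 256x256 inputs, a 19-class semantic map, condition injected
# at 8x8; the synthesis blocks are greatly simplified stand-ins.
import torch
import torch.nn as nn

class ConditionEncoder(nn.Module):
    """Encode a sketch (1 ch) and a semantic map (19 ch) into an 8x8 feature map."""
    def __init__(self, in_ch=1 + 19, out_ch=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, 2, 1), nn.LeakyReLU(0.2, True),   # 256 -> 128
            nn.Conv2d(64, 128, 3, 2, 1), nn.LeakyReLU(0.2, True),     # 128 -> 64
            nn.Conv2d(128, 256, 3, 2, 1), nn.LeakyReLU(0.2, True),    # 64 -> 32
            nn.Conv2d(256, 512, 3, 2, 1), nn.LeakyReLU(0.2, True),    # 32 -> 16
            nn.Conv2d(512, out_ch, 3, 2, 1),                          # 16 -> 8
        )

    def forward(self, sketch, semantic):
        return self.net(torch.cat([sketch, semantic], dim=1))

class ModulatedBlock(nn.Module):
    """Stand-in for a StyleGAN synthesis block: style-scaled conv + upsample."""
    def __init__(self, in_ch, out_ch, w_dim=512):
        super().__init__()
        self.affine = nn.Linear(w_dim, in_ch)   # style vector -> per-channel scale
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Conv2d(in_ch, out_ch, 3, 1, 1)
        self.act = nn.LeakyReLU(0.2, True)

    def forward(self, x, w):
        s = self.affine(w).unsqueeze(-1).unsqueeze(-1)  # B x C x 1 x 1 scales
        return self.act(self.conv(self.up(x * s)))

class SpatiallyConditionedGenerator(nn.Module):
    """Replace the learned constant input with the encoded spatial condition."""
    def __init__(self, w_dim=512):
        super().__init__()
        self.cond_enc = ConditionEncoder()
        self.blocks = nn.ModuleList([
            ModulatedBlock(512, 256, w_dim),   # 8 -> 16
            ModulatedBlock(256, 128, w_dim),   # 16 -> 32
            ModulatedBlock(128, 64, w_dim),    # 32 -> 64
        ])
        self.to_rgb = nn.Conv2d(64, 3, 1)

    def forward(self, sketch, semantic, w):
        x = self.cond_enc(sketch, semantic)    # spatial constraint at 8x8
        for blk in self.blocks:
            x = blk(x, w)
        return torch.tanh(self.to_rgb(x))
```

Under this kind of design, the spatial condition fixes where facial structures appear, while the style vector w continues to control appearance factors such as colour and texture, which is the division of labour that makes precise sketch- and semantic-map-driven control possible.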