DeepFaceDrawing: Deep Generation of Face Images From Sketches
SHU-YU CHEN† , Institute of Computing Technology, CAS and University of Chinese Academy of Sciences
WANCHAO SU† , School of Creative Media, City University of Hong Kong
LIN GAO∗ , Institute of Computing Technology, CAS and University of Chinese Academy of Sciences
SHIHONG XIA, Institute of Computing Technology, CAS and University of Chinese Academy of Sciences
HONGBO FU, School of Creative Media, City University of Hong Kong
Fig. 1. Our DeepFaceDrawing system allows users with little training in drawing to produce high-quality face images (Bottom) from rough or even incomplete
freehand sketches (Top). Note that our method faithfully respects user intentions in input strokes, which serve more like soft constraints to guide image
synthesis.
Recent deep image-to-image translation techniques allow fast generation of face images from freehand sketches. However, existing solutions tend to overfit to sketches, thus requiring professional sketches or even edge maps as input. To address this issue, our key idea is to implicitly model the shape space of plausible face images and synthesize a face image in this space to approximate an input sketch. We take a local-to-global approach. We first learn feature embeddings of key face components, and push corresponding parts of input sketches towards underlying component manifolds defined by the feature vectors of face component samples. We also propose another deep neural network to learn the mapping from the embedded component features to realistic images, with multi-channel feature maps as intermediate results to improve the information flow. Our method essentially uses input sketches as soft constraints and is thus able to produce high-quality face images even from rough and/or incomplete sketches. Our tool is easy to use even for non-artists, while still supporting fine-grained control of shape details. Both qualitative and quantitative evaluations show the superior generation ability of our system over existing and alternative solutions. The usability and expressiveness of our system are confirmed by a user study.

CCS Concepts: • Human-centered computing → Graphical user interfaces; • Computing methodologies → Perception; Texturing; Image processing.

Additional Key Words and Phrases: image-to-image translation, feature embedding, sketch-based generation, face synthesis

† Authors contributed equally.
∗ Corresponding author.
Webpage: http://geometrylearning.com/DeepFaceDrawing/
This is the author's version of the work. It is posted here for your personal use. Not for redistribution.

1 INTRODUCTION
Creating realistic human face images from scratch benefits various applications including criminal investigation, character design, educational training, etc. Due to their simplicity, conciseness, and ease of use, sketches are often used to depict desired faces. The recently proposed deep learning based image-to-image translation techniques (e.g., [19, 38]) allow automatic generation of photo images from sketches for various object categories including human faces, and lead to impressive results.

Most such deep learning based solutions (e.g., [6, 19, 26, 38]) for sketch-to-image translation take input sketches as almost fixed and attempt to infer the missing texture or shading information between strokes. To some extent, their problems are formulated more like reconstruction problems with input sketches as hard constraints. Since they often train their networks from pairs of real
images and their corresponding edge maps, due to the data-driven nature they require test sketches with quality similar to edge maps of real images to synthesize realistic face images. However, such sketches are difficult to make, especially for users with little training in drawing.

To address this issue, our key idea is to implicitly learn a space of plausible face sketches from real face sketch images and find the closest point in this space to approximate an input sketch. In this way, sketches can be used more like soft constraints to guide image synthesis. Thus we can increase the plausibility of synthesized images even for rough and/or incomplete input sketches, while respecting the characteristics represented in the sketches (e.g., Figure 1 (a-d)). Learning such a space globally (if it exists) is not very feasible due to the limited training data against an expected high-dimensional feature space. This motivates us to implicitly model component-level manifolds, which makes it more reasonable to assume that each component manifold is low-dimensional and locally linear [32]. This decision not only helps locally span such manifolds using a limited amount of face data, but also enables finer-grained control of shape details (Figure 1 (e)).

To this end we present a novel deep learning framework for sketch-based face image synthesis, as illustrated in Figure 3. Our system consists of three main modules, namely, CE (Component Embedding), FM (Feature Mapping), and IS (Image Synthesis). The CE module adopts an auto-encoder architecture and separately learns five feature descriptors from the face sketch data, namely for "left-eye", "right-eye", "nose", "mouth", and "remainder", for locally spanning the component manifolds. The FM and IS modules together form another deep learning sub-network for conditional image generation, and map component feature vectors to realistic images. Although FM looks similar to the decoding part of CE, by mapping the feature vectors to 32-channel feature maps instead of 1-channel sketches, it improves the information flow and thus provides more flexibility to fuse individual face components for higher-quality synthesis results.

Inspired by [25], we provide a shadow-guided interface (implemented based on CE) for users to input face sketches with proper structures more easily (Figure 8). Corresponding parts of input sketches are projected to the underlying facial component manifolds and then mapped to the corresponding feature maps as conditions for image synthesis. Our system produces high-quality realistic face images (with a resolution of 512 × 512), which faithfully respect input sketches. We evaluate our system by comparing it with existing and alternative solutions, both quantitatively and qualitatively. The results show that our method produces visually more pleasing face images. The usability and expressiveness of our system are confirmed by a user study. We also propose several interesting applications using our method.

2 RELATED WORK
Our work is related to existing works for drawing assistance and conditional face generation. We focus on the works closely related to ours; a full review of such topics is beyond the scope of our paper.

2.1 Drawing Assistance
Multiple guidance or suggestive interfaces (e.g., [17]) have been proposed to assist users in creating drawings of better quality. For example, Dixon et al. [7] proposed iCanDraw, which provides corrective feedback based on an input sketch and facial features extracted from a reference image. ShadowDraw by Lee et al. [25] retrieves real images from an image repository involving many object categories for an input sketch as a query, and then blends the retrieved images as a shadow for drawing guidance. Our shadow-guided interface for inputting sketches is based on the concept of ShadowDraw but is specially designed for assisting in face drawing. Matsui et al. [29] proposed DrawFromDrawings, which allows the retrieval of reference sketches and their interpolation with an input sketch. Our solution for projecting an input sketch to underlying component manifolds follows a similar retrieval-and-interpolation idea, but we perform this in the learned feature spaces, without the explicit correspondence detection needed by DrawFromDrawings. Unlike the above works, which aim to produce quality sketches as output, our work treats such sketches as possible inputs and we are more interested in producing realistic face images even from rough and/or incomplete sketches.

Another group of methods (e.g., [1, 18]) take a more aggressive approach and aim to automatically correct input sketches. For example, Limpaecher et al. [27] learn a correction vector field from a crowdsourced set of face drawings to correct a face sketch, with the assumption that such face drawings and the input sketch are of the same subject. Xie et al. [41] and Su et al. [36] propose optimization-based approaches for refining sketches roughly drawn on a reference image. We refine an input sketch by projecting its individual face components to the corresponding component manifolds. However, as shown in Figure 5, directly using such refined component sketches as input to conditional image generation might cause artifacts across facial components. Since our goal is sketch-based image synthesis, we thus perform sketch refinement only implicitly.

2.2 Conditional Face Generation
In recent years, conditional generative models, in particular conditional Generative Adversarial Networks (GANs) [11], have been popular for image generation conditioned on various input types. Karras et al. [22] propose an alternative generator architecture for GANs that separates high-level face attributes and stochastic variations in generating high-quality face images. Based on conditional GANs [30], Isola et al. [19] present the pix2pix framework for various image-to-image translation problems like image colorization, semantic segmentation, sketch-to-image synthesis, etc. Wang et al. [38] introduce pix2pixHD, an improved version of pix2pix, to generate higher-resolution images, and demonstrate its application to image synthesis from semantic label maps. Wang et al. [37] generate an image given a semantic label map as well as an image exemplar. Sangkloy et al. [34] take hand-drawn sketches as input and colorize them under the guidance of user-specified sparse color strokes. These systems tend to overfit to conditions seen during training, and thus when sketches are used as conditions, they achieve quality results only given edge maps as input. To address this issue, instead
Fig. 3. Illustration of our network architecture. The upper half is the Component Embedding module. We learn feature embeddings of face components using
individual auto-encoders. The feature vectors of component samples are considered as the point samples of the underlying component manifolds and are used
to refine an input hand-drawn sketch by projecting its individual parts to the corresponding component manifolds. The lower half illustrates a sub-network
consisting of the Feature Mapping (FM) and the Image Synthesis (IS) modules. The FM module decodes the component feature vectors to the corresponding
multi-channel feature maps (𝐻 × 𝑊 × 32), which are combined according to the spatial locations of the corresponding facial components before passing them
to the IS module.
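To make the data flow in Figure 3 concrete, the following is a hypothetical end-to-end inference sketch in Python. The handles `encoders` (CE), `feature_mapping` (FM), and `image_synthesis` (IS), the component window coordinates, and the overwrite order when pasting feature maps are illustrative assumptions rather than the released implementation; `project_to_manifold` is sketched later alongside Figure 6.

```python
import numpy as np

# Hypothetical component windows (top, left, size) on a 512 x 512 canvas; the paper
# derives the actual windows from the dataset's pre-labeled segmentation masks.
WINDOWS = {"remainder": (0, 0, 512), "left_eye": (180, 120, 128),
           "right_eye": (180, 264, 128), "nose": (230, 192, 160), "mouth": (300, 176, 192)}

def combine_feature_maps(feature_vectors, feature_mapping):
    """FM step: decode each 512-d component vector to an H x W x 32 feature map and
    paste it into a 512 x 512 x 32 canvas at its spatial location (Figure 3, lower half).
    The "remainder" map is pasted first so the four component windows overwrite it;
    this overlap handling is an assumption."""
    canvas = np.zeros((512, 512, 32), dtype=np.float32)
    for c in ["remainder", "left_eye", "right_eye", "nose", "mouth"]:
        if c not in feature_vectors:
            continue                                   # tolerate incomplete sketches
        top, left, size = WINDOWS[c]
        canvas[top:top + size, left:left + size, :] = feature_mapping[c](feature_vectors[c])
    return canvas

def synthesize_face(component_sketches, encoders, samples, feature_mapping,
                    image_synthesis, blend_weights):
    """Sketch-to-image inference: encode each component sketch (CE), refine it by
    manifold projection, blend with the user's input, decode and combine the feature
    maps (FM), and translate them to a realistic 512 x 512 image (IS)."""
    features = {}
    for c, sketch in component_sketches.items():
        f = encoders[c](sketch)                          # 512-d feature vector
        f_proj = project_to_manifold(f, samples[c])      # closest point on the component manifold
        w = blend_weights[c]                             # w = 1 keeps the input sketch, w = 0 trusts the data
        features[c] = w * f + (1.0 - w) * f_proj
    return image_synthesis(combine_feature_maps(features, feature_mapping))
```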
To achieve this, we first learn the feature embeddings of face components (Section 3.2). For each component type, the points corresponding to component samples implicitly define a manifold. However, we do not explicitly learn this manifold, since we are more interested in knowing the closest point in such a manifold given a new sketched face component, which needs to be refined. Observing that in the embedding spaces semantically similar components are close to each other, we assume that the underlying component manifolds are locally linear. We then follow the main idea of the classic locally linear embedding (LLE) algorithm [32] to project the feature vector of the sketched face component to its component manifold (Section 3.3).

The learned feature embeddings also allow us to guide conditional sketch-to-image synthesis to explicitly exploit the information in the feature space. Unlike traditional sketch-to-image synthesis methods (e.g., [19, 38]), which learn conditional GANs to translate sketches to images, our approach forces the synthesis pipeline to go through the component feature spaces and then maps the feature vectors to 32-channel feature maps (rather than 1-channel sketches) before the use of a conditional GAN (Section 3.2). This greatly improves the information flow and benefits component fusion. Below we first discuss our data preparation procedure (Section 3.1). We then introduce our novel pipeline for sketch-to-image synthesis (Section 3.2) and our approach for manifold projection (Section 3.3). Finally, we present our shadow-guided interface (Section 3.4).

3.1 Data Preparation
Training our network requires a reasonably large-scale dataset of face sketch-image pairs. There exist several relevant datasets like the CUHK face sketch database [39, 46]. However, the sketches in such datasets involve shading effects, while we expect a more abstract representation of faces using sparse lines. We thus contribute a new dataset of pairs of face images and corresponding synthesized sketches. We build this on the face image data of CelebAMask-HQ [24], which contains high-resolution facial images with semantic masks of facial attributes. For simplicity, we currently focus on front faces without decorative accessories (e.g., glasses, face masks).

To extract sparse lines from real images, we have tried the following edge detection methods. As shown in Figure 2 (b) and (d),
Component Embedding Module. Since human faces share a clear structure, we decompose a face sketch into five components, denoted as 𝑆𝑐, 𝑐 ∈ {1, 2, 3, 4, 5}, for "left-eye", "right-eye", "nose", "mouth", and "remainder", respectively. To handle the details in between components, we define the first four components simply by using four overlapping windows centered at the individual face components (derived from the pre-labeled segmentation masks in the dataset), as illustrated in Figure 3 (Top-Left). A "remainder" image corresponding to the "remainder" component is the same as the original sketch image but with the eyes, nose, and mouth removed. Here we treat "left-eye" and "right-eye" separately to best explore the flexibility in the generated faces (see two examples in Figure 4). To better control the details of individual components, for each face component type we learn a local feature embedding. We obtain the feature descriptors of individual components by using five auto-encoder networks, denoted as {𝐸𝑐, 𝐷𝑐}, with 𝐸𝑐 being an encoder and 𝐷𝑐 a decoder for component 𝑐. Each auto-encoder consists of five encoding layers and five decoding layers. We add a fully connected layer in the middle to ensure that the latent descriptor is of 512 dimensions for all five components.
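The exact layer hyper-parameters are not given above, so the following is a minimal PyTorch-style sketch of one component auto-encoder {𝐸𝑐, 𝐷𝑐}: five strided convolutional encoding layers, a fully connected bottleneck producing the 512-dimensional descriptor, and five transposed-convolution decoding layers. The channel counts, activations, and input window size are illustrative assumptions, not the paper's exact values.

```python
import torch
import torch.nn as nn

class ComponentAutoEncoder(nn.Module):
    """One of the five component auto-encoders of the CE module (sketch in, sketch out)."""
    def __init__(self, input_size=128, latent_dim=512):
        super().__init__()
        chans = [1, 32, 64, 128, 256, 512]
        enc = []
        for cin, cout in zip(chans[:-1], chans[1:]):           # five stride-2 encoding layers
            enc += [nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                    nn.LeakyReLU(0.2, inplace=True)]
        self.encoder = nn.Sequential(*enc)
        self.spatial = input_size // 2 ** 5                    # spatial size after 5 downsamplings
        flat = chans[-1] * self.spatial ** 2
        self.to_latent = nn.Linear(flat, latent_dim)           # fully connected bottleneck (512-d)
        self.from_latent = nn.Linear(latent_dim, flat)
        dec = []
        for cin, cout in zip(chans[::-1][:-1], chans[::-1][1:]):  # five decoding layers
            dec += [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                    nn.ReLU(inplace=True)]
        self.decoder = nn.Sequential(*dec[:-1], nn.Sigmoid())  # sketch intensities in [0, 1]

    def encode(self, sketch):                                  # sketch: (B, 1, H, W)
        return self.to_latent(self.encoder(sketch).flatten(1)) # (B, 512) feature descriptor

    def decode(self, z):
        h = self.from_latent(z).view(-1, 512, self.spatial, self.spatial)
        return self.decoder(h)                                 # (B, 1, H, W)

    def forward(self, sketch):
        return self.decode(self.encode(sketch))
```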
Feature Mapping Module. Given an input sketch, we can project its individual parts to the component manifolds to increase its plausibility (Section 3.3). One possible solution to synthesize a realistic image is to first convert the feature vectors of the projected manifold points back to component sketches using the learned decoders {𝐷𝑐}, then perform component-level sketch-to-image synthesis (e.g., based on [38]), and finally fuse the component images together to get a complete face. However, this straightforward solution easily leads to inconsistencies in the synthesized results in terms of both local details and global styles, since there is no mechanism to coordinate the individual generation processes.

Another possible solution is to first fuse the decoded component sketches into a complete face sketch (Figure 5 (b)) and then perform sketch-to-image synthesis to get a face image (Figure 5 (c)). It can be seen that this solution also easily causes artifacts (e.g., misalignment between face components, incompatible hair styles) in the synthesized sketch, and such artifacts are inherited by the synthesized image, since existing deep learning solutions for sketch-to-image synthesis tend to use input sketches as rather hard constraints, as discussed previously.
Fig. 6. Illustration of manifold projection. Given a new feature vector 𝑓𝑠˜𝑐, we replace it with the projected feature vector 𝑓𝑝𝑟𝑜𝑗𝑐 computed using the K nearest neighbors of 𝑓𝑠˜𝑐 on the component manifold 𝑀𝑐.

Fig. 9. Interpolating an input sketch and its refined version (for the "remainder" component in this example) after manifold projection under different blending weight values. 𝑤𝑏𝑐 = 1 means a full use of the input sketch for image synthesis, while setting 𝑤𝑏𝑐 = 0 fully trusts the data for interpolation.

Fig. 10. Blending an input sketch and its refined version after manifold projection for the "left-eye", "right-eye", and "mouth" components. Upper Right: result without any sketch refinement; Lower Left: result with full-degree sketch refinement; Lower Right: result with partial-degree sketch refinement.

… effects. This is solved by trusting the input sketch for its "remainder" component by adjusting its corresponding blending weight. Figure 10 shows another example with different blending weights for different components. It can be easily seen that the result with automatic refinement (lower left) is visually more realistic than that without any refinement (upper right). Fine-tuning the blending weights leads to a result that reflects the input sketch more faithfully.
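Since the detailed formulation of Section 3.3 is not reproduced above, the following is only a sketch of the projection implied by Figure 6 and the LLE reference [32]: find the K nearest component samples in the feature space, solve for locally linear reconstruction weights, and blend the projected vector with the original one using 𝑤𝑏𝑐. The least-squares weight solve, the regularization, and the value of K are standard LLE choices assumed here, not taken from the paper.

```python
import numpy as np

def project_to_manifold(f_s, samples, K=10, reg=1e-3):
    """Project a component feature vector f_s (shape (512,)) onto the locally linear
    manifold spanned by its K nearest sample vectors. `samples` is an (N, 512) array
    of feature vectors of training components."""
    dists = np.linalg.norm(samples - f_s, axis=1)
    idx = np.argsort(dists)[:K]                 # K nearest neighbors on the manifold
    neighbors = samples[idx]                    # (K, 512)
    diff = neighbors - f_s                      # local coordinates around f_s
    G = diff @ diff.T                           # (K, K) local Gram matrix
    G += reg * np.trace(G) * np.eye(K)          # regularize for numerical stability
    w = np.linalg.solve(G, np.ones(K))          # classic LLE reconstruction weights
    w /= w.sum()                                # weights sum to 1
    return w @ neighbors                        # projected feature vector f_proj

def blend(f_s, f_proj, w_b):
    """w_b = 1 keeps the input sketch's feature; w_b = 0 fully trusts the projection."""
    return w_b * f_s + (1.0 - w_b) * f_proj
```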
4 EXPERIMENTS
We have done extensive evaluations to show the effectiveness of our sketch-to-image face synthesis system and its usability via a pilot study. Below we present some of the obtained results. Please refer to the supplemental materials for more results and an accompanying video for sketch-based image synthesis in action.

Figure 11 shows two representative results where users progressively introduce new strokes to add or stress local details. As shown in the demo video, running on a PC with an Intel i7-7700 CPU, 16GB RAM, and a single Nvidia GTX 1080Ti GPU, our method achieves real-time feedback. Thanks to our local-to-global approach, generally more strokes lead to new or refined details (e.g., the nose in the first example, and the eyebrows and wrinkles in the second example), with other areas largely unchanged. Still, due to the combination step, local editing might introduce subtle but global changes. For example, in the first example, the local change of lighting in the nose area leads to a change of highlight in the whole face (especially in the forehead region). Figure 18 shows two more complete sequences of progressive sketching and synthesis with our shadow-guided interface.

4.1 Usability Study
We conducted a usability study to evaluate the usefulness and effectiveness of our system. 10 subjects (9 male and 1 female, aged from 20 to 26) were invited to participate in this study. We first asked them to self-assess their drawing skills through a nine-point Likert scale (1: novice to 9: professional), and divided them into three groups: 4 novice users (drawing skill score: 1 – 3), 4 middle users (4 – 6), and 2 professional users (7 – 9). Before the drawing session, each participant was given a short tutorial about our system (about 10 minutes). The participants used an iPad with an Apple Pencil to remotely control the server PC for drawing. Each of them was then asked to create at least 3 faces using our system. The study ended with a questionnaire to gather user feedback on ease of use, controllability, variance of results, quality of results, and expectation fitness. Additional comments on our system were also welcome.

Figure 12 gives a gallery of sketches and synthesized faces by the participants. It can be seen that our system consistently produces realistic results given input sketches with different styles and levels of abstraction. For several examples, the participants attempted to depict beard styles via hatching and our system captured the users' intention very well.

Figure 13 shows a radar plot summarizing quantitative feedback on our system for participant groups with different levels of drawing skills. The feedback for all the groups of participants was positive in all the measured aspects. In particular, the participants with good drawing skills felt a high level of controllability, while they gave
Fig. 12. Gallery of input sketches and synthesized results in the usability study.
Fig. 13. The summary of quantitative feedback in the usability study for the novice, middle, and professional participant groups. (a) Ease of use. (b) Controllability. (c) Variance of results. (d) Quality of results. (e) Expectation fitness.

Fig. 15. Two representative sets of input sketches and synthesized results used in the perceptive evaluation study. From left to right: input sketch, the results by sketch refinement through global retrieval, local retrieval, and local retrieval with interpolation (our method).

(Bar charts plotting the perceived quality and faithfulness scores for the Global, Local, and Interpolation refinement strategies.)
Fig. 17. Comparisons with the state-of-the-art methods given the same input sketches (Top Row).
For Lines2FacePhoto, following their original paper, we also convert each sketch to a distance map as input for both training and testing. For iSketchNFill, we train their shape completion module before feeding it to pix2pix [19] (acting as the appearance synthesis module). The input and output resolutions in their method are 256 × 256 and 128 × 128, respectively.
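As a concrete illustration of the distance-map input mentioned above, here is a minimal SciPy sketch; the truncation value and normalization are illustrative choices, and whether they match the exact preprocessing of the original Lines2FacePhoto paper is an assumption.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def sketch_to_distance_map(sketch, truncate=32.0):
    """Convert a binary sketch (1 = stroke pixel, 0 = background) to a distance map:
    each pixel stores its distance to the nearest stroke, truncated and normalized."""
    dist = distance_transform_edt(sketch == 0)       # distance to the nearest stroke pixel
    return np.clip(dist, 0.0, truncate) / truncate   # normalized to [0, 1]
```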
Figure 17 shows representative testing results given the same sketches as input. It can be easily seen that our method produces more realistic synthesized results. Since the input sketches are rough and/or incomplete, they are generally different from the training data, making the compared methods fail to produce realistic faces. Although Lines2FacePhoto generates a relatively plausible result
given an incomplete sketch, its ability to handle data imperfections is rather limited. We attempted to perform quantitative evaluation as well. However, none of the assessment metrics we tried, including Fréchet Inception Distance [14] and Inception Score [33], could faithfully reflect visual perception. For example, the averaged values of the Inception Score were 2.59 and 1.82 (the higher, the better) for pix2pixHD and ours, respectively. However, it is easily noticeable that our results are visually better than those by pix2pixHD.

5 APPLICATIONS
Our system can be adapted for various applications. In this section we present two applications: face morphing and face copy-paste.
5.1 Face Morphing
Traditional face morphing algorithms [2] often require a set of keypoint-level correspondences between two face images to guide semantic interpolation. We show a simple but effective morphing approach by 1) decomposing a pair of source and target face sketches in the training dataset into five components (Section 3.2); 2) encoding the component sketches as feature vectors in the corresponding feature spaces; 3) performing linear interpolation between the source and target feature vectors for the corresponding components; and 4) finally feeding the interpolated feature vectors to the FM and IS modules to get intermediate face images. Figure 19 shows examples of face morphing using our method. It can be seen that our method leads to smooth transitions in identity, expression, and even highlight effects.
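Re-using the hypothetical CE/FM/IS handles and the `combine_feature_maps` helper from the pipeline sketch after Figure 3, the four steps above reduce to a few lines; this is an illustrative sketch, not the released implementation.

```python
import numpy as np

COMPONENTS = ["left_eye", "right_eye", "nose", "mouth", "remainder"]

def morph_faces(src_components, dst_components, encoders, feature_mapping,
                image_synthesis, steps=5):
    """Face morphing by linear interpolation of component-level feature vectors.
    src_components / dst_components are dicts of cropped component sketches for the
    source and target faces."""
    src = {c: encoders[c](src_components[c]) for c in COMPONENTS}  # steps 1-2: decompose + encode
    dst = {c: encoders[c](dst_components[c]) for c in COMPONENTS}
    frames = []
    for t in np.linspace(0.0, 1.0, steps):
        # step 3: interpolate each component's feature vector independently
        mixed = {c: (1.0 - t) * src[c] + t * dst[c] for c in COMPONENTS}
        # step 4: feed the interpolated vectors to the FM and IS modules
        frames.append(image_synthesis(combine_feature_maps(mixed, feature_mapping)))
    return frames
```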
5.2 Face Copy-Paste
Traditional copy-paste methods (e.g., [9]) use seamless stitching on colored images. However, the hue of the pasted local areas may be inconsistent with the rest of the image. To address this issue, we recombine face components for composing new faces, which can maintain the consistency of the overall color and lighting. Specifically, this can be achieved by first encoding face component sketches (possibly from different subjects) as feature vectors and then combining them into new faces by using the FM and IS modules. This can be used either to replace components of existing faces with corresponding components from another source, or to combine components from multiple persons. Figure 20 presents several synthesized new faces by re-combining the eyes, nose, mouth, and remainder region from four source sketches. Our image synthesis sub-network is able to resolve the inconsistencies between face components from different sources in terms of both lighting and shape.
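With the same hypothetical handles as in the earlier sketches, component recombination amounts to picking each component's feature vector from a (possibly different) source before running FM and IS:

```python
def copy_paste_face(component_sketches, encoders, feature_mapping, image_synthesis):
    """Compose a new face from component sketches that may come from different subjects,
    e.g. {"left_eye": a_left_eye, ..., "remainder": b_remainder}."""
    features = {c: encoders[c](s) for c, s in component_sketches.items()}
    return image_synthesis(combine_feature_maps(features, feature_mapping))
```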
6 CONCLUSION AND DISCUSSIONS
In this paper we have presented a novel deep learning framework for synthesizing realistic face images from rough and/or incomplete freehand sketches. We take a local-to-global approach by first decomposing a sketched face into components, refining its individual components by projecting them to component manifolds defined by the existing component samples in the feature spaces, mapping the refined feature vectors to feature maps for spatial combination, and finally translating the combined feature maps to realistic images. This approach naturally supports local editing and makes the involved network easy to train from a training dataset of not very large scale. Our approach outperforms existing sketch-to-image synthesis approaches, which often require edge maps or sketches of similar quality as input. Our user study confirmed the usability of our system. We also adapted our system for two applications: face morphing and face copy-paste.

Our current implementation considers individual components rather independently. This provides flexibility (Figure 4) but also introduces possible incompatibility problems. This issue is more obvious for the eyes (Figure 21), which are often symmetric. This might be addressed by introducing a symmetry loss [15] or explicitly requiring two eyes from the same samples (similar to Figure 20). Our work has focused on refining an input sketch component by component. In other words, our system is generally able to handle errors within individual components, but is not designed to fix errors in the layouts of components (Figure 21). To achieve proper layouts, we resort to a shadow-guided interface. In the future, we are interested in modeling spatial relations between facial components and fixing input layout errors.

Our system takes black-and-white rasterized sketches as input and currently does not provide any control of color or texture in synthesized results. In a continuous drawing session, small changes in sketches sometimes might cause abrupt color changes. This might surprise users and is thus not desirable for usability. We believe this can be potentially addressed by introducing a color control mechanism in generation. For example, we might introduce color constraints by either adding them in the input as additional hints or appending them to the latent space as additional guidance. In addition, adding color control would also be beneficial for applications such as face morphing and face copy-and-paste.

Like other learning-based approaches, the performance of our system is also dependent on the amount of training data. Although component-level manifolds of faces might be low dimensional, due to the relatively high-dimensional space of our feature vectors, our limited data only provides a very sparse sampling of the manifolds. In the future we are interested in increasing the scale of our training data, and aim to model the underlying component manifolds more accurately. This will also help our system handle non-frontal faces and faces with accessories. It is also interesting to increase the diversity of results by adding random noise to the input. Explicitly learning such manifolds and providing intuitive exploration tools in a 2D space would also be interesting to explore.

Our current system is specially designed for faces by making use of the fixed structure of faces. How to adapt our idea to support the synthesis of objects of other categories is an interesting but challenging problem.
Fig. 18. Two sequences of progressive sketching (under shadow guidance) and synthesis results.

Fig. 19. Examples of face morphing by interpolating the component-level feature vectors of two given face sketches (the leftmost and rightmost are the corresponding synthesized images).

Fig. 20. In each set, we show the color image (Left) of the source sketches (not shown here), a new face sketch (Middle) obtained by directly recombining the cropped source sketches in the image domain, and a new face (Right) synthesized by our method with the recombined sketches of the cropped components (eyes, nose, mouth, and remainder) as input.

ACKNOWLEDGMENTS
This work was supported by Beijing Program for International S&T Cooperation Project (No. Z191100001619003), Royal Society Newton Advanced Fellowship (No. NAF\R2\192151), Youth Innovation Promotion Association CAS, CCF-Tencent Open Fund, and Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (No. 201900055). Hongbo Fu was supported by an unrestricted gift from Adobe and grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (No. CityU 11212119, 11237116), City University of Hong Kong (No. SRG 7005176), and the Centre for Applied Computing and Interactive Media (ACIM) of School of Creative Media, CityU.

REFERENCES
[1] James Arvo and Kevin Novins. 2000. Fluid Sketches: Continuous Recognition and Morphing of Simple Hand-Drawn Shapes. In Proceedings of the 13th annual ACM symposium on User interface software and technology. ACM, 73–80.
[2] Martin Bichsel. 1996. Automatic interpolation and recognition of face images by morphing. In Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.
[26] Yuhang Li, Xuejin Chen, Feng Wu, and Zheng-Jun Zha. 2019. LinesToFacePhoto: Face Photo Generation From Lines With Conditional Self-Attention Generative Adversarial Networks. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, 2323–2331.
[27] Alex Limpaecher, Nicolas Feltman, Adrien Treuille, and Michael Cohen. 2013. Real-time drawing assistance through crowdsourcing. ACM Trans. Graph. 32, 4, Article 54 (2013), 8 pages.
[28] Zongguang Lu, Yang Jing, and Qingshan Liu. 2017. Face image retrieval based on shape and texture feature fusion. Computational Visual Media 3, 4 (12 2017), 359–368. https://doi.org/10.1007/s41095-017-0091-7
[29] Yusuke Matsui, Takaaki Shiratori, and Kiyoharu Aizawa. 2016. DrawFromDrawings: 2D drawing assistance via stroke interpolation with a sketch database. IEEE Transactions on Visualization and Computer Graphics 23, 7 (2016), 1852–1862.
[30] Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
[31] Tiziano Portenier, Qiyang Hu, Attila Szabo, Siavash Arjomand Bigdeli, Paolo Favaro, and Matthias Zwicker. 2018. Faceshop: Deep sketch-based face image editing. ACM Trans. Graph. 37, 4, Article 99 (2018), 13 pages.
[32] Sam T Roweis and Lawrence K Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 5500 (2000), 2323–2326.
[33] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training GANs. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2234–2242.
[34] Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, and James Hays. 2017. Scribbler: Controlling deep image synthesis with sketch and color. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 5400–5409.
[35] Edgar Simo-Serra, Satoshi Iizuka, Kazuma Sasaki, and Hiroshi Ishikawa. 2016. Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup. ACM Trans. Graph. 35, 4, Article 121 (2016), 11 pages.
[36] Qingkun Su, Wing Ho Andy Li, Jue Wang, and Hongbo Fu. 2014. EZ-sketching: three-level optimization for error-tolerant image tracing. ACM Trans. Graph. 33, 4, Article 54 (2014), 9 pages.
[37] Miao Wang, Guo-Ye Yang, Ruilong Li, Run-Ze Liang, Song-Hai Zhang, Peter M Hall, and Shi-Min Hu. 2019. Example-guided style-consistent image synthesis from semantic labeling. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1495–1504.
[38] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. 2018. High-resolution image synthesis and semantic manipulation with conditional GANs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 8798–8807.
[39] Xiaogang Wang and Xiaoou Tang. 2008. Face photo-sketch synthesis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 11 (2008), 1955–1967.
[40] Di Wu and Qionghai Dai. 2009. Sketch realizing: lifelike portrait synthesis from sketch. In Proceedings of the 2009 Computer Graphics International Conference. ACM, 13–20.
[41] Jun Xie, Aaron Hertzmann, Wilmot Li, and Holger Winnemöller. 2014. PortraitSketch: face sketching assistance for novices. In Proceedings of the 27th annual ACM symposium on User interface software and technology. ACM, 407–417.
[42] Saining Xie and Zhuowen Tu. 2015. Holistically-Nested Edge Detection. In IEEE International Conference on Computer Vision (ICCV). IEEE, 1395–1403.
[43] Ran Yi, Yong-Jin Liu, Yu-Kun Lai, and Paul L Rosin. 2019. APDrawingGAN: Generating Artistic Portrait Drawings from Face Photos with Hierarchical GANs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 10743–10752.
[44] Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. 2017. DualGAN: Unsupervised dual learning for image-to-image translation. In IEEE International Conference on Computer Vision (ICCV). 2849–2857.
[45] Sheng You, Ning You, and Minxue Pan. 2019. PI-REC: Progressive Image Reconstruction Network With Edge and Color Domain. arXiv preprint arXiv:1903.10146 (2019).
[46] Wei Zhang, Xiaogang Wang, and Xiaoou Tang. 2011. Coupled information-theoretic encoding for face photo-sketch recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 513–520.
[47] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. 2016. Generative Visual Manipulation on the Natural Image Manifold. In European Conference on Computer Vision (ECCV).
[48] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV). 2223–2232.