Generate Dog Images with
Generative Adversarial Networks (GAN)
Machine Learning II Project
Group 1: Gaofeng Huang, Jun Ying, Xi Zhang
Outline
Introduction of GAN
Data Description
Model Description
Experimental Setup
Results
Potential Improvement
Introduction of GAN
Application of GAN in the Real World
◦ Shakespearean poetry
◦ Random music
◦ Snapchat baby-face filter
Introduction of GAN
Background
GAN is a very new technique with a promising future. Especially in the last year (2018), GAN research grew exponentially. In other words, it is still an infant technology.
Problem Definition
& Research Target
We are curious how the neural networks we are familiar with behave when applied in this new structure. That is our research motivation.
Data Source
Data Description
Index | Dog Breed | Number of Observations
0 | n02085620-Chihuahua | 152
1 | n02085782-Japanese_spaniel | 185
2 | n02085936-Maltese_dog | 252
3 | n02086079-Pekinese | 149
4 | n02086240-Shih-Tzu | 214
… | … | …
116 | n02113978-Mexican_hairless | 155
117 | n02115641-dingo | 156
118 | n02115913-dhole | 150
119 | n02116738-African_hunting_dog | 169
Description of Models - The Basic Concept
Structure of GAN
Random vectors are fed into the generator to produce fake images; the discriminator is responsible for classifying images as fake or real.
As the discriminator becomes stricter, the generator must generate more realistic images to fool the discriminator.
Description of Models – Adversarial Relationship
Description of Models – Discriminator Training
● Initialize the discriminator.
● Sample from the database.
● Input random vectors into the fixed generator to get generated images.
● Label the real images as 1 and the generated images as 0.
● Update the parameters of the discriminator as in classifier training.
Description of Models – Generator Training
● Fix the trained discriminator.
● Generate images.
● Get the score from the trained discriminator.
● Force the generator to generate images that can be graded close to 1.
● Like the optimizer in our familiar neural networks, but as a gradient ascent process.
Description of Models – Generator Training
When we put these two processes together, we get one whole network. If we
extract the output of the hidden layer in the middle, we get a complete
image. Details are shown in the model implementation section.
Description of Models – Math Concept
Discriminator parameters: $\theta_d$; Generator parameters: $\theta_g$
Sample from the database: $\{x^1, x^2, \dots, x^m\}$; noise samples: $\{z^1, z^2, \dots, z^m\}$
Obtain generated data $\{\tilde{x}^1, \tilde{x}^2, \dots, \tilde{x}^m\}$, where $\tilde{x}^i = G(z^i)$

Update discriminator parameters $\theta_d$ (gradient ascent):
$\tilde{V} = \frac{1}{m}\sum_{i=1}^{m} \log D(x^i) + \frac{1}{m}\sum_{i=1}^{m} \log\left(1 - D(\tilde{x}^i)\right)$
$\theta_d \leftarrow \theta_d + \eta \nabla \tilde{V}(\theta_d)$

Update generator parameters $\theta_g$ (gradient ascent):
$\tilde{V} = \frac{1}{m}\sum_{i=1}^{m} \log D\left(G(z^i)\right)$
$\theta_g \leftarrow \theta_g + \eta \nabla \tilde{V}(\theta_g)$

($\eta$: learning rate)
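The alternating updates above can be made concrete with a small numpy sketch. Everything here is a toy assumption chosen only for illustration: a linear discriminator D(x) = sigmoid(wx + b), a linear generator G(z) = az + c, 1-D "real" data from N(3, 1), and numeric finite-difference gradients instead of backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def V_d(w, b, a, c, x, z):
    """Discriminator objective: mean log D(x) + mean log(1 - D(G(z)))."""
    fake = a * z + c                       # G(z) for the toy linear generator
    return (np.mean(np.log(sigmoid(w * x + b)))
            + np.mean(np.log(1.0 - sigmoid(w * fake + b))))

def V_g(w, b, a, c, z):
    """Generator objective: mean log D(G(z)) (the ascent form above)."""
    fake = a * z + c
    return np.mean(np.log(sigmoid(w * fake + b)))

m = 512                                    # minibatch size
x = rng.normal(3.0, 1.0, m)                # "real" samples
z = rng.normal(0.0, 1.0, m)                # noise samples
w, b = 0.1, 0.0                            # theta_d
a, c = 1.0, 0.0                            # theta_g
eta, eps = 0.2, 1e-6                       # learning rate, finite-diff step

# One ascent step on theta_d (generator fixed).
vd_before = V_d(w, b, a, c, x, z)
gw = (V_d(w + eps, b, a, c, x, z) - V_d(w - eps, b, a, c, x, z)) / (2 * eps)
gb = (V_d(w, b + eps, a, c, x, z) - V_d(w, b - eps, a, c, x, z)) / (2 * eps)
w, b = w + eta * gw, b + eta * gb
vd_after = V_d(w, b, a, c, x, z)

# One ascent step on theta_g (discriminator fixed).
vg_before = V_g(w, b, a, c, z)
ga = (V_g(w, b, a + eps, c, z) - V_g(w, b, a - eps, c, z)) / (2 * eps)
gc = (V_g(w, b, a, c + eps, z) - V_g(w, b, a, c - eps, z)) / (2 * eps)
a, c = a + eta * ga, c + eta * gc
vg_after = V_g(w, b, a, c, z)

assert vd_after > vd_before and vg_after > vg_before  # both objectives rose
```

A real GAN replaces the numeric gradients with backpropagation and the linear D and G with deep networks, but the alternating ascent structure is the same.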
Simple GAN with MLP
◦ To construct a GAN
◦ Generator → Upsampling
◦ Discriminator → Downsampling
◦ Upsampling
◦ MLP: increasing number of neurons
◦ Downsampling
◦ MLP: diminishing number of neurons
◦ Generator (13.28m parameters)
Input noise vector z ∈ ℝ^100
Dense, 256 + BN + LReLU
Dense, 512 + BN + LReLU
Dense, 1024 + BN + LReLU
Dense + Reshape, 64 x 64 x 3 + Tanh
Output fake images ∈ ℝ^(64×64×3)
◦ Discriminator (6.46m parameters)
Input RGB image ∈ ℝ^(64×64×3)
Flatten + Dense, 512 + LReLU
Dense, 256 + LReLU
Dense, 128 + LReLU
Dense, 1 + Sigmoid
Output probability ∈ ℝ^1
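Assuming the usual layer formulas (i·o + o parameters for a dense layer mapping i inputs to o outputs, and 2 trainable parameters per feature for BatchNorm), the listed layers reproduce the quoted counts; a quick pure-Python check:

```python
# Pure-Python check of the quoted parameter counts, assuming standard
# layer formulas: Dense(i -> o) has i*o + o parameters and BatchNorm
# contributes 2 trainable parameters (gamma, beta) per feature.

def dense(i, o):
    return i * o + o

def bn(features):
    return 2 * features

# Generator: 100 -> 256 -> 512 -> 1024 -> 64*64*3
gen = (dense(100, 256) + bn(256)
       + dense(256, 512) + bn(512)
       + dense(512, 1024) + bn(1024)
       + dense(1024, 64 * 64 * 3))

# Discriminator: flatten(64*64*3) -> 512 -> 256 -> 128 -> 1
disc = (dense(64 * 64 * 3, 512)
        + dense(512, 256)
        + dense(256, 128)
        + dense(128, 1))

print(gen, disc)   # 13281536 6456321, i.e. ~13.28m and ~6.46m
```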
Simple GAN with MLP
◦ After 1000 epochs ◦ After 3000 epochs ◦ After 10000 epochs
Simple GAN with MLP: Problems
Imbalance between Generator and Discriminator ◦ After 10000 epochs
◦ Even with more epochs, the generated images are still blurred → the model stops learning.
◦ The Generator can never compete with the Discriminator.
◦ Intuitively, creation is more difficult than criticism. In fact, it tends to be easy to tell whether an artwork is real or fake; however, without seeing the real artwork, it is really hard to create a fake that looks just like the real one.
◦ Mathematically, the Generator's gradients vanish.
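The vanishing-gradient point can be made concrete. This aside follows the standard GAN analysis rather than these slides' exact derivation: with the original minimax generator loss log(1 − D(G(z))), the gradient with respect to the discriminator's logit vanishes exactly when the Discriminator confidently rejects fakes, while the ascent objective log D(G(z)) from the math slide keeps a large gradient in the same situation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# t is the discriminator's logit for a fake image; early in training the
# discriminator confidently rejects fakes, so t is very negative.
t = -8.0                                   # D(G(z)) = sigmoid(-8) ~ 0.0003

# Saturating loss log(1 - D): d/dt = -sigmoid(t) -> ~0 (vanishes).
grad_saturating = -sigmoid(t)

# Non-saturating objective log D: d/dt = 1 - sigmoid(t) -> ~1 (survives).
grad_non_saturating = 1.0 - sigmoid(t)

assert abs(grad_saturating) < 1e-3 < abs(grad_non_saturating)
```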
Potential solutions
◦ A trivial approach is to adjust the numbers of training steps of the Generator and the Discriminator separately.
◦ In practice, it helps a little, but it also makes the training process more unstable.
◦ An MLP-based Generator cannot focus on the detailed features of an image → create a deeper Generator structure with convolution layers.
◦ Deep Convolution GAN (DCGAN)
◦ Some other approaches
◦ Spectral normalization.
◦ Finding a loss function and an activation function with stable, non-vanishing gradients.
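Spectral normalization, mentioned above, can be sketched in a few lines of numpy: the idea is to rescale a weight matrix by its largest singular value so the layer becomes (approximately) 1-Lipschitz. Production implementations estimate that value cheaply with power iteration rather than a full SVD; this sketch uses SVD for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 256))            # a hypothetical layer's weights

sigma = np.linalg.svd(W, compute_uv=False)[0]   # largest singular value
W_sn = W / sigma                                # spectrally normalized weights

# After normalization the spectral norm is 1 (up to float error).
assert abs(np.linalg.svd(W_sn, compute_uv=False)[0] - 1.0) < 1e-9
```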
DCGAN - Upsampling & Downsampling
◦ To construct a DCGAN
◦ Generator à Upsampling
◦ Discriminator à Downsampling
◦ Downsampling
◦ MLP: Diminishing number of neurons
◦ Convolution (with stride)
DCGAN - Upsampling & Downsampling
◦ Upsampling
◦ MLP: increasing number of neurons
◦ Transposed convolution (with stride)
◦ Much the same as convolution
◦ The stride concept is transposed
◦ The padding concept is transposed
◦ E.g. Conv2D(stride=(2, 2), padding='same') reduces a size from 6x6 to 3x3; TransposedConv2D(stride=(2, 2), padding='same') increases a size from 3x3 to 6x6.
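The 6x6 ↔ 3x3 example follows from the standard 'same'-padding size formulas, which can be checked in plain Python:

```python
import math

# Standard 'same'-padding size formulas: a strided convolution produces
# ceil(n / s), and its transposed counterpart multiplies the size back.

def conv_out(n, stride):
    return math.ceil(n / stride)       # Conv2D with padding='same'

def transp_conv_out(n, stride):
    return n * stride                  # TransposedConv2D with padding='same'

assert conv_out(6, 2) == 3             # 6x6 -> 3x3
assert transp_conv_out(3, 2) == 6      # 3x3 -> 6x6
```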
[Animations: transposed convolution with Stride(2, 2), Padding=1 and with Stride(1, 1), Padding=0]
DCGAN
Construction
◦ Upsampling
◦ Transposed convolution
◦ Downsampling
◦ Replace all max pooling
with convolutional stride.
◦ Activation
◦ LeakyReLU
DCGAN - Final architecture
◦ Generator (1.28m parameters)
Layer | Out size
Input noise vector z ∈ ℝ^100 | -
Dense + Reshape, 4 x 4 x 128 + LReLU | 4 x 4 x 128
4 x 4 TranspConv + Stride(2, 2), 128 + LReLU | 8 x 8 x 128
4 x 4 TranspConv + Stride(2, 2), 128 + LReLU | 16 x 16 x 128
4 x 4 TranspConv + Stride(2, 2), 128 + LReLU | 32 x 32 x 128
4 x 4 TranspConv + Stride(2, 2), 128 + LReLU | 64 x 64 x 128
8 x 8 Conv, 3 + Tanh | 64 x 64 x 3
Output fake images ∈ ℝ^(64×64×3)

◦ Discriminator (1.18m parameters)
Layer | Out size
Input RGB image ∈ ℝ^(64×64×3) | -
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 32 x 32 x 128
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 16 x 16 x 128
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 8 x 8 x 128
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 4 x 4 x 128
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 2 x 2 x 128
Flatten + Dropout | 512
Dense, 1 + Sigmoid | 1
Output probability ∈ ℝ^1
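Assuming the standard parameter formulas (k·k·c_in·c_out + c_out per convolution, i·o + o per dense layer, and reading the final layer as an 8x8 convolution to 3 RGB channels), the generator layers above add up to the quoted ~1.28m parameters:

```python
# Pure-Python check of the DCGAN generator's quoted parameter count.

def dense(i, o):
    return i * o + o

def conv(k, c_in, c_out):
    return k * k * c_in * c_out + c_out

gen = (dense(100, 4 * 4 * 128)          # Dense + Reshape to 4x4x128
       + 4 * conv(4, 128, 128)          # four 4x4 transposed convs, 128 filters
       + conv(8, 128, 3))               # final 8x8 conv to RGB

print(gen)   # 1280515, i.e. ~1.28m as quoted
```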
Hard to tune
◦ High sensitivity to hyperparameters. The performance of a GAN even varies with different random seeds. Tuning a GAN requires a lot of patience.
◦ Solutions: watching the gradients and loss changes is the most efficient way to guide tuning. Beyond that, the only thing we need to keep is our patience.
DCGAN - Not well-tuned
◦ CNN creates block artifacts in images
◦ Compared with the MLP structure, a well-tuned deep convolution structure helps the network capture finer details inside an image. However, if the convolution structure is not tuned well, the result will be worse than the simple GAN with MLP: with the convolution operation, the generated images may be cut into blocks.
◦ Fails to learn to generate dog-like images
DCGAN - Not well-tuned
◦ Even when the block problem is mitigated, DCGAN may still fail to generate target-like images; this means both the Discriminator and the Generator are learning in the wrong direction.
◦ Mode collapse
◦ Most generated images look similar → mode collapse.
◦ Although image quality improves in some cases, the mode collapse problem still exists in DCGAN.
One trick makes a great improvement
◦ Using soft and noisy labels helps a lot to stabilize GANs; without this change, our model cannot create clear images. We smooth the positive labels (e.g. to 0.9-1.0).
◦ Soft and noisy labels balance the Discriminator and the Generator in this competition: the Discriminator does not become over-confident and the Generator does not become under-confident.
◦ Note that we should smooth only one side of the labels, specifically the positive labels (Goodfellow, 2016).
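A minimal numpy sketch of the one-sided smoothing trick: real labels are drawn softly from [0.9, 1.0] while fake labels stay at a hard 0. The function names here are our own illustration, not from any library.

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_positive_labels(n):
    """One-sided label smoothing: soft 'real' labels in [0.9, 1.0)."""
    return rng.uniform(0.9, 1.0, size=n)

def fake_labels(n):
    """Fake labels are NOT smoothed: they stay at exactly 0."""
    return np.zeros(n)

real_y = smooth_positive_labels(64)
fake_y = fake_labels(64)
assert real_y.min() >= 0.9 and real_y.max() <= 1.0
assert not fake_y.any()
```

These label arrays would replace the hard 1s fed to the discriminator for real images during its training step.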
DCGAN - Performance
◦ After 5000 epochs: in the early training steps, it can generate some clear images, but they are not dog-like.
◦ After 10000 epochs: some generated images look like a dog, but most are still not dog-like.
DCGAN - Performance
◦ After 40000 epochs: it is not easy to generate clear dog-like images without mode collapse.
◦ After 60000 epochs: the improvement is not significant; this DCGAN model has almost reached its maximum performance.
DCGAN
◦ A glance at the randomly generated dogs.
Mode collapse
◦ Issue: the Generator only produces low-diversity outputs. A complete mode collapse, which is uncommon, means the Generator simply plays a trick: it creates only one type of image to fool the Discriminator. A partial mode collapse, which happens frequently, is a hard problem to solve in GANs.
◦ Solutions: mode collapse remains a difficult problem in most GANs. Nevertheless, there are ways to disperse this kind of collapse, such as the Conditional GAN (CGAN). A CGAN feeds the label of the real data into the model as a condition, so that the model is conditioned on the label. Our dog dataset contains 120 dog breeds, so we try CGAN to address the mode collapse problem.
CGAN
◦ The Conditional Generative Adversarial Network (CGAN), an extension of the GAN, allows you to generate images with specific conditions or attributes.
◦ Difference: both the generator and the discriminator of a CGAN receive some extra conditional information, such as the class of the image, a graph, some words, or a sentence.
◦ The cost function for CGAN is the same as for GAN, with the condition $y$ added:
$\min_G \max_D V(G, D) = \mathbb{E}_{x}\left[\log D(x, y)\right] + \mathbb{E}_{z}\left[\log\left(1 - D(G(z, y), y)\right)\right]$
◦ CGAN can make the generator produce different types of images, which prevents it from generating similar images after many training steps.
◦ We can steer the generator toward an image with the properties we want.
CGAN
Construction
● Upsampling
○ Combine vector
z with label y
● Downsampling
○ Expand and
reshape label y,
then combine
with image
● Activation
○ LeakyReLU
CGAN - Final architecture
● Generator
Input noise vector z ∈ ℝ^100 → Dense + Reshape, 4 x 4 x 256 + LReLU | Out size: 4 x 4 x 256
Input label (0-119) → Embedding + Dense + Reshape, 4 x 4 | Out size: 4 x 4 x 1
Merge input noise vector and input label: Concatenate | Out size: 4 x 4 x 257
4 x 4 TranspConv + Stride(2, 2), 128 + LReLU | 8 x 8 x 128
4 x 4 TranspConv + Stride(2, 2), 128 + LReLU | 16 x 16 x 128
4 x 4 TranspConv + Stride(2, 2), 128 + LReLU | 32 x 32 x 128
4 x 4 TranspConv + Stride(2, 2), 128 + LReLU | 64 x 64 x 128
8 x 8 Conv, 3 + Tanh | 64 x 64 x 3
Output fake images ∈ ℝ^(64×64×3)
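The merge step of the conditional generator can be sanity-checked with numpy shapes: a 4x4x1 label plane concatenated with the 4x4x256 noise feature map along the channel axis gives 4x4x257. The arrays here are random stand-ins for the real layer outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two branches of the conditional generator:
noise_features = rng.normal(size=(4, 4, 256))   # from Dense + Reshape on z
label_plane = rng.normal(size=(4, 4, 1))        # from Embedding + Dense + Reshape on y

# Concatenate along the channel axis, as the Concatenate layer does.
merged = np.concatenate([noise_features, label_plane], axis=-1)
assert merged.shape == (4, 4, 257)
```

The discriminator's merge works the same way, concatenating a 64x64x1 label plane with the 64x64x3 image to get 64x64x4.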
CGAN - Final architecture
● Discriminator
Input RGB image ∈ ℝ^(64×64×3) | Out size: 64 x 64 x 3
Input label (0-119) → Embedding + Dense + Reshape, 64 x 64 | Out size: 64 x 64 x 1
Merge input RGB image and input label: Concatenate | Out size: 64 x 64 x 4
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 32 x 32 x 128
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 16 x 16 x 128
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 8 x 8 x 128
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 4 x 4 x 128
3 x 3 Conv + Stride(2, 2), 128 + LReLU | 2 x 2 x 128
Flatten + Dropout | 512
Dense, 1 + Sigmoid | 1
Output probability ∈ ℝ^1
CGAN - Performance
● In the early training steps, it can generate some clear images, but they are not dog-like and show no distinct types.
● After 6,000 epochs
CGAN - Performance
● The outputs are still mostly a jumble, but some have the shape of a dog, and some columns show similar types.
● After 15,000 epochs
CGAN - Performance
● Most of the images look like a dog, and each column roughly has its own type.
● After 30,000 epochs
CGAN - Performance
● There is mode collapse in some columns.
● After 45,000 epochs
CGAN - Performance
● At this point, the generator cannot generate more dog-like images.
● There is mode collapse in each column.
● At least we can get 120 generated dogs from the CGAN.
● After 60,000 epochs
CGAN
A glance at the randomly generated dogs.
Future Improvement of CGAN
● Input the conditional data combined with image features obtained via convolution into a dense code.
● The combined data then yield a predicted probability for each class.
References
● Avinash, H. (2017). The GAN Zoo. GitHub. https://github.com/hindupuravinash/the-gan-zoo
● Amir, J. (2019). Deep-Learning. GitHub. https://github.com/amir-jafari/Deep-Learning
● Goodfellow, I. (2016). NIPS 2016 Tutorial: Generative Adversarial Networks.
● Hongyi, L. (2018). GAN Lecture 1: Introduction. YouTube. https://www.youtube.com/watch?v=DQNNMiAP5lw&list=PLJV_el3uVTsMq6JEFPW35BCiOQTsoqwNw&index=1
● Kaggle Competition. (2019). Generative Dog Images. Kaggle. https://www.kaggle.com/c/generative-dog-images
Q&A