Tutorial on Keras
CAP 6412 - ADVANCED COMPUTER VISION
            SPRING 2018
         KISHAN S ATHREY
              Deep learning packages
             • TensorFlow – Google
             • PyTorch – Facebook AI research
             • Keras – Francois Chollet (now at Google)
             • Chainer – Preferred Networks (company in Japan)
             • Caffe – Berkeley Vision and Learning Center
             • CNTK - Microsoft
https://www.slideshare.net/0xdata/deep‐learning‐with‐mxnet‐dmitry‐larko
Overview of the tutorial
 • What is Keras ?
 • Basics of Keras environment
 • Building Convolutional neural networks
 • Building Recurrent neural networks
 • Introduction to other types of layers
 • Introduction to Loss functions and Optimizers in Keras
 • Using Pre-trained models in Keras
 • Saving and loading weights and models
 • Popular architectures in Deep Learning
What is Keras ?
• Deep neural network library in Python
 • High-level neural networks API
 • Modular – a model is built by stacking layers and connecting them into a
   computational graph
 • Runs on top of either TensorFlow or Theano or CNTK
• Why use Keras ?
 • Useful for fast prototyping: no need to implement backprop or write the
   optimization procedure by hand
 • Supports convolutional layers, recurrent layers, and combinations of both
 • Runs seamlessly on CPU and GPU
 • Almost any architecture can be designed using this framework
 • Open Source code – Large community support
            Working principle - Backend
             • Computational Graphs
                • Expressing complex expressions as
                  a combination of simple operations
                • Useful for calculating derivatives
                  during backpropagation
                • Easier to implement distributed
                  computation
                • Just specify the inputs and outputs, and
                  make sure the graph is connected
             • Example: e = c*d, where c = a+b and d = b+1
                • So, e = (a+b)*(b+1)
                • Here “a” and “b” are the inputs
http://colah.github.io/posts/2015‐08‐Backprop/
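The example graph can be traced by hand in plain Python (no Keras involved). This is only a sketch: the function name `forward_backward` is hypothetical. It evaluates each node of e = (a+b)*(b+1) and then applies the chain rule from the output back to the inputs, which is what the backend automates.

```python
def forward_backward(a, b):
    """Evaluate e = (a + b) * (b + 1) and its gradients node by node."""
    # Forward pass: one simple operation per node of the graph.
    c = a + b
    d = b + 1
    e = c * d

    # Backward pass: chain rule from the output towards the inputs.
    de_dc = d          # e = c * d, so de/dc = d
    de_dd = c          # e = c * d, so de/dd = c
    de_da = de_dc * 1  # c = a + b, so dc/da = 1
    # b feeds both c and d, so its two contributions are summed.
    de_db = de_dc * 1 + de_dd * 1
    return e, de_da, de_db


e, de_da, de_db = forward_backward(a=2.0, b=1.0)
print(e, de_da, de_db)  # 6.0 2.0 5.0
```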
General pipeline for implementing an ANN
• Design and define the neural network architecture
• Select the optimizer that performs the optimization (e.g. gradient descent)
• Select the loss function and train the network
• Select the appropriate evaluation metric for the given problem
     Implementing a neural network in Keras
      • Five major steps
       • Prepare the input and specify the input dimension (size)
       • Define the model architecture and build the computational graph
       • Specify the optimizer and configure the learning process
       • Specify the Inputs, Outputs of the computational graph (model) and the Loss function
       • Train and test the model on the dataset
      Note: Gradient calculations are handled by automatic differentiation, and parameter updates are done
      automatically in the backend
      • Pipeline: Prepare input → Define the ANN model → Optimizers → Loss function → Train and evaluate the model
        • Prepare input: images, videos, text, audio
        • Define the ANN model: Sequential or Functional style (MLP, CNN, RNN)
        • Optimizers: SGD, RMSprop, Adam
        • Loss function: MSE, cross entropy, hinge
Procedure to implement an ANN in Keras
• Import the Sequential class from keras.models
• Stack layers using the .add() method
• Configure the learning process using the .compile() method
• Train the model on the training dataset using the .fit() method
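The four steps above can be sketched as follows. This is a minimal example, not the code from the original slide: the random toy data, the layer sizes and the choice of optimizer are all assumptions, and it assumes a recent Keras release in which an `Input` object can be added to a Sequential model.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input

# Hypothetical toy data: 64 samples, 20 features, 3 one-hot encoded classes.
rng = np.random.default_rng(0)
x_train = rng.random((64, 20)).astype("float32")
y_train = np.eye(3, dtype="float32")[rng.integers(0, 3, size=64)]

# 1. Create the model and stack layers with .add()
model = Sequential()
model.add(Input(shape=(20,)))
model.add(Dense(32, activation="relu"))
model.add(Dense(3, activation="softmax"))

# 2. Configure the learning process with .compile()
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])

# 3. Train on the training set with .fit()
history = model.fit(x_train, y_train, epochs=2, batch_size=16, verbose=0)

# 4. Evaluate; with metrics=["accuracy"] this returns [loss, accuracy]
loss, accuracy = model.evaluate(x_train, y_train, verbose=0)
```

Gradients and weight updates never appear in the code; they are handled by the backend during `.fit()`.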
               Keras models – Sequential
                 • Sequential model
                 • Linear stack of layers
                 • Useful for building simple models
                    • Simple classification network
                    • Encoder – Decoder models
[1] https://blog.heuritech.com/2016/02/29/a‐brief‐report‐of‐the‐heuritech‐deep‐learning‐meetup‐5/vgg16/ 
[2] https://www.cc.gatech.edu/~hays/7476/projects/Avery_Wenchen/
                Keras models – Functional
                  • Functional Model
                     • Multi-input and multi-output
                       models
                     • Complex models which fork
                       into 2 or more branches
                     • Models with shared layers
                       (shared weights)
[1] https://www.sciencedirect.com/science/article/pii/S0263224117304517
[2] Unsupervised Domain Adaptation by Backpropagation, https://arxiv.org/abs/1409.7495
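A small sketch of these three ideas with the functional API: two inputs, one shared layer, and a fork into two outputs. The layer names, sizes and losses are hypothetical and chosen only for illustration.

```python
from keras.layers import Input, Dense, Concatenate
from keras.models import Model

# Two inputs with the same feature size (assumed: 16 features each).
input_a = Input(shape=(16,), name="input_a")
input_b = Input(shape=(16,), name="input_b")

# Shared layer: a single set of weights applied to both inputs.
shared = Dense(8, activation="relu", name="shared_dense")
features_a = shared(input_a)
features_b = shared(input_b)

# Merge the two branches, then fork into two outputs.
merged = Concatenate()([features_a, features_b])
class_output = Dense(4, activation="softmax", name="class_output")(merged)
score_output = Dense(1, name="score_output")(merged)

model = Model(inputs=[input_a, input_b], outputs=[class_output, score_output])
model.compile(
    optimizer="adam",
    loss={"class_output": "categorical_crossentropy", "score_output": "mse"},
)
```

Because `shared` is called twice, both branches update the same kernel and bias during training.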
                Keras models – Functional
                (Domain Adaptation)
                  • Train on Domain A and test on Domain B
                  • Results in poor performance on the test set
                  • The data are from different domains
                  • Solution: Adapt the model to both domains
                  • Domain A: with labels; Domain B: without labels
[1] https://www.sciencedirect.com/science/article/pii/S0263224117304517
[2] Unsupervised Domain Adaptation by Backpropagation, https://arxiv.org/abs/1409.7495
Convolution neural network - Sequential model
• Mini VGG style network
• FC – Fully Connected layers (dense layer)
• Input dimension – 4D
 • [N_train, height, width, channels]
 • N_train – Number of training samples
 • height – Height of the image
 • width – Width of the image
 • channels – Number of channels
   • For an RGB image, channels = 3
   • For a grayscale image, channels = 1
• Architecture: Input (4D array) → Conv-32 → Conv-32 → Maxpool → Conv-64 → Conv-64 → Maxpool → FC-256 → FC-10
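The architecture listed on this slide (two Conv-32, Maxpool, two Conv-64, Maxpool, FC-256, FC-10) could be written as a Sequential model roughly as below. The slide does not give the image size, kernel size, padding or activations, so the 32x32 RGB input, 3x3 kernels, "same" padding, ReLU and the Flatten layer are assumptions.

```python
from keras.models import Sequential
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Per-sample shape (height, width, channels); the batch axis N is implicit.
model.add(Input(shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation="relu", padding="same"))
model.add(Conv2D(32, (3, 3), activation="relu", padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation="relu", padding="same"))
model.add(Conv2D(64, (3, 3), activation="relu", padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())                       # 8 x 8 x 64 feature maps -> vector
model.add(Dense(256, activation="relu"))   # FC-256
model.add(Dense(10, activation="softmax")) # FC-10, one unit per class

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```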
Simple MLP network - Functional model
• Import the “Model” class from keras.models
• Each layer explicitly
  returns a tensor
• Pass the returned tensor to
  the next layer as input
• Explicitly mention model
  inputs and outputs
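A minimal functional-style MLP following the points above. The input size (a flattened 28x28 image) and the layer widths are assumptions made for illustration.

```python
from keras.layers import Input, Dense
from keras.models import Model

# Each call returns a tensor that is passed to the next layer.
inputs = Input(shape=(784,))
hidden = Dense(64, activation="relu")(inputs)
hidden = Dense(64, activation="relu")(hidden)
outputs = Dense(10, activation="softmax")(hidden)

# The model is defined by naming its input and output tensors explicitly.
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
```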
            Recurrent Neural Networks
            • RNNs are used on sequential data –
              Text, Audio, Genomes etc.
            • Recurrent networks are of three types
               • Vanilla RNN
               • LSTM
               • GRU
            • They are feedforward networks with
              internal feedback
            • The output at time “t” depends on the
              current input and on the state carried
              over from previous time steps
https://towardsdatascience.com/sentiment‐analysis‐using‐rnns‐lstm‐60871fa6aeba
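The internal feedback can be illustrated with a scalar vanilla RNN in plain Python. This is a sketch only: the function name and the weight values are hypothetical, and real layers use weight matrices rather than scalars.

```python
import math


def run_vanilla_rnn(inputs, w_x=0.5, w_h=0.8, bias=0.0):
    """Return the hidden state after each time step of a scalar vanilla RNN."""
    h = 0.0  # initial hidden state
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x_t + w_h * h + bias)
        states.append(h)
    return states


# Only the first input is non-zero, yet later states are still non-zero:
# the network remembers the first input through its hidden state.
states = run_vanilla_rnn([1.0, 0.0, 0.0])
print(states)
```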
Recurrent Neural Network
 • Recurrent layers are typically followed by a Dense output layer
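A recurrent model in Keras might look like the sketch below: an Embedding layer, an LSTM layer and a Dense output. The vocabulary size, sequence length, layer widths and the random placeholder data are assumptions, not values from the slides.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Input, Embedding, LSTM, Dense

# Assumed setup: sequences of 12 integer token ids from a vocabulary of 1000 words.
vocab_size, seq_len = 1000, 12

model = Sequential()
model.add(Input(shape=(seq_len,), dtype="int32"))
model.add(Embedding(input_dim=vocab_size, output_dim=16))  # token id -> 16-d vector
model.add(LSTM(32))                                        # returns the last hidden state
model.add(Dense(1, activation="sigmoid"))                  # e.g. a sentiment score
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random placeholder data, only to show the expected shapes.
x = np.random.randint(0, vocab_size, size=(8, seq_len))
y = np.random.randint(0, 2, size=(8, 1)).astype("float32")
model.fit(x, y, epochs=1, batch_size=4, verbose=0)
predictions = model.predict(x, verbose=0)
```

Replacing `LSTM` with `SimpleRNN` or `GRU` (both in `keras.layers`) gives the other two recurrent types.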
Convolution layers
• 1D Conv
 keras.layers.convolutional.Conv1D(filters, kernel_size, strides=1, padding='valid', dilation_rate=1, 
 activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, 
 bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
 Applications: Audio signal processing, Natural language processing
• 2D Conv
 keras.layers.convolutional.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, 
 dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', 
 kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, 
 bias_constraint=None)
 Applications: Computer vision ‐ Images
• 3D Conv 
 keras.layers.convolutional.Conv3D(filters, kernel_size, strides=(1, 1, 1), padding='valid', data_format=None, 
 dilation_rate=(1, 1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', 
 kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, 
 bias_constraint=None)
 Applications: Computer vision – Videos (Convolution along temporal dimension)
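The effect of `kernel_size`, `strides`, `padding` and `dilation_rate` on the output size can be checked with a small helper. This is a sketch of the standard output-size rule along one spatial axis, written in plain Python; the function name is hypothetical and the code is not taken from Keras.

```python
def conv_output_length(input_length, kernel_size, stride=1, padding="valid", dilation_rate=1):
    """Output size along one spatial axis of a convolution layer."""
    # A dilated kernel covers more input positions than it has weights.
    effective_kernel = kernel_size + (kernel_size - 1) * (dilation_rate - 1)
    if padding == "same":
        length = input_length
    elif padding == "valid":
        length = input_length - effective_kernel + 1
    else:
        raise ValueError("padding must be 'valid' or 'same'")
    # Ceiling division by the stride.
    return (length + stride - 1) // stride


print(conv_output_length(28, 3))                  # 26
print(conv_output_length(28, 3, padding="same"))  # 28
print(conv_output_length(28, 3, stride=2))        # 13
```

For Conv2D and Conv3D the same rule applies independently to each spatial (or temporal) axis.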
Pooling layers
• Max pool
keras.layers.pooling.MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid')
• Average pool
keras.layers.pooling.AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid')
• Up sampling
keras.layers.convolutional.UpSampling2D(size=(2, 2)) 
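What these three layers compute can be shown on a single-channel 4x4 image in plain Python. The helper names are hypothetical; the sketch assumes non-overlapping windows (the stride equals the pool size, as with `strides=None`) and nearest-neighbour repetition for up sampling.

```python
def pool2d(image, size=2, mode="max"):
    """Non-overlapping size x size pooling of a 2D list."""
    if mode == "max":
        reducer = max
    elif mode == "average":
        reducer = lambda window: sum(window) / len(window)
    else:
        raise ValueError("mode must be 'max' or 'average'")
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reducer(window))
        pooled.append(pooled_row)
    return pooled


def upsample2d(image, size=2):
    """Repeat every row and every column `size` times."""
    return [
        [value for value in row for _ in range(size)]
        for row in image
        for _ in range(size)
    ]


image = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
print(pool2d(image, mode="max"))      # [[4, 8], [12, 16]]
print(pool2d(image, mode="average"))  # [[2.5, 6.5], [10.5, 14.5]]
print(upsample2d([[1, 2], [3, 4]]))   # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```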
General layers
• Dense
keras.layers.core.Dense(units, activation=None, use_bias=True, 
kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, 
bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, 
bias_constraint=None)
• Dropout
keras.layers.core.Dropout(rate, noise_shape=None, seed=None)
• Embedding
keras.layers.embeddings.Embedding(input_dim, output_dim, input_length=None, 
embeddings_initializer='uniform', embeddings_regularizer=None, activity_regularizer=None, 
embeddings_constraint=None, mask_zero=False)
Optimizers available in Keras
• How do we find the “best set of parameters (weights and biases)” for the
  given network ?
• Optimization
 •   Optimizers vary in their speed of convergence and their ability to avoid getting stuck in local minima
 •   SGD – Stochastic gradient descent
 •   SGD with momentum
 •   Adam
 •   AdaGrad
 •   RMSprop
 •   AdaDelta
• Detailed explanation of each optimizer is given in the “Deep learning book”
 • URL: http://www.deeplearningbook.org/contents/optimization.html
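To make the difference in convergence speed concrete, here is a plain-Python sketch of gradient descent with and without classical momentum on a toy one-parameter loss. The function name, the loss f(w) = (w - 3)^2 and the hyperparameter values are assumptions chosen for illustration; this is not how Keras implements its optimizers internally.

```python
def minimize(grad_fn, w0, learning_rate=0.01, momentum=0.0, steps=100):
    """Gradient descent with optional classical momentum on a scalar parameter."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity accumulates past gradients; momentum=0 gives plain gradient descent.
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w = w + velocity
    return w


# Toy loss f(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)


w_plain = minimize(grad, w0=0.0)
w_momentum = minimize(grad, w0=0.0, momentum=0.9)
print(w_plain, w_momentum)
```

With the same learning rate and number of steps, the momentum run ends closer to the minimum on this toy problem.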
Loss functions available in Keras
• MSE – Mean square error
• MAE – Mean absolute error
• Categorical cross entropy – “K” number of classes
• KL divergence – If P(X) and Q(X) are two different probability
  distributions, then we can measure how different these two
  distributions are using KL divergence
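The four losses can be written out for a single sample in plain Python. This is a sketch of the textbook definitions with hypothetical function names; the small `eps` guarding the logarithm is an assumption, and Keras additionally averages over the batch.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true is one-hot over K classes, y_pred is a probability vector."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) = sum over x of P(x) * log(P(x) / Q(x)); zero when P equals Q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


print(mse([1, 2, 3], [1, 2, 5]))
print(mae([1, 2, 3], [1, 2, 5]))
print(categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1]))
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```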
Loading and Saving Keras models
• Use .save method to save the
  model
• Use load_model function to
  load saved model
• Saved file contains –
   • Architecture of the model
   • Weights and biases
   • State of the optimizer
• Saving only the weights with the
  .save_weights method
• Loading all the weights with the
  .load_weights method, or loading
  weights layer-wise
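A sketch of these operations on a tiny model. The file names, layer names and temporary directory are hypothetical, and the `.keras` and `.weights.h5` extensions assume a recent Keras release (older releases commonly used a single `.h5` file). Copying one layer with `get_weights` / `set_weights` is shown as one possible way of loading weights layer-wise.

```python
import os
import tempfile

import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Input, Dense

model = Sequential()
model.add(Input(shape=(4,)))
model.add(Dense(8, activation="relu", name="hidden"))
model.add(Dense(2, activation="softmax", name="output"))
model.compile(optimizer="adam", loss="categorical_crossentropy")

workdir = tempfile.mkdtemp()
model_path = os.path.join(workdir, "model.keras")
weights_path = os.path.join(workdir, "model.weights.h5")

# Whole model: architecture, weights and optimizer state in one file.
model.save(model_path)
restored = load_model(model_path)

# Weights only: the architecture must already exist before loading.
model.save_weights(weights_path)
restored.load_weights(weights_path)

# Layer-wise: copy the weights of a single layer, looked up by name.
hidden_weights = model.get_layer("hidden").get_weights()
restored.get_layer("hidden").set_weights(hidden_weights)

x = np.ones((1, 4), dtype="float32")
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
```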
Extracting features from pre-trained models
 • Import the network (e.g. VGG16)
 • Specify the weights
 • Specify whether the classifier at
   the top has to be included or not
 • The argument “include_top =
   False” – removes the classifier
   from the imported model
 • The input size of the image must
   be the same as what the imported
   model was trained on (with
   exceptions, e.g. when the classifier
   is removed with “include_top = False”)
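The steps above might be wrapped as in the sketch below. The two helper functions are hypothetical names, not part of Keras. Note that `weights="imagenet"` downloads the pre-trained weights on first use, while `weights=None` builds the same architecture with random weights.

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input


def build_feature_extractor(weights="imagenet", input_shape=(224, 224, 3)):
    """VGG16 without its fully connected classifier (include_top=False)."""
    return VGG16(weights=weights, include_top=False, input_shape=input_shape)


def extract_features(model, images):
    """images: array of shape (N, height, width, 3) with RGB values in 0..255."""
    # Copy to float32 first, since preprocessing may modify its input in place.
    batch = preprocess_input(np.array(images, dtype="float32"))
    return model.predict(batch, verbose=0)
```

With the default 224x224 input, the extracted feature maps have shape (N, 7, 7, 512); a call such as `extract_features(build_feature_extractor(), images)` returns them as a NumPy array.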
Popular Deep learning Architectures
• Popular Convolution networks
 •   AlexNet
 •   VGG
 •   ResNet
 •   DenseNet
• Generative models
 • Autoencoders
 • Generative adversarial networks
                 Image recognition networks
                   • AlexNet – 2012
                   • VGG - 2014
[1] AlexNet, https://papers.nips.cc/paper/4824‐imagenet‐classification‐with‐deep‐convolutional‐neural‐networks.pdf
[2] VGG Net, https://arxiv.org/pdf/1409.1556.pdf
           Image recognition networks
             • ResNet – 2015 (residual connections)
             • DenseNet – 2017 (Dense connectivity)
[1] ResNet, https://arxiv.org/abs/1512.03385
[2] DenseNet, https://arxiv.org/abs/1608.06993
Performance of the recognition networks
            Autoencoders
              • Unsupervised representation learning
              • Dimensionality reduction
              • Denoising
              • [Figure: autoencoder with many hidden layers, mapping Input to Output]
https://www.researchgate.net/figure/Figure‐9‐A‐autoencoder‐with‐many‐hidden‐layers‐two‐stacked‐autoencoders_282997080_fig9
             Generative Adversarial Network
https://indico.lal.in2p3.fr/event/3487/?view=standard_inline_minutes
        Interesting Applications using GANs
        • Generate images from
          textual description
        • Performing arithmetic
          in latent space
[1] Stack GAN, https://arxiv.org/abs/1612.03242
[2] DC GAN, https://arxiv.org/abs/1511.06434
                                                   Interesting Applications
                                                   using GANs
                                                   • Generate images of the same scene with different
                                                     weather conditions
                                                   • Transfer the style of painting from one image to another
                                                   • Change the content in the image
[1] UNIT, https://arxiv.org/pdf/1703.00848
[2] CycleGAN, https://arxiv.org/abs/1703.10593
Community contributed layers and other
functionalities
https://github.com/farizrahman4u/keras‐contrib/tree/master/keras_contrib
https://github.com/fchollet/keras/tree/master/keras/layers
Keras Documentation – keras.io
Keras Blog ‐ https://blog.keras.io/index.html
Questions ?