Convolution Neural Networks (CNN) : Ms. Anisha Mahato Assistant Professor (CSE Specialization)
Image classification: Cat? (0/1) for a 64x64 input image
Object detection
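Deep Learning on large images
A 64 x 64 x 3 image has 12,288 input values, but a 1000 x 1000 x 3 image has 3 million. If the large image is fed to a fully connected layer with 1000 hidden units, that layer alone needs 3 million x 1000 = 3 billion trainable weights.
Problems
1. Too many parameters to train
2. Positional information is lost
3. Chance of overfitting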
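The parameter count above is plain multiplication. The following minimal sketch assumes a fully connected first layer with 1000 hidden units (the "x 1000" on the slide) and counts weights only, not biases. The function name `dense_first_layer` is hypothetical.

```python
def dense_first_layer(height, width, channels, hidden_units):
    """Input values and weights (biases excluded) for a fully connected first layer."""
    n_inputs = height * width * channels      # the image is flattened into one vector
    n_weights = n_inputs * hidden_units       # every input connects to every hidden unit
    return n_inputs, n_weights


for h, w in [(64, 64), (1000, 1000)]:
    n_inputs, n_weights = dense_first_layer(h, w, 3, 1000)
    print(f"{h}x{w}x3 image: {n_inputs:,} inputs -> {n_weights:,} weights")
```

Growing the image side by roughly 15x multiplies the weight count by about 244x, because the count scales with the number of pixels.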
Edge Detection
[Figure: vertical edges and horizontal edges detected in an image]
Vertical edge detection
Input image (6x6):
 3   0   1   2   7   4
 1   5   8   9   3   1
 2   7   2   5   1   3
 0   1   3   1   7   8
 4   2   1   6   2   8
 2   4   5   2   3   9

Filter / Kernel (3x3):
 1   0  -1
 1   0  -1
 1   0  -1

Convolution: 6x6 * 3x3 = 4x4

Place the filter over the top-left 3x3 window, multiply each pair of overlapping entries, and add the nine products:
3x1 + 0x0 + 1x(-1) + 1x1 + 5x0 + 8x(-1) + 2x1 + 7x0 + 2x(-1) = -5

Shift the filter one column to the right and repeat to get the next value (-4). At the end of a row, move the filter down one row and start again from the left. The complete 4x4 output (feature map) is:
 -5   -4    0    8
-10   -2    2    3
  0   -2   -4   -7
 -3   -2   -3  -16
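The sliding-window procedure above can be sketched in NumPy as follows. This is a minimal illustration, not the author's code; `conv2d` is a hypothetical helper. Like most deep-learning libraries, it does not flip the kernel, so strictly it computes cross-correlation, which is what CNN texts call convolution.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D convolution as used in CNNs (no kernel flip)."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1), dtype=np.result_type(image, kernel))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the kernel and the window under it, then sum.
            out[i, j] = np.sum(image[i:i + f_h, j:j + f_w] * kernel)
    return out


image = np.array([[3, 0, 1, 2, 7, 4],
                  [1, 5, 8, 9, 3, 1],
                  [2, 7, 2, 5, 1, 3],
                  [0, 1, 3, 1, 7, 8],
                  [4, 2, 1, 6, 2, 8],
                  [2, 4, 5, 2, 3, 9]])
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]])

feature_map = conv2d(image, vertical)
print(feature_map)
```

The printed array matches the 4x4 feature map above, with -5 in the top-left and -16 in the bottom-right.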
Vertical edge detection
Bright region on the left, dark region on the right:
10  10  10   0   0   0
10  10  10   0   0   0
10  10  10   0   0   0
10  10  10   0   0   0
10  10  10   0   0   0
10  10  10   0   0   0

*

1   0  -1
1   0  -1
1   0  -1

=

0   30   30   0
0   30   30   0
0   30   30   0
0   30   30   0

The two middle columns of 30s in the output mark the vertical edge.

Dark region on the left, bright region on the right, with the same filter:
0   0   0   10  10  10
0   0   0   10  10  10
0   0   0   10  10  10
0   0   0   10  10  10
0   0   0   10  10  10
0   0   0   10  10  10

=

0  -30  -30   0
0  -30  -30   0
0  -30  -30   0
0  -30  -30   0

The sign of the output flips, so the filter distinguishes light-to-dark edges from dark-to-light edges.
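Both cases can be checked with a short NumPy sketch. It assumes NumPy 1.20 or later for `sliding_window_view`; the `conv2d` helper is hypothetical and again skips the kernel flip.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(image, kernel):
    """Valid, stride-1 convolution (no kernel flip), vectorised with sliding windows."""
    windows = sliding_window_view(image, kernel.shape)   # shape (k, k, f, f)
    return np.einsum("ijkl,kl->ij", windows, kernel)


vertical = np.array([[1, 0, -1]] * 3)
bright_to_dark = np.array([[10, 10, 10, 0, 0, 0]] * 6)
dark_to_bright = np.fliplr(bright_to_dark)               # rows of 0 0 0 10 10 10

print(conv2d(bright_to_dark, vertical))   # every row: 0 30 30 0
print(conv2d(dark_to_bright, vertical))   # every row: 0 -30 -30 0
```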
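Horizontal edge detection
Vertical filter:
 1   0  -1
 1   0  -1
 1   0  -1

Horizontal filter:
 1   1   1
 0   0   0
-1  -1  -1

Example:
10  10  10   0   0   0
10  10  10   0   0   0
10  10  10   0   0   0
 0   0   0  10  10  10
 0   0   0  10  10  10
 0   0   0  10  10  10

* horizontal filter =

  0    0    0    0
 30   10  -10  -30
 30   10  -10  -30
  0    0    0    0

The 30 and -30 entries mark the horizontal edges. The intermediate values 10 and -10 come from windows that straddle both the horizontal and the vertical boundary.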
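A minimal NumPy check of the horizontal-filter example (assumes NumPy 1.20 or later; `conv2d` is a hypothetical helper without kernel flip):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(image, kernel):
    """Valid, stride-1 convolution (no kernel flip)."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


horizontal = np.array([[1, 1, 1],
                       [0, 0, 0],
                       [-1, -1, -1]])

top = [10, 10, 10, 0, 0, 0]
bottom = [0, 0, 0, 10, 10, 10]
image = np.array([top] * 3 + [bottom] * 3)

print(conv2d(image, horizontal))
```

Transposing `horizontal` gives the vertical filter, so one design choice is to keep a single filter and derive the other with `.T`.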
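Learning to detect edges
Hand-designed vertical edge filters:
 1   0  -1        1   0  -1        3   0   -3
 1   0  -1        2   0  -2       10   0  -10
 1   0  -1        1   0  -1        3   0   -3
                  Sobel filter     Scharr filter

Instead of hand-picking these numbers, treat the nine filter entries as parameters and learn them from data with backpropagation:
3   0   1   2   7   4
1   5   8   9   3   1
2   7   2   5   1   3           w1  w2  w3
0   1   3   1   7   8     *     w4  w5  w6
4   2   1   6   2   8           w7  w8  w9
2   4   5   2   3   9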
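To illustrate that w1 ... w9 can be learned, here is a toy sketch, not the lecture's training setup. It assumes random 6x6 images whose targets are their Sobel feature maps, and fits the nine weights from zero with plain gradient descent on mean squared error. Because each output is linear in the weights, the fit should recover the Sobel filter. The helper `patches` and all hyperparameters are illustrative choices.

```python
import numpy as np


def patches(image, f=3):
    """All f x f windows of a square image (stride 1, no padding), one per row."""
    k = image.shape[0] - f + 1
    return np.array([image[i:i + f, j:j + f].ravel()
                     for i in range(k) for j in range(k)])


rng = np.random.default_rng(0)
sobel = np.array([[1, 0, -1],
                  [2, 0, -2],
                  [1, 0, -1]], dtype=float)

# Toy training set: 50 random 6x6 images -> 800 windows; targets are Sobel outputs.
X = np.vstack([patches(rng.random((6, 6))) for _ in range(50)])   # (800, 9)
y = X @ sobel.ravel()                                              # (800,)

# Start from w1..w9 = 0 and minimise mean squared error by gradient descent.
w = np.zeros(9)
learning_rate = 0.1
for step in range(3000):
    error = X @ w - y
    w -= learning_rate * 2 * X.T @ error / len(y)

print(np.round(w.reshape(3, 3), 2))   # close to the Sobel filter
```

In a real CNN the targets are not known filter outputs; the gradient instead comes from the network's final loss through backpropagation.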
Why convolutions
6 x 6 image:
1   0   0   0   0   1
0   1   0   0   1   0
0   0   1   1   0   0
1   0   0   0   1   0
0   1   0   0   1   0
0   0   1   0   1   0

Filter 1:
 1  -1  -1
-1   1  -1
-1  -1   1

Number the 36 pixels 1 to 36, row by row.
Fewer parameters: the first output value, 3, comes only from inputs 1, 2, 3, 7, 8, 9, 13, 14, 15. Each output connects to only 9 inputs, not to all 36 (not fully connected).
Even fewer parameters: the next output value, -1, comes from inputs 2, 3, 4, 8, 9, 10, 14, 15, 16 and uses the same nine filter weights (shared weights).
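Both ideas can be made concrete by rewriting the convolution as an ordinary dense layer on the flattened image. The sketch below (NumPy; variable names are illustrative) builds a 16 x 36 weight matrix in which every row holds the same nine filter weights at a different position and zeros elsewhere.

```python
import numpy as np

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])
filter1 = np.array([[1, -1, -1],
                    [-1, 1, -1],
                    [-1, -1, 1]])
n, f = 6, 3
k = n - f + 1                                   # 4 x 4 outputs

# Rewrite the convolution as a 16 x 36 weight matrix acting on the flattened image.
W = np.zeros((k * k, n * n), dtype=int)
for i in range(k):
    for j in range(k):
        placed = np.zeros((n, n), dtype=int)
        placed[i:i + f, j:j + f] = filter1      # same 9 weights, shifted position
        W[i * k + j] = placed.ravel()

out = W @ image.ravel()                         # (16,) output vector
print(out[:2])                                  # first two outputs: 3 and -1
print(np.count_nonzero(W, axis=1))              # every output uses only 9 inputs

# Same numbers as sliding the filter directly over the image.
direct = np.array([[np.sum(image[i:i + f, j:j + f] * filter1) for j in range(k)]
                   for i in range(k)])
assert np.array_equal(out.reshape(k, k), direct)
```

A general dense layer of this shape would have 16 x 36 = 576 independent weights; the convolution uses only 9, shared across all 16 outputs.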
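Padding
Input image (6x6, n x n):
3   0   1   2   7   4
1   5   8   9   3   1
2   7   2   5   1   3
0   1   3   1   7   8
4   2   1   6   2   8
2   4   5   2   3   9

* 3x3 filter (f x f) = 4x4 output (k x k)

k = n - f + 1 = 6 - 3 + 1 = 4

Without padding:
• Reduction in spatial dimension: the output is smaller than the input.
• Loss of information at the image boundary region: corner and edge pixels fall inside fewer filter windows than central pixels.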
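The boundary-region point can be quantified by counting how many filter windows cover each pixel. This is a minimal NumPy sketch; `coverage` is a hypothetical helper, stride is fixed at 1, and padding `p` is zero padding.

```python
import numpy as np


def coverage(n, f, p=0):
    """Count how many f x f windows (stride 1) cover each pixel of an n x n image padded by p."""
    size = n + 2 * p
    k = size - f + 1                          # k = n + 2p - f + 1 window positions per axis
    counts = np.zeros((size, size), dtype=int)
    for i in range(k):
        for j in range(k):
            counts[i:i + f, j:j + f] += 1
    return counts[p:p + n, p:p + n]           # drop the padding, keep original pixels


print(coverage(6, 3))        # corners are covered once, central pixels nine times
print(coverage(6, 3, p=1))   # with p = 1 the corner pixels are covered four times
```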
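Padding
Pad the 6x6 image with one border of zeros (padding p = 1), giving an 8x8 input:
0   0   0   0   0   0   0   0
0   3   0   1   2   7   4   0
0   1   5   8   9   3   1   0
0   2   7   2   5   1   3   0
0   0   1   3   1   7   8   0
0   4   2   1   6   2   8   0
0   2   4   5   2   3   9   0
0   0   0   0   0   0   0   0

* 3x3 filter (f x f) = 6x6 output (k x k)

Original image: 6x6 (n x n), Padding = p = 1
k = n + 2p - f + 1 = 6 + 2x1 - 3 + 1 = 6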
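A NumPy sketch of the padded case, assuming zero padding via `np.pad` (its default mode is constant zeros) and NumPy 1.20 or later for `sliding_window_view`. The `conv2d` helper is hypothetical and does not flip the kernel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(image, kernel):
    """Valid, stride-1 convolution (no kernel flip)."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


image = np.array([[3, 0, 1, 2, 7, 4],
                  [1, 5, 8, 9, 3, 1],
                  [2, 7, 2, 5, 1, 3],
                  [0, 1, 3, 1, 7, 8],
                  [4, 2, 1, 6, 2, 8],
                  [2, 4, 5, 2, 3, 9]])
vertical = np.array([[1, 0, -1]] * 3)

padded = np.pad(image, 1)            # p = 1: one ring of zeros, 6x6 -> 8x8
valid = conv2d(image, vertical)      # k = 6 - 3 + 1 = 4
same = conv2d(padded, vertical)      # k = 6 + 2*1 - 3 + 1 = 6

print(padded.shape, valid.shape, same.shape)   # (8, 8) (4, 4) (6, 6)
# Interior of the padded result equals the unpadded result.
print(np.array_equal(same[1:5, 1:5], valid))   # True
```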
Valid and Same Convolution
Valid convolution: no padding (p = 0); an n x n input gives an (n - f + 1) x (n - f + 1) output.
Same convolution: pad the image so that the output size is the same as the input size.
         n + 2p - f + 1 = n
         ⇒ 2p = f - 1
         ⇒ p = (f - 1)/2
Therefore, if f = 3, then p = (3 - 1)/2 = 1; if f = 5, then p = (5 - 1)/2 = 2.

Filter size is usually odd:
• Central pixel as reference
• Symmetry around center
• Avoid asymmetric padding
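The same-padding rule p = (f - 1)/2 can be sketched as two small helpers (hypothetical names, stride 1 assumed). For even f, (f - 1)/2 is not a whole number, which is why odd filter sizes avoid asymmetric padding.

```python
def output_size(n, f, p=0):
    """Output width/height of an n x n input convolved with an f x f filter, stride 1."""
    return n + 2 * p - f + 1


def same_padding(f):
    """Padding that keeps the output the same size as the input: p = (f - 1) / 2."""
    if f % 2 == 0:
        # For even f, (f - 1) / 2 is not an integer, so 'same' would need
        # different amounts of padding on the two sides.
        raise ValueError(f"filter size {f} is even; symmetric 'same' padding is impossible")
    return (f - 1) // 2


for f in (3, 5):
    p = same_padding(f)
    print(f"f={f}: valid -> {output_size(6, f)}, same (p={p}) -> {output_size(6, f, p)}")
```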
Strided Convolution
Input (7x7, n x n):
2   3   7   4   6   2   9
6   6   9   8   7   4   3
3   4   8   3   8   9   7
7   8   3   6   6   3   4
4   2   1   8   3   4   6
3   2   4   1   9   8   3
0   1   3   9   2   1   4

Filter (3x3, f x f):
 3   4   4
 1   0   2
-1   0   3

Padding = p = 0
Stride = s = 2

The filter starts on the top-left 3x3 window and then jumps 2 columns at a time; at the end of a row it moves down 2 rows. The first value is
2x3 + 3x4 + 7x4 + 6x1 + 6x0 + 9x2 + 3x(-1) + 4x0 + 8x3 = 91

Output (3x3, k x k):
 91   100    88
 69    91   117
 44    72    74

$k = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 = \left\lfloor \frac{7 + 0 - 3}{2} \right\rfloor + 1 = 2 + 1 = 3$
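The strided computation can be sketched by moving the window's top-left corner by s at each step. This minimal NumPy version assumes square inputs and filters, zero padding, and no kernel flip; `conv2d` and its parameter names are illustrative.

```python
import numpy as np


def conv2d(image, kernel, stride=1, padding=0):
    """Convolution as used in CNNs (no kernel flip), with zero padding and a stride."""
    image = np.pad(image, padding)                 # `padding` rows/cols of zeros per side
    n, f = image.shape[0], kernel.shape[0]
    k = (n - f) // stride + 1                      # n here already includes the 2p
    out = np.zeros((k, k), dtype=np.result_type(image, kernel))
    for i in range(k):
        for j in range(k):
            r, c = i * stride, j * stride          # window's top-left corner jumps by s
            out[i, j] = np.sum(image[r:r + f, c:c + f] * kernel)
    return out


image = np.array([[2, 3, 7, 4, 6, 2, 9],
                  [6, 6, 9, 8, 7, 4, 3],
                  [3, 4, 8, 3, 8, 9, 7],
                  [7, 8, 3, 6, 6, 3, 4],
                  [4, 2, 1, 8, 3, 4, 6],
                  [3, 2, 4, 1, 9, 8, 3],
                  [0, 1, 3, 9, 2, 1, 4]])
kernel = np.array([[3, 4, 4],
                   [1, 0, 2],
                   [-1, 0, 3]])

result = conv2d(image, kernel, stride=2)
print(result)
```

For example, the top-right value uses the window 6 2 9 / 7 4 3 / 8 9 7, giving 62 + 13 + 13 = 88.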
General Formulae
•   w is the width of the input image
•   h is the height of the input image
•   pw is the padding applied to the width of the input
•   ph is the padding applied to the height of the input
•   fw is the width of the convolutional filter
•   fh is the height of the convolutional filter
•   sw is the stride used in the horizontal direction
•   sh is the stride used in the vertical direction
• Output image width = $\left\lfloor \frac{w + 2p_w - f_w}{s_w} \right\rfloor + 1$
• Output image height = $\left\lfloor \frac{h + 2p_h - f_h}{s_h} \right\rfloor + 1$
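The two formulae translate directly into code. In this sketch (hypothetical function name, parameters named after the symbols above), integer floor division implements the floor brackets for non-negative numerators.

```python
def conv_output_size(w, h, pw, ph, fw, fh, sw, sh):
    """Output (width, height) for a possibly non-square input, filter, padding and stride."""
    out_w = (w + 2 * pw - fw) // sw + 1     # floor((w + 2pw - fw) / sw) + 1
    out_h = (h + 2 * ph - fh) // sh + 1     # floor((h + 2ph - fh) / sh) + 1
    return out_w, out_h


print(conv_output_size(7, 7, 0, 0, 3, 3, 2, 2))     # the strided example above
print(conv_output_size(32, 24, 1, 1, 3, 3, 2, 2))   # a non-square input
```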
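Volume Convolution
6x6x3 input * 3x3x3 filter = 4x4 output
The filter has as many channels as the input (3). At each position, all 27 overlapping values are multiplied and summed into a single number, so the output has one channel.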
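Multiple Kernels
6x6x3 * 3x3x3 (kernel 1) = 4x4
6x6x3 * 3x3x3 (kernel 2) = 4x4
Stacking the two 4x4 outputs gives a 4x4x2 volume.

n x n x nc * f x f x nc = (n - f + 1) x (n - f + 1) x nf
where nc is the number of channels and nf is the number of kernels.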
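A minimal NumPy sketch of volume convolution with several kernels, assuming the kernels are stored as an array of shape (f, f, nc, nf) (one possible layout, chosen here for illustration). Each kernel produces one output channel.

```python
import numpy as np


def conv_volume(x, kernels):
    """x: (n, n, nc) input; kernels: (f, f, nc, nf). Valid, stride-1 output: (n-f+1, n-f+1, nf)."""
    n, _, nc = x.shape
    f, _, kernel_depth, nf = kernels.shape
    assert kernel_depth == nc, "each kernel must have as many channels as the input"
    k = n - f + 1
    out = np.zeros((k, k, nf))
    for c in range(nf):                       # each kernel produces one output channel
        for i in range(k):
            for j in range(k):
                # f*f*nc products (27 for a 3x3x3 kernel) summed into a single number
                out[i, j, c] = np.sum(x[i:i + f, j:j + f, :] * kernels[:, :, :, c])
    return out


rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(6, 6, 3))       # a toy 6x6x3 input

vertical = np.array([[1, 0, -1]] * 3)
kernels = np.zeros((3, 3, 3, 2))
kernels[:, :, 0, 0] = vertical                # kernel 1: vertical edges in channel 0 only
kernels[:, :, :, 1] = vertical.T[:, :, None]  # kernel 2: horizontal edges in all channels

out = conv_volume(x, kernels)
print(out.shape)                              # (4, 4, 2)
```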
One Neural Network Layer
Z(1) = W(1)a(0) + b(1)
a(1) = g(Z(1))
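One Convolution Layer
Z(1) = W(1)a(0) + b(1)
a(1) = g(Z(1)), where g is the activation function
In a convolution layer, the input a(0) is convolved with each kernel, the kernel's bias is added, and the activation is applied. For kernel 1: ReLU(W1(1) * a(0) + b1(1)).
For each kernel there are 3 x 3 x 3 = 27 parameters (weights) + 1 parameter (bias) = 28 parameters.
For 10 kernels, there are 28 x 10 = 280 trainable parameters, independent of the input image size.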
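A sketch of the forward pass of one convolution layer with 10 kernels of size 3x3x3, a bias per kernel, and ReLU as g. It assumes stride 1, no padding, and the (f, f, nc_prev, nc) weight layout; the function name and random initialisation are illustrative.

```python
import numpy as np


def conv_layer_forward(a_prev, W, b):
    """One conv layer: a = ReLU(conv(a_prev, W) + b), stride 1, no padding.

    a_prev: (n, n, nc_prev), W: (f, f, nc_prev, nc), b: (nc,)
    """
    n = a_prev.shape[0]
    f, _, _, nc = W.shape
    k = n - f + 1
    z = np.zeros((k, k, nc))
    for c in range(nc):
        for i in range(k):
            for j in range(k):
                z[i, j, c] = np.sum(a_prev[i:i + f, j:j + f, :] * W[:, :, :, c]) + b[c]
    return np.maximum(z, 0)                        # ReLU


rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3, 3, 10))             # 10 kernels, each 3x3x3
b = np.zeros(10)                                   # one bias per kernel

n_params = W.size + b.size
print(n_params)                                    # 270 + 10 = 280

# The same 280 parameters work for any input size; only the output shape changes.
shapes = [conv_layer_forward(rng.standard_normal((s, s, 3)), W, b).shape for s in (6, 64)]
print(shapes)                                      # [(4, 4, 10), (62, 62, 10)]
```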
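Summary of notation
If layer l is a convolution layer:
$f^{[l]}$ = kernel size
$p^{[l]}$ = padding
$s^{[l]}$ = stride
$n_c^{[l]}$ = number of filters
Each kernel is: $f^{[l]} \times f^{[l]} \times n_c^{[l-1]}$
Activations: $a^{[l]} \rightarrow n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$
Weights: $f^{[l]} \times f^{[l]} \times n_c^{[l-1]} \times n_c^{[l]}$
bias: $n_c^{[l]}$

Input: $n_H^{[l-1]} \times n_W^{[l-1]} \times n_c^{[l-1]}$
Output: $n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$
$n_H^{[l]} = \left\lfloor \frac{n_H^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} \right\rfloor + 1$
$n_W^{[l]} = \left\lfloor \frac{n_W^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} \right\rfloor + 1$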
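The notation can be turned into a small shape calculator. This is a sketch with a hypothetical function name; arguments correspond to the symbols above with the [l] and [l-1] superscripts dropped.

```python
def conv_layer_shapes(n_H_prev, n_W_prev, n_c_prev, f, p, s, n_c):
    """Shapes for conv layer l in the notation above ([l] and [l-1] superscripts dropped)."""
    n_H = (n_H_prev + 2 * p - f) // s + 1       # floor((n_H[l-1] + 2p - f) / s) + 1
    n_W = (n_W_prev + 2 * p - f) // s + 1
    return {
        "output": (n_H, n_W, n_c),
        "each_kernel": (f, f, n_c_prev),
        "weights": (f, f, n_c_prev, n_c),
        "bias": (n_c,),
    }


print(conv_layer_shapes(6, 6, 3, f=3, p=0, s=1, n_c=10))
print(conv_layer_shapes(32, 32, 3, f=5, p=2, s=2, n_c=16))
```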
Pooling layer: Max pooling
Input (4x4):
1   3   2   1
2   9   1   1
1   3   2   3
5   6   1   2

Filter size = 2 x 2
Stride = 2

Split the input into 2x2 regions and keep the largest value of each:
Output (2x2):
9   2
6   3

Filter size and stride are the two hyperparameters; there are no parameters to learn.
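A minimal NumPy sketch of max pooling. The function name is hypothetical; `f` and `s` are the two hyperparameters, and nothing in the function is learned.

```python
import numpy as np


def max_pool(x, f=2, s=2):
    """Max pooling with an f x f window and stride s; there are no learnable parameters."""
    k_h = (x.shape[0] - f) // s + 1
    k_w = (x.shape[1] - f) // s + 1
    out = np.zeros((k_h, k_w), dtype=x.dtype)
    for i in range(k_h):
        for j in range(k_w):
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].max()   # largest value in the region
    return out


x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])

print(max_pool(x, f=2, s=2))
```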
Why Pooling?
Max Pooling – Another Example
Input (5 x 5 x nc):
    1   3   2   1   3
    2   9   1   1   5
    1   3   2   3   2
    8   3   5   1   0
    5   6   1   2   9
Filter size = 3 x 3, Stride = 1
Output (3 x 3 x nc):
    9   9   5
    9   9   5
    8   6   9
Average Pooling
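Input (5 x 5 x nc), the same matrix as in the max pooling example:
    1   3   2   1   3
    2   9   1   1   5
    1   3   2   3   2
    8   3   5   1   0
    5   6   1   2   9
Filter size = 3 x 3, Stride = 1
Output (3 x 3 x nc), first row:
    2.67   2.78   2.22
Each entry is the mean of a 3 x 3 window, e.g. (1+3+2+2+9+1+1+3+2)/9 = 24/9 ≈ 2.67.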
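Both pooling variants on these slides can be checked with one NumPy sketch. The helper pool is hypothetical (not a library function) and assumes no padding; it pools each of the n_C channels independently, which is why the depth nc passes through unchanged. It also yields the rows of the average-pooling output that the slide leaves out.

```python
import numpy as np

def pool(x, f, s, mode="max"):
    """Pool an (n_H, n_W, n_C) array with an f x f window and stride s, no padding.

    mode is "max" or "average"; each channel is pooled independently.
    """
    reduce = {"max": np.max, "average": np.mean}[mode]
    n_h, n_w, n_c = x.shape
    out_h = (n_h - f) // s + 1
    out_w = (n_w - f) // s + 1
    out = np.empty((out_h, out_w, n_c))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * s:i * s + f, j * s:j * s + f, :]
            # Reduce over the two spatial axes only, keeping one value per channel
            out[i, j, :] = reduce(window, axis=(0, 1))
    return out

a = np.array([[1, 3, 2, 1, 3],
              [2, 9, 1, 1, 5],
              [1, 3, 2, 3, 2],
              [8, 3, 5, 1, 0],
              [5, 6, 1, 2, 9]], dtype=float)
x = np.stack([a, 10 * a], axis=-1)  # hypothetical input with n_C = 2 channels

print(pool(x, f=3, s=1, mode="max")[:, :, 0])                    # rows: 9 9 5 / 9 9 5 / 8 6 9
print(np.round(pool(x, f=3, s=1, mode="average")[:, :, 0], 2))  # first row: 2.67 2.78 2.22
```

The second channel (10 times the first) simply produces outputs scaled by 10, confirming that channels never mix during pooling.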
Summary of pooling
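   Hyperparameters:
        f : filter size
        s : stride
        Max or average pooling
   $n_H \times n_W \times n_C \rightarrow \left\lfloor \frac{n_H - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n_W - f}{s} + 1 \right\rfloor \times n_C$
   f = 2, s = 2:  4 x 4 x 10 → 2 x 2 x 10
   f = 3, s = 1:  5 x 5 x 10 → 3 x 3 x 10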
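The output-size formula can be written as a one-line Python helper (pool_output_shape is our own name) and checked against the two examples above. Integer division // implements the floor for non-negative n - f.

```python
def pool_output_shape(n_h, n_w, n_c, f, s):
    """Output shape of a pooling layer without padding.

    Each spatial size becomes floor((n - f) / s) + 1; the channel count n_c is unchanged.
    """
    return ((n_h - f) // s + 1, (n_w - f) // s + 1, n_c)

print(pool_output_shape(4, 4, 10, f=2, s=2))  # (2, 2, 10)
print(pool_output_shape(5, 5, 10, f=3, s=1))  # (3, 3, 10)
print(pool_output_shape(5, 5, 3, f=2, s=2))   # (2, 2, 3): the last row/column is dropped
```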
Flattening and Fully Connected Layers
Types of layers in a convolutional network:
  - Convolution
  - Pooling
  - Fully connected
      Complete convolutional network
28 x 28 x 3 → 28 x 28 x 32 → 14 x 14 x 32 → 14 x 14 x 64 → 7 x 7 x 64 → Fully Connected → Fully Connected
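The shapes in this diagram can be traced with a few lines of plain Python. This is a sketch under assumptions not stated on the slide: both convolutions use stride 1 with 'same' padding (as in the Keras code for this network), the second has 64 filters, and each pooling layer uses f = 2, s = 2. The helper names conv_same, max_pool and trace_shapes are hypothetical.

```python
def conv_same(shape, n_filters):
    # Stride-1 convolution with 'same' padding keeps height and width;
    # the depth becomes the number of filters
    h, w, _ = shape
    return (h, w, n_filters)

def max_pool(shape, f=2, s=2):
    # Pooling: floor((n - f) / s) + 1 on each spatial axis, depth unchanged
    h, w, c = shape
    return ((h - f) // s + 1, (w - f) // s + 1, c)

def trace_shapes(input_shape, layers):
    """Return the activation shape after each layer, starting with the input."""
    shapes = [input_shape]
    for kind, n_filters in layers:
        prev = shapes[-1]
        shapes.append(conv_same(prev, n_filters) if kind == "conv" else max_pool(prev))
    return shapes

# Assumed layer sequence for the diagram: conv(32) -> pool -> conv(64) -> pool
layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None)]
shapes = trace_shapes((28, 28, 3), layers)
for shape in shapes:
    print(shape)
h, w, c = shapes[-1]
print("Flatten ->", h * w * c, "features for the fully connected layers")  # 7 * 7 * 64 = 3136
```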
Complete convolutional network parameters
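from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential()
# 32 filters of 3 x 3 with 'same' padding: 28 x 28 x 3 -> 28 x 28 x 32
model.add(Conv2D(32, kernel_size=(3, 3), padding='same', strides=(1, 1), activation='relu', input_shape=(28, 28, 3)))
# 2 x 2 max pooling with stride 2: 28 x 28 x 32 -> 14 x 14 x 32 (no parameters)
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())  # 14 * 14 * 32 = 6272 features
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))  # probabilities for 10 classes
model.summary()  # prints each layer's output shape and parameter count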
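The parameter counts for this model can be worked out by hand; the plain-Python sketch below (helper names are our own) should agree with what model.summary() reports. A convolution has one f x f x n_C_prev weight tensor plus one bias per filter, a dense layer has one weight per input-output pair plus one bias per unit, and pooling and flattening add nothing.

```python
def conv2d_params(f, n_c_prev, n_filters):
    # Each filter: f * f * n_c_prev weights plus one bias
    return (f * f * n_c_prev + 1) * n_filters

def dense_params(n_in, n_out):
    # One weight per input-output pair plus one bias per output unit
    return (n_in + 1) * n_out

conv = conv2d_params(3, 3, 32)      # 896
pool_params = 0                     # max pooling has no parameters to learn
flat = 14 * 14 * 32                 # 6272 features after Flatten
fc1 = dense_params(flat, 128)       # 802944
fc2 = dense_params(128, 10)         # 1290
total = conv + pool_params + fc1 + fc2
print(total)                        # 805130
```

Almost all of the parameters sit in the first fully connected layer, which is one reason convolution and pooling are used to shrink the activations before flattening.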
  Cost $J = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$
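The slide leaves the per-example loss L generic. As an assumption for a softmax classifier like the one above, the NumPy sketch below uses categorical cross-entropy for L and averages it over the m examples to get J; the function name is our own. Predictions are clipped away from 0 so that log never returns -inf.

```python
import numpy as np

def cross_entropy_cost(y_hat, y):
    """J = (1/m) * sum_i L(y_hat_i, y_i), with L = categorical cross-entropy (assumed).

    y_hat: (m, K) softmax outputs; y: (m, K) one-hot labels.
    """
    m = y.shape[0]
    y_hat = np.clip(y_hat, 1e-12, 1.0)           # avoid log(0)
    losses = -np.sum(y * np.log(y_hat), axis=1)  # L for each example
    return losses.sum() / m

y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y = np.array([[1, 0, 0],
              [0, 1, 0]])
print(cross_entropy_cost(y_hat, y))  # (-ln 0.7 - ln 0.8) / 2 ≈ 0.2899
```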
• LeNet-5
• AlexNet
• VGG
• ResNet
• Inception Net
          LeNet-5
          VGG-16
     •    138M parameters
     •    ReLU non-linearity
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
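The 138M figure can be reproduced by counting weights and biases layer by layer. The sketch below assumes the standard VGG-16 layout from the cited paper: a 224 x 224 x 3 input, thirteen 3 x 3 convolutions in five blocks (64, 128, 256, 512, 512 filters), a 2 x 2 max pool after each block, and fully connected layers of 4096, 4096 and 1000 units. The helper names are our own.

```python
def conv_params(n_in, n_out, f=3):
    # f x f x n_in weights per filter, plus one bias per filter
    return (f * f * n_in + 1) * n_out

def dense_params(n_in, n_out):
    return (n_in + 1) * n_out

# (number of 3x3 conv layers, filters per layer); a 2x2 max pool follows each block
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def vgg16_param_count(n_classes=1000):
    total, channels = 0, 3  # RGB input
    for n_layers, filters in BLOCKS:
        for _ in range(n_layers):
            total += conv_params(channels, filters)
            channels = filters
    # Five 2x2 pools shrink 224 to 224 / 2**5 = 7, so FC1 sees 7 * 7 * 512 inputs
    total += dense_params(7 * 7 * channels, 4096)
    total += dense_params(4096, 4096)
    total += dense_params(4096, n_classes)
    return total

print(vgg16_param_count())  # 138357544, i.e. about 138M
```

Roughly 90% of these parameters belong to the three fully connected layers; the convolutional part accounts for about 14.7M.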
Transfer Learning
Suppose you want to build a CNN to classify the following dog breeds.
Start with a pre-trained, open-source model that was trained on image data, and download its code and weights.
Transfer Learning (Little training data)
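Freeze all the pre-trained layers and remove the original Softmax 1000 output layer. Add your own softmax layer for the dog-breed classes and train only that layer.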
Transfer Learning (Some more training data)
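Freeze the earlier layers. Replace the Softmax 1000 output layer with your own softmax and train it together with the last few pre-trained layers.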
Transfer Learning (Lots of training data)
Replace the Softmax 1000 output layer with your own softmax and re-train the complete network, using the pre-trained weights as the initial weights.
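The three regimes above differ only in how many pre-trained layers stay frozen. Below is a minimal Keras sketch under stated assumptions: make_stand_in_base is a hypothetical tiny network standing in for a downloaded pre-trained model (so the example runs without fetching weights), and build_transfer_model, n_trainable and num_classes=5 are our own illustrative choices. Freezing uses Keras's layer.trainable flag, and a new Dense softmax replaces the original Softmax 1000.

```python
import tensorflow as tf

INPUT_SHAPE = (64, 64, 3)

def make_stand_in_base():
    # Hypothetical stand-in for a pre-trained network with its top removed. With real
    # weights you might use e.g. tf.keras.applications.VGG16(weights="imagenet", include_top=False)
    return tf.keras.Sequential([
        tf.keras.Input(shape=INPUT_SHAPE),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
    ])

def build_transfer_model(base, num_classes, n_trainable):
    """Freeze all but the last n_trainable layers of `base` and add a new softmax head.

    n_trainable = 0                -> little data: train only the new softmax
    0 < n_trainable < len(layers)  -> some more data: also train the last few layers
    n_trainable = len(base.layers) -> lots of data: re-train everything from the pre-trained weights
    """
    cutoff = len(base.layers) - n_trainable
    for i, layer in enumerate(base.layers):
        layer.trainable = i >= cutoff
    return tf.keras.Sequential([
        tf.keras.Input(shape=INPUT_SHAPE),
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        # New softmax over your own classes, replacing the original Softmax 1000
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_transfer_model(make_stand_in_base(), num_classes=5, n_trainable=0)
model.compile(optimizer="adam", loss="categorical_crossentropy")
print(len(model.trainable_weights))  # 2: only the new Dense kernel and bias
```

Keras applies changes to trainable flags when the model is compiled, so re-compile after moving from one regime to the next. GlobalAveragePooling2D is one common choice for the head; Flatten would also work.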
Data Augmentation
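Target label y:
     y = [pc, bx, by, bw, bh, c1, c2, c3]ᵀ
     pc                Is there a car in the image? (0/1)
     bx, by, bw, bh    Bounding box (real numbers)
     c1, c2, c3        Class: one of them is 1, the others are 0
Examples:
     pc = 1:  y = [1, 0.5, 0.4, 0.3, 0.25, 0, 1, 0]ᵀ
     pc = 0:  y = [0, ?, ?, ?, ?, ?, ?, ?]ᵀ   (? = don't care)
Loss:
     If $y_1 = 1$: $L(\hat{y}, y) = (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + \cdots + (\hat{y}_8 - y_8)^2$
     If $y_1 = 0$: $L(\hat{y}, y) = (\hat{y}_1 - y_1)^2$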
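The two-case squared-error loss can be sketched in NumPy as follows. The function name and the prediction vector y_hat are hypothetical; the "?" entries of a pc = 0 label are stored as NaN here only to show that they are never read.

```python
import numpy as np

def localization_loss(y_hat, y):
    """Squared-error loss for an 8-component target y = [pc, bx, by, bw, bh, c1, c2, c3].

    If pc = 1, all 8 components count; if pc = 0, only pc counts (the rest are "don't care").
    """
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    if y[0] == 1:
        return float(np.sum((y_hat - y) ** 2))
    return float((y_hat[0] - y[0]) ** 2)

y_obj = [1, 0.5, 0.4, 0.3, 0.25, 0, 1, 0]
y_none = [0] + [np.nan] * 7  # "?" entries, never used by the loss
y_hat = [0.9, 0.5, 0.5, 0.3, 0.2, 0.1, 0.8, 0.1]
print(localization_loss(y_hat, y_obj))   # ≈ 0.0825
print(localization_loss(y_hat, y_none))  # ≈ 0.81, i.e. (0.9 - 0)^2
```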