9 CNN-1
Convolutional Neural Networks
I2DL: Prof. Dai                     1
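Fully Connected Neural Network
[Figure: a fully connected network, characterized by its width (neurons per layer) and its depth (number of layers).]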
          Problems using FC Layers on Images
    • How to process a tiny image with FC layers
A 5 × 5 image on 1 channel, fully connected to a 3-neuron layer: each neuron needs 25 weights for the whole 5 × 5 image.
A 5 × 5 image on 3 channels (5 × 5 × 3): each neuron needs 75 weights for the whole image on the three channels, i.e., 3 ⋅ 75 = 225 weights for the 3-neuron layer.
A 1000 × 1000 × 3 image: each neuron needs 1000 ⋅ 1000 ⋅ 3 = 3,000,000 weights, i.e., 9 million weights for a 3-neuron layer and 3 billion weights for a 1000-neuron layer.
   [Li et al., CS231n Course Slides] Lecture 12: Detection and Segmentation
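The weight counts above follow from one multiplication: every neuron of a fully connected layer is connected to every pixel value. A minimal plain-Python sketch (the function name fc_weight_count is hypothetical, and biases are deliberately left out) reproduces the numbers:

```python
def fc_weight_count(height, width, channels, neurons):
    # Each neuron is connected to every pixel value of the image, so it needs
    # height * width * channels weights; biases are not counted here.
    return height * width * channels * neurons


print(fc_weight_count(5, 5, 1, 1))           # 25: one neuron, 5 x 5 x 1 image
print(fc_weight_count(5, 5, 3, 1))           # 75: one neuron, 5 x 5 x 3 image
print(fc_weight_count(5, 5, 3, 3))           # 225: 3-neuron layer
print(fc_weight_count(1000, 1000, 3, 3))     # 9000000: 3-neuron layer
print(fc_weight_count(1000, 1000, 3, 1000))  # 3000000000: 1000-neuron layer
```

The count grows linearly with the number of pixels and with the number of neurons, which is why FC layers become impractical for realistic image sizes.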
Convolutions
$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$
[Figure: 𝑓 (red) convolved with 𝑔 (blue) gives 𝑓 ∗ 𝑔 (green).]
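For sequences, the integral becomes a sum: (f ∗ g)[n] = Σ_m f[m] g[n − m]. The following is a minimal plain-Python sketch of that discrete definition (the name convolve_full is hypothetical); it treats values outside either sequence as zero and returns every output position where the two overlap.

```python
def convolve_full(f, g):
    """Discrete 1D convolution (f * g)[n] = sum_m f[m] * g[n - m].

    Values outside f or g count as zero, so the 'full' result has
    len(f) + len(g) - 1 entries.
    """
    out = [0] * (len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        for k, gk in enumerate(g):
            # k plays the role of n - m, hence the output index n = m + k
            out[m + k] += fm * gk
    return out


print(convolve_full([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```

Because the roles of f and g can be swapped in the sum, convolve_full(g, f) returns the same list.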
Discrete case: box filter
Signal 𝑓 = 4 3 2 −5 3 5 2 5 5 6, kernel 𝑔 = [1/3, 1/3, 1/3]. Each output is the average of three consecutive input values:
4 ⋅ 1/3 + 3 ⋅ 1/3 + 2 ⋅ 1/3 = 3
3 ⋅ 1/3 + 2 ⋅ 1/3 + (−5) ⋅ 1/3 = 0
2 ⋅ 1/3 + (−5) ⋅ 1/3 + 3 ⋅ 1/3 = 0
(−5) ⋅ 1/3 + 3 ⋅ 1/3 + 5 ⋅ 1/3 = 1
3 ⋅ 1/3 + 5 ⋅ 1/3 + 2 ⋅ 1/3 = 10/3
5 ⋅ 1/3 + 2 ⋅ 1/3 + 5 ⋅ 1/3 = 4
2 ⋅ 1/3 + 5 ⋅ 1/3 + 5 ⋅ 1/3 = 4
5 ⋅ 1/3 + 5 ⋅ 1/3 + 6 ⋅ 1/3 = 16/3
𝑓 ∗ 𝑔 = 3 0 0 1 10/3 4 4 16/3
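What to do at the boundaries? At the first and last position, the 3-wide kernel would reach past the end of the signal, so these outputs are undefined:
?? 3 0 0 1 10/3 4 4 16/3 ??
Option 1: Shrink. Keep only the positions where the kernel lies entirely inside the signal, so 10 inputs give 8 outputs:
3 0 0 1 10/3 4 4 16/3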
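Option 2: Pad (often with 0s). Padding the signal with one 0 on each side gives
0 4 3 2 −5 3 5 2 5 5 6 0
so the boundary outputs become defined. The first one is
0 ⋅ 1/3 + 4 ⋅ 1/3 + 3 ⋅ 1/3 = 7/3
and likewise the last one is 5 ⋅ 1/3 + 6 ⋅ 1/3 + 0 ⋅ 1/3 = 11/3. The output keeps the input length of 10:
7/3 3 0 0 1 10/3 4 4 16/3 11/3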
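Both boundary options can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not course code: the function name box_filter and its pad flag are hypothetical, and fractions.Fraction is used only so that results such as 10/3 print exactly. Since the box kernel is symmetric, sliding it without flipping gives the same result as the convolution definition.

```python
from fractions import Fraction


def box_filter(signal, size=3, pad=False):
    """Average every window of `size` consecutive values.

    pad=False: Option 1 (shrink), only windows fully inside the signal.
    pad=True:  Option 2, zero-pad (size - 1) // 2 values on each side so the
               output is as long as the input (for odd `size`).
    """
    if pad:
        p = (size - 1) // 2
        signal = [0] * p + list(signal) + [0] * p
    weight = Fraction(1, size)  # box kernel: every weight equals 1/size
    return [sum(weight * x for x in signal[i:i + size])
            for i in range(len(signal) - size + 1)]


f = [4, 3, 2, -5, 3, 5, 2, 5, 5, 6]
print([str(v) for v in box_filter(f)])
# ['3', '0', '0', '1', '10/3', '4', '4', '16/3']
print([str(v) for v in box_filter(f, pad=True)])
# ['7/3', '3', '0', '0', '1', '10/3', '4', '4', '16/3', '11/3']
```

Shrinking loses size − 1 outputs per filtering step, while zero padding keeps the length at the cost of mixing artificial zeros into the boundary values.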
Image 5 × 5 (the two top corners, marked ·, only ever meet the kernel's zero corner weights, so their values do not affect the output):
 ·   3   2  −5   ·
 4   3   2   1  −3
 1   0   3   3   5
−2   0   1   4   4
 5   6   7   9  −1

Kernel 3 × 3:
 0  −1   0
−1   5  −1
 0  −1   0

Sliding the kernel over all positions where it fits (no padding, stride 1), each output is 5 times the center pixel minus its four direct neighbors:
(1, 1): 5 ⋅ 3 + (−1) ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 0 + (−1) ⋅ 4 = 15 − 9 = 6
(1, 2): 5 ⋅ 2 + (−1) ⋅ 2 + (−1) ⋅ 1 + (−1) ⋅ 3 + (−1) ⋅ 3 = 10 − 9 = 1
(1, 3): 5 ⋅ 1 + (−1) ⋅ (−5) + (−1) ⋅ (−3) + (−1) ⋅ 3 + (−1) ⋅ 2 = 13 − 5 = 8
(2, 1): 5 ⋅ 0 + (−1) ⋅ 3 + (−1) ⋅ 0 + (−1) ⋅ 1 + (−1) ⋅ 3 = 0 − 7 = −7
(2, 2): 5 ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 3 + (−1) ⋅ 1 + (−1) ⋅ 0 = 15 − 6 = 9
(2, 3): 5 ⋅ 3 + (−1) ⋅ 1 + (−1) ⋅ 5 + (−1) ⋅ 4 + (−1) ⋅ 3 = 15 − 13 = 2
(3, 1): 5 ⋅ 0 + (−1) ⋅ 0 + (−1) ⋅ 1 + (−1) ⋅ 6 + (−1) ⋅ (−2) = 2 − 7 = −5
(3, 2): 5 ⋅ 1 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 7 + (−1) ⋅ 0 = 5 − 14 = −9
(3, 3): 5 ⋅ 4 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 9 + (−1) ⋅ 1 = 20 − 17 = 3

Output 3 × 3:
 6   1   8
−7   9   2
−5  −9   3
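The 2D example can be checked with a short plain-Python sketch (the name conv2d_valid is hypothetical). As is common in deep-learning libraries, the kernel is slid over the image without flipping, which strictly speaking is cross-correlation; for this symmetric kernel the two coincide. The unknown top corners are filled with a placeholder 0, which is safe because they are only multiplied by zero weights.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution as used in CNNs: no padding, stride 1,
    kernel applied without flipping (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


# Placeholder 0 in the two top corners: the kernel's corner weights are 0,
# so any value there gives the same output.
image = [[0, 3, 2, -5, 0],
         [4, 3, 2, 1, -3],
         [1, 0, 3, 3, 5],
         [-2, 0, 1, 4, 4],
         [5, 6, 7, 9, -1]]
kernel = [[0, -1, 0],
          [-1, 5, -1],
          [0, -1, 0]]
for row in conv2d_valid(image, kernel):
    print(row)
# [6, 1, 8]
# [-7, 9, 2]
# [-5, -9, 3]
```

A 5 × 5 input and a 3 × 3 kernel give a (5 − 3 + 1) × (5 − 3 + 1) = 3 × 3 output.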
Convolutions on RGB Images
Images have depth: e.g., RGB → 3 channels. A 32 × 32 × 3 image is therefore convolved with a 5 × 5 × 3 filter, which spans all 3 channels.
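32 × 32 × 3 image (pixels $\mathbf{X}$), 5 × 5 × 3 filter: the convolution produces 1 number at a time, equal to the dot product between the filter weights $\mathbf{w}$ and the $i$-th 5 × 5 × 3 chunk $\mathbf{x}_i$ of the image, plus a bias. Here this is a 5 ⋅ 5 ⋅ 3 = 75-dimensional dot product:
$z_i = \mathbf{w}^T \mathbf{x}_i + b$
where $\mathbf{w}$ and $\mathbf{x}_i$ are (5 × 5 × 3) × 1 vectors and $b$ is a scalar.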
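A single output $z_i$ is just a flattened dot product plus a bias. The sketch below assumes the chunk and filter are stored as nested lists of shape height × width × channels; the helper names flatten and conv_output_at are hypothetical.

```python
def flatten(t):
    # F x F x C nested list -> flat list (row, then column, then channel)
    return [v for row in t for pixel in row for v in pixel]


def conv_output_at(chunk, weights, bias):
    """z_i = w^T x_i + b for one F x F x C chunk x_i of the image."""
    x = flatten(chunk)
    w = flatten(weights)
    if len(x) != len(w):
        raise ValueError("chunk and filter must have the same shape")
    return sum(wk * xk for wk, xk in zip(w, x)) + bias


chunk = [[[2, 2, 2] for _ in range(5)] for _ in range(5)]    # 5 x 5 x 3
weights = [[[1, 1, 1] for _ in range(5)] for _ in range(5)]  # 5 x 5 x 3
print(len(flatten(weights)))                   # 75
print(conv_output_at(chunk, weights, bias=1))  # 75 * 2 + 1 = 151
```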
Convolve: slide the 5 × 5 × 3 filter over all spatial locations $\mathbf{x}_i$ of the 32 × 32 × 3 image and compute every output $z_i$. Without padding there are 28 × 28 locations (32 − 5 + 1 = 28), so the result is a 28 × 28 × 1 activation map (also called a feature map).
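Sliding one filter over every location can be sketched as nested loops. This is a plain-Python illustration under stated assumptions (channel-last nested lists, square filters, no padding, stride 1); the function name activation_map is hypothetical.

```python
def activation_map(image, filt, bias=0):
    """Slide one F x F x C filter over every spatial location of an
    H x W x C image (no padding, stride 1) and return the
    (H - F + 1) x (W - F + 1) map of z_i = w^T x_i + b."""
    H, W, C = len(image), len(image[0]), len(image[0][0])
    F = len(filt)  # square filter assumed
    if len(filt[0][0]) != C:
        raise ValueError("filter depth must match image depth")
    out = []
    for i in range(H - F + 1):
        row = []
        for j in range(W - F + 1):
            z = bias
            for a in range(F):
                for b in range(F):
                    for c in range(C):
                        z += filt[a][b][c] * image[i + a][j + b][c]
            row.append(z)
        out.append(row)
    return out


image = [[[1, 1, 1] for _ in range(32)] for _ in range(32)]  # 32 x 32 x 3
filt = [[[1, 1, 1] for _ in range(5)] for _ in range(5)]     # 5 x 5 x 3
amap = activation_map(image, filt)
print(len(amap), len(amap[0]))  # 28 28
print(amap[0][0])               # 75
```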
Convolution Layer
Let's apply a different filter with different weights! Convolving the same 32 × 32 × 3 image with a second 5 × 5 × 3 filter yields a second 28 × 28 × 1 activation map.
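Convolution “Layer”: the 32 × 32 × 3 image is convolved with several filters, each producing one 28 × 28 activation map; the activation maps are stacked along the depth dimension to form the layer's output.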
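A whole convolution layer applies K filters and stacks their activation maps along depth. A minimal plain-Python sketch, assuming channel-last nested lists, square filters, no padding, and stride 1 (the function name conv_layer is hypothetical):

```python
def conv_layer(image, filters, biases):
    """Apply K filters (each F x F x C) to an H x W x C image, no padding,
    stride 1, and stack the K activation maps along the depth axis.
    Returns a nested list of shape (H - F + 1) x (W - F + 1) x K."""
    H, W, C = len(image), len(image[0]), len(image[0][0])
    F = len(filters[0])  # square filters assumed
    out_h, out_w = H - F + 1, W - F + 1

    def response(filt, bias, i, j):
        # z = w^T x + b for the F x F x C chunk whose top-left corner is (i, j)
        return bias + sum(filt[a][b][c] * image[i + a][j + b][c]
                          for a in range(F) for b in range(F) for c in range(C))

    # output[i][j][k] is entry (i, j) of the k-th activation map
    return [[[response(filt, bias, i, j) for filt, bias in zip(filters, biases)]
             for j in range(out_w)]
            for i in range(out_h)]


image = [[[1, 1, 1] for _ in range(32)] for _ in range(32)]  # 32 x 32 x 3
ones = [[[1, 1, 1] for _ in range(5)] for _ in range(5)]     # 5 x 5 x 3
twos = [[[2, 2, 2] for _ in range(5)] for _ in range(5)]     # 5 x 5 x 3
out = conv_layer(image, [ones, twos], biases=[0, 1])
print(len(out), len(out[0]), len(out[0][0]))  # 28 28 2
print(out[0][0])                              # [75, 151]
```

Each filter has its own weights and bias, so the number of activation maps (the output depth) equals the number of filters.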
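Input height (and width) 𝑁, filter height and width 𝐹, stride 𝑆:
Output: $\left(\frac{N - F}{S} + 1\right) \times \left(\frac{N - F}{S} + 1\right)$
𝑁 = 7, 𝐹 = 3, 𝑆 = 1: (7 − 3)/1 + 1 = 5
𝑁 = 7, 𝐹 = 3, 𝑆 = 2: (7 − 3)/2 + 1 = 3
𝑁 = 7, 𝐹 = 3, 𝑆 = 3: (7 − 3)/3 + 1 = 2.3̄
Fractions are illegal: with 𝑆 = 3 the filter does not fit the input evenly.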
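The output-size rule, including the check that rejects fractional sizes, fits in a small function. This is a sketch; the name conv_output_size is hypothetical.

```python
def conv_output_size(n, f, s):
    """Spatial output size (N - F) / S + 1 of an unpadded convolution.
    Raises ValueError when the result would be a fraction."""
    if (n - f) % s != 0:
        raise ValueError(
            f"N={n}, F={f}, S={s}: (N - F) / S + 1 = {(n - f) / s + 1:.2f} is not an integer")
    return (n - f) // s + 1


print(conv_output_size(7, 3, 1))  # 5
print(conv_output_size(7, 3, 2))  # 3
try:
    conv_output_size(7, 3, 3)
except ValueError as err:
    print(err)  # N=7, F=3, S=3: (N - F) / S + 1 = 2.33 is not an integer
```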
                 Convolution Layers: Dimensions
Input Image
[Figure: a 7 × 7 input image surrounded by a one-pixel border of zeros.]
Problems without padding:
    • Sizes get small too quickly
    • Corner pixel is only used once
Padding (𝑃): 1
Stride (𝑆): 1
Output: 7 × 7 (for a 3 × 3 filter)
Most common is ‘zero’ padding.
Output size:
$\left(\left\lfloor \frac{N + 2P - F}{S} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{N + 2P - F}{S} \right\rfloor + 1\right)$
where ⌊·⌋ denotes the floor operator (in practice, an integer division is performed).
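With padding, the floor is exactly what integer division computes. A one-line sketch (the name padded_output_size is hypothetical) reproduces the 7 × 7 example and shows how a non-fitting stride is rounded down:

```python
def padded_output_size(n, f, p, s):
    # floor((N + 2P - F) / S) + 1; Python's // performs the floor
    return (n + 2 * p - f) // s + 1


print(padded_output_size(7, 3, 1, 1))   # 7: padding keeps the size
print(padded_output_size(7, 3, 0, 3))   # 2: floor(4 / 3) + 1
print(padded_output_size(32, 5, 0, 1))  # 28
```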
Convolution Layers: Padding
[Figure: image 7 × 7 + zero padding.]
Types of convolutions:
    • Valid convolution: uses no padding
    • Same convolution: output size = input size; set the padding to $P = \frac{F - 1}{2}$ (for stride 1 and an odd filter size 𝐹)
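A quick check of the “same” rule: with stride 1 and odd 𝐹, the padding 𝑃 = (𝐹 − 1)/2 makes the output size equal the input size. The helper name same_padding is hypothetical, and the even-𝐹 error reflects the odd-size assumption stated above.

```python
def same_padding(f):
    """Padding P = (F - 1) / 2 that keeps output size == input size
    for stride 1; only defined for odd filter sizes F."""
    if f % 2 == 0:
        raise ValueError("P = (F - 1) / 2 needs an odd filter size F")
    return (f - 1) // 2


n = 7
for f in (1, 3, 5, 7):
    p = same_padding(f)
    out = n + 2 * p - f + 1  # output size for stride 1
    print(f, p, out)
# 1 0 7
# 3 1 7
# 5 2 7
# 7 3 7
```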
             A1: (3, 4, 5, 5)
             A2: (4, 5, 5)
             A3: depends on the width and height of the image
Input 4 × 4:
3   1   3   5
6   0   7   9
3   2   1   4
0   2   4   3

Max pool with 2 × 2 filters and stride 2, ‘pooled’ output:
6   9
3   4

Average pool with 2 × 2 filters and stride 2, ‘pooled’ output:
2.5    6
1.75   3
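Both pooling variants reduce each non-overlapping 2 × 2 window to a single number. A plain-Python sketch (the name pool2d and its mode strings are hypothetical) reproduces the two pooled outputs:

```python
def pool2d(x, size=2, stride=2, mode="max"):
    """2D pooling over size x size windows with the given stride (no padding)."""
    if mode == "max":
        reduce = max
    elif mode == "average":
        reduce = lambda vals: sum(vals) / len(vals)
    else:
        raise ValueError("mode must be 'max' or 'average'")
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [[reduce([x[i * stride + a][j * stride + b]
                     for a in range(size) for b in range(size)])
             for j in range(out_w)]
            for i in range(out_h)]


x = [[3, 1, 3, 5],
     [6, 0, 7, 9],
     [3, 2, 1, 4],
     [0, 2, 4, 3]]
print(pool2d(x, mode="max"))      # [[6, 9], [3, 4]]
print(pool2d(x, mode="average"))  # [[2.5, 6.0], [1.75, 3.0]]
```

Unlike a convolution, pooling has no learned weights; it only summarizes each window.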
[Figure: 5x5 input * 3x3 filter = 3x3 output.]
• http://cs231n.github.io/convolutional-networks/