A 2017 Guide to Semantic Segmentation with Deep Learning
- Fully Convolutional Networks for Semantic Segmentation
- Submitted on 14 Nov 2014
- Arxiv Link
Key Contributions:
- Popularize the use of end-to-end convolutional networks for semantic segmentation
- Re-purpose ImageNet-pretrained networks for segmentation
- Upsample using deconvolutional layers
- Introduce skip connections to improve over the coarseness of upsampling
Explanation:
The key observation is that fully connected layers in classification networks can be viewed as convolutions whose kernels cover their entire input regions. This is equivalent to evaluating the original classification network on overlapping input patches, but is much more efficient because computation is shared over the overlapping regions. Although this observation is not unique to this paper (see OverFeat, this post), it improved the state of the art on VOC2012 significantly.
Fully connected layers as a convolution. Source.
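To make this concrete, here is a minimal PyTorch sketch (my own illustration, not the paper's code) converting a fully connected layer into an equivalent convolution, using the dimensions of VGG's first classifier layer:

```python
import torch
import torch.nn as nn

# Dimensions of VGG's first classifier layer: a 7x7x512 feature map in,
# 4096 features out.
fc = nn.Linear(512 * 7 * 7, 4096)

# The same computation as a 7x7 convolution with 4096 output channels.
conv = nn.Conv2d(512, 4096, kernel_size=7)
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 7, 7)
out_fc = fc(x.flatten(1))    # shape (1, 4096)
out_conv = conv(x)           # shape (1, 4096, 1, 1)
assert torch.allclose(out_fc, out_conv.flatten(1), atol=1e-5)

# Unlike the fc layer, the conv version also accepts larger inputs and
# returns a spatial grid of outputs, one per (overlapping) 7x7 patch.
bigger = torch.randn(1, 512, 14, 14)
print(conv(bigger).shape)    # torch.Size([1, 4096, 8, 8])
```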
After convolutionalizing the fully connected layers in an ImageNet-pretrained network like VGG, feature maps still need to be upsampled because pooling operations in CNNs reduce spatial resolution. Instead of using simple bilinear interpolation, deconvolutional layers can learn the interpolation. This layer is also known as upconvolution, full convolution, transposed convolution or fractionally-strided convolution.
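A sketch of such a layer in PyTorch, assuming the common FCN practice of initializing the transposed convolution with bilinear-interpolation weights (the helper below is my own):

```python
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    # Weights of a fixed bilinear-interpolation filter, one per channel.
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size, dtype=torch.float32)
    filt = 1 - (og - center).abs() / factor
    kernel2d = filt[:, None] * filt[None, :]
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = kernel2d  # no cross-channel mixing
    return weight

num_classes = 21  # e.g. PASCAL VOC: 20 classes + background
# 2x upsampling: kernel 4, stride 2, padding 1 exactly doubles H and W.
upsample = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4,
                              stride=2, padding=1, bias=False)
with torch.no_grad():
    upsample.weight.copy_(bilinear_kernel(num_classes, 4))

x = torch.randn(1, num_classes, 16, 16)
print(upsample(x).shape)  # torch.Size([1, 21, 32, 32])
# The layer starts out as bilinear interpolation, but because its
# weights are trainable it can learn a better upsampling.
```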
However, upsampling (even with deconvolutional layers) produces coarse segmentation maps because of the loss of information during pooling. Therefore, shortcut/skip connections are introduced from higher-resolution feature maps.
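A minimal sketch of this fusion, in the spirit of the paper's FCN-16s variant. The shapes and the 1x1 scoring layer are assumptions for illustration, and I use fixed bilinear upsampling for the 2x step where the paper learns it:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 21
# 1x1 convolution predicting class scores from VGG's pool4 features.
score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)

def fuse(coarse_scores, pool4_features):
    # Upsample the coarse, stride-32 score map by 2x to stride 16
    # (fixed bilinear here for brevity; the paper learns this step).
    up = F.interpolate(coarse_scores, scale_factor=2,
                       mode='bilinear', align_corners=False)
    # Add class scores predicted from the higher-resolution pool4 map;
    # in practice the two maps may need cropping to matching sizes.
    return up + score_pool4(pool4_features)

coarse = torch.randn(1, num_classes, 7, 7)  # stride-32 predictions
pool4 = torch.randn(1, 512, 14, 14)         # stride-16 pool4 features
print(fuse(coarse, pool4).shape)            # torch.Size([1, 21, 14, 14])
```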
Benchmarks (VOC2012):
| Score | Comment | Source |
|-------|---------|--------|
| 62.2 | - | leaderboard |
| 67.2 | More momentum. Not described in paper | leaderboard |

My Comments:
- This was an important contribution, but the state of the art has improved considerably since then.