This is my undergraduate final-year project, supervised by Dr. Qiufeng Wang, which aims to implement object detection in images using the SSD method. This README.md records my learning outcomes and experimental observations; the repository also contains the paper, the implementation code, and relevant data.
The detailed theory is given in the SSD paper. Initially, I read the paper alongside a Keras implementation to understand the methodology. Since I am still a beginner in this field, I will most likely make my modifications on that version because of its user-friendly interface. For better performance, it may be worth referring to a TensorFlow version with more advanced features. As the study goes further, more methodologies and techniques come into view.
This part records the thread of my study on this project. It provides links to the papers and technical blogs I referred to throughout my study, as well as notes recording my understanding and learning outcomes on specific topics related to the project.
Most object detection methods use a network to extract features at different levels and then perform recognition. Recognition can be regarded as a sub-task of object detection, since we need to know that objects exist before we locate them (in other words, label them with bounding boxes). Convolutional Neural Networks (CNNs) have shown outstanding performance in image recognition tasks. At the current stage of my study, I have made notes on the following network architectures:
This project uses bounding boxes to show the results of object detection, rather than segmentation, which is a much more advanced task.
- Anchor
- Region of Interest (RoI)
- Region Proposal Network (RPN)
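
Anchors, RoIs, and region proposals are all boxes that get compared against ground-truth boxes, usually by intersection over union (IoU, also called Jaccard overlap). The SSD paper, for example, matches a default box to a ground-truth box when their overlap exceeds 0.5. The function below is a minimal sketch of that measure; the `(xmin, ymin, xmax, ymax)` box format and the function name are assumptions made for illustration, not taken from the project code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two identical boxes give an IoU of 1, and boxes that do not overlap give 0.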
Although, as previously mentioned, this project mainly studies and implements SSD for object detection, it would be improper to ignore other methodologies that perform well on this task. The following list collects some notable ones.
- SSD / SSDLite (mentioned in MobileNet V2 paper)
- YOLO / YOLOv2 / YOLOv3*
- Faster R-CNN
- Mask R-CNN
- R-FCN (Region-based Fully Convolutional Network)
- SPPNet (Spatial Pyramid Pooling)
- Light-Head R-CNN*
- RefineNet*
Based on the recently published paper Focal Loss for Dense Object Detection, the training strategy should be reconsidered if we need better performance.
As for the loss function, some typical losses used in object detection still need to be studied:
- L1 loss
- L2 loss
- softmax
- focal loss
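
To make the losses in this list concrete, here is a minimal pure-Python sketch for a single sample. The function names are illustrative assumptions. The focal loss here is applied on top of a softmax for simplicity, whereas the focal loss paper formulates it per class with a sigmoid; the defaults `gamma=2.0` and `alpha=0.25` are the values that paper reports as working well.

```python
import math

def l1_loss(pred, target):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target))

def l2_loss(pred, target):
    """Sum of squared differences between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Softmax cross-entropy for one sample."""
    return -math.log(softmax(logits)[true_class])

def focal_loss(logits, true_class, gamma=2.0, alpha=0.25):
    """Softmax variant (an assumption) of -alpha * (1 - p_t)^gamma * log(p_t)."""
    p_t = softmax(logits)[true_class]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=1` the focal loss reduces to ordinary cross-entropy; a larger `gamma` shrinks the loss of well-classified examples so that hard examples dominate training.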
2017/9/27
- Added more test images. The accuracy, although not bad, is lower than expected. In particular, some objects are missed because of their low confidence scores. There are also some misclassifications (e.g. a horse detected as a cow).
I first conjectured that the results were not good enough because the currently trained network had no dropout. After further study, I think dropout regularization is in fact applied when training the network.
2017/10/7
- Study of the SSD architecture: base network
- Learning the TensorFlow framework
2017/10/17
- Review of CNNs to aid understanding
- Review of some basics of deep learning
2017/10/31
- Learning the Keras interface
- Reading the code to understand the paper
2017/11/8
- Study of the training methods and loss function of SSD
- Question: confusion about how the number of bounding boxes is selected
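
On the number of boxes: in the SSD300 configuration described in the SSD paper, each of the six prediction layers contributes (feature map size)² × (default boxes per location) boxes. The sketch below reproduces that arithmetic. The layer sizes and box counts are the ones given in the paper; the Keras implementation used in this project may be configured differently, and the names are illustrative.

```python
# (feature map side length, default boxes per location) for SSD300, per the SSD paper
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

def count_default_boxes(layers):
    """Total number of default boxes over all prediction layers."""
    return sum(side * side * boxes_per_cell for side, boxes_per_cell in layers)

print(count_default_boxes(SSD300_LAYERS))  # 8732
```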
2017/11/15
- Study of Deep Residual Learning, a very deep architecture that is nevertheless easy to train.
- This study explores the possibility of changing the recognition network (or base network) to obtain higher confidence scores or better performance on small objects.
2017/11/21
- More details on training methodology.
- Question: The classifier for each feature map is just a convolutional layer followed by a flatten layer, without any fully connected (FC) layer. So is an FC layer not necessary for a classifier? What is the function of FC?
- About feature maps of different scales: it was initially confusing how the bounding boxes from different feature layers are plotted correctly on the image. However, from reading the code, I observed that the shapes of those layers are possibly the same. Pooling layers???
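
One possible explanation, following the SSD paper rather than the project code: default boxes are defined in coordinates normalized to the image (between 0 and 1), so boxes from feature maps of any resolution land on the same image. The paper spaces the box scales linearly between `s_min = 0.2` and `s_max = 0.9`. The sketch below is illustrative only; the function names are assumptions, and it omits the extra aspect-ratio-1 box the paper adds per location.

```python
import math

def default_box_scales(num_layers, s_min=0.2, s_max=0.9):
    """Linearly spaced scales, one per prediction layer (requires num_layers >= 2)."""
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]

def default_boxes_for_layer(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Boxes as (cx, cy, w, h), all normalized to the image size."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centers sit at the center of each feature map cell.
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes
```

Multiplying the normalized coordinates by the image width and height gives pixel positions, regardless of which layer produced the box.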
2017/11/26
- Question: details of the regression of bounding box positions
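
As a note on this question: in the SSD paper the network does not regress absolute coordinates. It regresses offsets of the ground-truth box relative to a matched default box, with the center offsets divided by the default box size and the width and height taken as log ratios. The sketch below shows that encoding and its inverse. The function names are assumptions, and the extra "variance" scaling that many implementations apply is left out.

```python
import math

def encode(gt_box, default_box):
    """Regression targets for a ground-truth box relative to a default box.

    Both boxes are (cx, cy, w, h).
    """
    gcx, gcy, gw, gh = gt_box
    dcx, dcy, dw, dh = default_box
    return ((gcx - dcx) / dw, (gcy - dcy) / dh,
            math.log(gw / dw), math.log(gh / dh))

def decode(offsets, default_box):
    """Inverse of encode: turn predicted offsets back into a (cx, cy, w, h) box."""
    tx, ty, tw, th = offsets
    dcx, dcy, dw, dh = default_box
    return (dcx + tx * dw, dcy + ty * dh,
            dw * math.exp(tw), dh * math.exp(th))
```

A ground-truth box that coincides with its default box encodes to all zeros, and `decode(encode(g, d), d)` recovers `g`.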
2017/12/2
- Question: Batch Normalization layer (which came up while reading another paper)
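
For reference, the training-time computation of batch normalization on a single feature can be sketched as follows: normalize the batch to zero mean and unit variance, then apply a learned scale `gamma` and shift `beta`. This is a simplified illustration on a list of scalars, with assumed names; a real layer also tracks running statistics for use at inference.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a list of scalar activations, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```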
2017/12/6
- Poster presentation session
- Summarize the work so far
- Think more about possible modifications...