My personal experiments on CNN behind "Multiview Detection with Feature Perspective Transformation" [Website] [arXiv]
@inproceedings{hou2020multiview,
title={Multiview Detection with Feature Perspective Transformation},
author={Hou, Yunzhong and Zheng, Liang and Gould, Stephen},
booktitle={ECCV},
year={2020}
}
This repo is for my practice and trials in learning Convolutional Neural Networks. The changes carried out are probably simple ones. The primary goal here is to get familiar with a code base that's a good implementation of a Neural Network architecture.
Here I'm focusing on the MVDet model, which is described and analyzed in the paper linked at the top. My understanding comes from reading the paper (still learning from it) and from going through the code that the paper's authors provided (this repo is a fork of theirs). With the intention of learning, I'm trying out different things here. Hoping to document enough of the learnings! We'll see!
The original architecture of MVDet is given below.
Their code implementation (and also my modified ones) of MVDet uses CUDA as well as the following libraries
- python 3.7+
- pytorch 1.4+ & torchvision
- numpy
- matplotlib
- pillow
- opencv-python
- kornia
- matlab & matlabengine (required for evaluation) (see this link for detailed guide)
This experiment is only to get some basic familiarity with the code logic.
- Importantly, MVDet uses a Resnet18 architecture as its model-core. I've modified the code to use Resnet34 instead.
- Some setup-related changes have been carried out in order to run the code on my Windows desktop with one GPU instead of the 2 GPUs used by the paper's authors.
- Also, minor changes are done to the main script where the parameters are explicitly hard-coded (this is for my convenience).
- The "persp_trans_detector.py" script has a "PerspTransDetector" class. In its constructor (the `__init__` method), Resnet34 is added as an additional option.
- In the same script, the `__init__` method and the `forward` method both establish the device on which the model stays and onto which the data gets loaded.
  - The original implementation splits the model between two GPUs. I remove that portion and instead put the whole model onto the single available GPU (in the `__init__` method).
  - Then, before the data is put through the model (in the `forward` method), it is loaded onto the GPU. Since the original model is split, this happens multiple times for the dataset. In each of those instances, the data is now just loaded onto the same GPU.
- In the "main.py" script, the argparse library is used to apply the parameters. I've modified it to use a simple class with properties instead. This is just for my convenience.
This one is a work in progress. Will be committing the changes soon. I've tried to modify the architecture to add an additional model in between the single-view results and the multi-view aggregation layers. These modifications will primarily be in "res_proj_variant". Hoping to complete them soon and commit them here. Fingers crossed!!
- The existing architecture has multiple variants. One such variant passes the result of the single-view detection on to the next steps, instead of the feature maps. Refer to the image below, where the red line highlights the alternative path in this variant. I've modified the CNN model used in this alternate pathway (the CNN block is highlighted in red as well). The idea was to understand what it takes to integrate other single-view CNN models into this architecture.
- The model definition in the `__init__` method of the `ResProjVariant` class in the "res_proj_variant.py" script was modified. The `self.image_classifier` property was modified to include an additional ResNet18 CNN block in its middle.
- In order to integrate this ResNet18 model inside this block, the kernel size of the first CNN layer was changed to `kernel_size=1`, so that the channel size can be suitably adjusted for the input of the ResNet18 block.
- The last few layers of the added ResNet18 block were omitted, in order to avoid flattening the layers. This way the output remains suitable for the next steps. Ideally, some of the first few layers should also be omitted so that the channel size need not be drastically reduced for the input here (512 channels are reduced to 3 channels). Maybe in one of the future experiments.
This one is a work in progress. Will be committing the changes soon. I'll try to make changes that avoid the drastic reduction in the channel count that was introduced in experiment 2 above. These modifications will primarily be in "res_proj_variant" as well. Hoping to complete them soon and commit them here. Fingers crossed!!
*** Work in progress ***
*** Work in progress ***