The main idea of DeepShift is to test the ability to train and infer using bitwise shifts.
We present two approaches:
- DeepShift-Q: the parameters are floating-point weights, just like in regular networks, but the weights are rounded to powers of 2 during the forward and backward passes.
- DeepShift-PS: the parameters are the signs and shift values themselves.
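The following is a minimal PyTorch sketch of the two representations; it is not the repository's actual code, the function names are ours, and the rounding rule is the nearest-power-of-2 scheme described above:

```python
import torch

def round_to_power_of_2(w: torch.Tensor) -> torch.Tensor:
    """DeepShift-Q style: keep float weights, but round each one to the
    nearest signed power of 2 when it is used in a pass."""
    sign = torch.sign(w)
    # round(log2|w|) is the shift amount; 2**shift rebuilds the magnitude
    shift = torch.round(torch.log2(torch.abs(w) + 1e-32))
    return sign * torch.pow(2.0, shift)

def weight_from_sign_and_shift(sign: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    """DeepShift-PS style: the sign and the shift are themselves the
    trainable parameters; the effective weight is sign * 2**shift."""
    return torch.sign(sign) * torch.pow(2.0, torch.round(shift))

w = torch.tensor([0.3, -1.7, 0.05])
print(round_to_power_of_2(w))  # tensor([ 0.2500, -2.0000,  0.0625])
```

Multiplying an activation by such a weight is equivalent to a bit-wise shift by `shift` positions (plus a sign flip), which is what the kernels described below exploit.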
Important Notes
- To train from scratch, the learning rate option `--lr` should be set to 0.01. To train from a pre-trained model, it should be set to 0.001, and `--lr-step-size` should be set to 5.
- To use DeepShift-PS, the `--optimizer` option must be set to `radam` in order to obtain good results (see the example command after these notes).
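For instance, fine-tuning from a pre-trained model with DeepShift-PS might be invoked as follows; the script name `mnist.py` is taken from the MNIST example below, while the `--shift-type PS` flag is an assumption about how the approach is selected, so check the script's `--help` output for the exact options:

```
python mnist.py --shift-type PS --optimizer radam --lr 0.001 --lr-step-size 5
```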
(Needs to be done every time you run the code) Source the environment:

```
source venv/bin/activate
```
Install the required packages and build the spfpm package for fixed-point arithmetic:

```
pip install -r requirements.txt
```
cd into the `pytorch` directory:

```
cd pytorch
```
Now you can run the different scripts with different options, e.g.,

a) Train a simple fully-connected DeepShift model on the MNIST dataset, using the PS approach:
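A command along these lines should work; the script name `mnist.py` and the `--shift-type PS` flag are assumptions about the repository's command-line interface rather than options stated in this README:

```
python mnist.py --shift-type PS --optimizer radam --lr 0.01
```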
Run the installation script to install our CPU and CUDA kernels that perform matrix multiplication and convolution using bit-wise shifts:

```
sh install_kernels.sh
```
Now you can run a model with actual bit-wise shift kernels in CUDA using the `--use-kernel True` option. Remember that the kernel only works for inference, not training, so you need to add the `-e` option as well:
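A sketch of such an evaluation run, again assuming the `mnist.py` script from the earlier example:

```
python mnist.py -e --use-kernel True
```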