- spnn: simple parallelized neural network.
- A comparison of fully connected neural network implementations (forward and backward propagation).
- Implementations are listed below (a minimal sketch of the OpenMP and CUDA variants is given further down):
  - CPU, single thread.
  - CPU, multiple threads using OpenMP.
  - GPU, single thread using CUDA.
  - GPU, multiple threads using CUDA.
  - OpenBLAS.
- The task selected is digit classification on MNIST data.
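To give a flavor of how the parallel variants differ, here is a minimal sketch of a dense-layer forward pass, once parallelized with OpenMP and once as a CUDA kernel. The names (`dense_forward_omp`, `dense_forward_kernel`) and the row-major layout are illustrative assumptions, not taken from the repository.

```cpp
// Illustrative sketch only; names and layout are not from the spnn sources.
// Dense layer forward pass: out[i] = b[i] + sum_j W[i*in_dim + j] * x[j]

// CPU multi-threaded variant: rows of W are split across OpenMP threads.
void dense_forward_omp(const float* W, const float* x, const float* b,
                       float* out, int out_dim, int in_dim) {
    #pragma omp parallel for
    for (int i = 0; i < out_dim; ++i) {
        float acc = b[i];
        for (int j = 0; j < in_dim; ++j)
            acc += W[i * in_dim + j] * x[j];
        out[i] = acc;
    }
}

// GPU parallel variant: one CUDA thread per output neuron.
__global__ void dense_forward_kernel(const float* W, const float* x, const float* b,
                                     float* out, int out_dim, int in_dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= out_dim) return;
    float acc = b[i];
    for (int j = 0; j < in_dim; ++j)
        acc += W[i * in_dim + j] * x[j];
    out[i] = acc;
}
```

A launch like `dense_forward_kernel<<<(out_dim + 255) / 256, 256>>>(...)` covers all output neurons, while a single-thread launch (`<<<1, 1>>>`) would correspond to the GPU-serial variant. The OpenBLAS variant would instead delegate the same product to a BLAS routine such as `cblas_sgemv`.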
- Code is written in C++/CUDA.
- OpenMP variant uses the `openmp` library.
- OpenBLAS variant uses the `openblas` library.
- `include/` contains headers.
- `src/` contains all variant implementations.
- `data/` contains MNIST data.
- `proposal.pdf` contains the project proposal.
- `presentation.pdf` contains the presentation given at the end of the project.
- `report.pdf` contains details, experiments and analysis.
- `Makefile` is used to make target executables.
- The code itself serves as the documentation.
- Open a terminal in the directory containing `Makefile`.
- Use `make all` to build all targets.
- The targets are listed as follows:
  - `cpu_serial.out`
  - `cuda_parallel.out`
  - `openmp.out`
  - `openblas.out`
  - `cuda_serial.out`
- To build a specific target, use `make <target-name>`.
- To remove all targets, use `make clean`.
- Use `./<target-name>` to run a target.
- Accuracy vs epochs for the fully connected network, irrespective of implementation.
- Comparison of the implementations for a specific model.
- Time taken vs parameter size for the different implementations. Observe that the curve for the GPU parallelized variant stays flat at almost 0.
- Things to consider during analysis:
  - Correctness (> 10% accuracy).
  - Repeatability (nothing fancy).
  - Memory check (no memory leaks or other errors, verified using `valgrind --tool=memcheck`).
- Initialization is done uniformly in [-1, 1] (a minimal sketch is given right below these notes).
- Layers are numbered from 0, i.e. the first hidden layer is layer 1.
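A minimal sketch of that uniform initialization, assuming the weights are kept in a flat `std::vector<float>`; the helper name `init_uniform` is hypothetical, not the repository's API.

```cpp
#include <random>
#include <vector>

// Hypothetical helper: fill a flat weight buffer with values drawn
// uniformly from [-1, 1].
void init_uniform(std::vector<float>& weights, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    for (float& w : weights)
        w = dist(gen);
}
```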
- Control the size of the name field.
- Implement the loss function.
- Remove memory leaks from `step_train`.
- Batch gradient descent: fix the loss decrement and check backprop (see the gradient-check sketch below).
- Normalize.
- Get MNIST data.
- Profile.
- Remove data loading from the time taken (see the timing sketch below).
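For the "check backprop" item, one standard approach is a finite-difference gradient check: perturb each parameter, recompute the loss, and compare the numerical gradient with the analytic one produced by backpropagation. The sketch below assumes a hypothetical `loss(params)` callable and a precomputed analytic gradient; it is not the repository's API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Finite-difference gradient check (sketch). `loss` evaluates the network loss
// for a flat parameter vector; `analytic` is the gradient from backpropagation.
// Returns the worst relative error over all parameters.
float gradient_check(const std::function<float(const std::vector<float>&)>& loss,
                     std::vector<float> params,
                     const std::vector<float>& analytic,
                     float eps = 1e-4f) {
    float worst = 0.0f;
    for (std::size_t k = 0; k < params.size(); ++k) {
        const float orig = params[k];
        params[k] = orig + eps;
        const float lp = loss(params);
        params[k] = orig - eps;
        const float lm = loss(params);
        params[k] = orig;

        const float numeric = (lp - lm) / (2.0f * eps);
        const float denom = std::max(std::fabs(numeric) + std::fabs(analytic[k]), 1e-8f);
        worst = std::max(worst, std::fabs(numeric - analytic[k]) / denom);
    }
    return worst;  // roughly 1e-2 or smaller suggests backprop and the loss agree
}
```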
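For excluding data loading from the measured time, a minimal `std::chrono` sketch that times only the training phase; `load_mnist` and `train` below are stand-in stubs, not the project's functions.

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Stand-in stubs for the real data loader and training loop.
static void load_mnist() { std::this_thread::sleep_for(std::chrono::milliseconds(50)); }
static void train()      { std::this_thread::sleep_for(std::chrono::milliseconds(200)); }

int main() {
    load_mnist();  // data loading stays outside the timed region

    const auto t0 = std::chrono::steady_clock::now();
    train();       // only the training phase is timed
    const auto t1 = std::chrono::steady_clock::now();

    std::printf("training time: %.3f s\n",
                std::chrono::duration<double>(t1 - t0).count());
    return 0;
}
```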