lwnn/lwnn

LWNN - Lightweight Neural Network

lwnn is mostly inspired by NNOM and CMSIS-NN; I want to do something for Edge AI.

However, I think NNOM is not well designed for different runtimes (CPU/DSP/GPU/NPU, etc.): it has no clear path for handling different types of runtime. Nowadays I also really want to study OpenCL, and when I came across MACE I found a bunch of CL kernels that can be used directly.

So I decided to do something meaningful: study OpenCL and, at the same time, create a Lightweight Neural Network that is suitable for devices such as PCs, mobile phones, and MCUs.

Architecture

To support a variety of deep learning frameworks such as TensorFlow/Keras/Caffe2 and PyTorch, lwnn supports ONNX. Older frameworks such as Caffe/Darknet that do not support ONNX are supported through special handling.

(Figure: lwnn architecture diagram)

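| Layers/Runtime | cpu float | cpu s8 | cpu q8 | cpu q16 | opencl | comments |
|---|---|---|---|---|---|---|
| Conv1D | Y d | Y | Y | Y | Y | based on Conv2D |
| Conv2D | Y d | Y | Y | Y | Y | |
| DeConv2D | Y | Y | Y | Y | Y | |
| DepthwiseConv2D | Y | Y | Y | Y | Y | |
| DilatedConv2D | Y | N | N | N | Y | |
| ElementWise Max | Y d | Y | Y | Y | Y | |
| ReLU | Y d | Y | Y | Y | Y | |
| PReLU | Y d | N | N | N | Y | |
| MaxPool1D | Y d | Y | Y | Y | Y | based on MaxPool2D |
| MaxPool2D | Y d | Y | Y | Y | Y | |
| Dense | Y | Y | Y | Y | Y | |
| Softmax | Y d | Y | Y | Y | Y | |
| Reshape | Y d | Y | Y | Y | Y | |
| Pad | Y | Y | Y | Y | Y | |
| BatchNorm | Y | Y | Y | Y | Y | |
| Concat | Y | Y | Y | Y | Y | |
| AvgPool1D | Y d | Y | Y | Y | Y | based on AvgPool2D |
| AvgPool2D | Y d | Y | Y | Y | Y | |
| Add | Y d | Y | Y | Y | Y | |
| PriorBox | Y | N | N | N | F | |
| DetectionOutput | Y | F | F | F | F | |
| Upsample | Y | Y | Y | Y | Y | |
| Yolo | Y | F | F | F | F | |
| YoloOutput | Y | F | F | F | F | |
| Mfcc | Y | F | F | F | F | |
| LSTM | Y | N | Y | N | F | |
| Proposal | Y | N | N | N | N | |
| Mul | Y d | N | N | N | Y | |
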
  • Y means the layer is supported on that runtime; N means it is not.
  • F means the layer falls back to another runtime that supports it.
  • d means dynamic shape support.
  • s8/q8/q16: all are in Q format.
  • s8: 8-bit symmetric quantization with a zero offset, very similar to TFLite quantization.
  • q8/q16: 8/16-bit symmetric quantization with no zero offset.
  • A q8/s8/q16 activation (ReLU/Clip) reuses its input layer's buffer, so the activation's input layer must have exactly one consumer: the activation itself.
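
To make the difference between q8/q16 and s8 more concrete, here is a minimal sketch in plain Python. It is not taken from the lwnn sources: the function names are hypothetical, and I am assuming that "Q format" means `real = q / 2**frac_bits` and that the s8 zero offset is simply added to the integer value before saturation. The actual lwnn convention (sign of the offset, extra scale factors) may differ.

```python
def quantize_q(values, frac_bits, bits=8):
    """Q format without zero offset (q8/q16 style): q = round(v * 2**frac_bits)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scale = 2.0 ** frac_bits
    # Saturate to the signed integer range instead of wrapping around.
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize_q(q_values, frac_bits):
    """Inverse of quantize_q: real = q / 2**frac_bits."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_values]


def quantize_s8(values, frac_bits, zero_offset):
    """8-bit Q format with a zero offset (assumed convention, see lead-in)."""
    scale = 2.0 ** frac_bits
    return [max(-128, min(127, round(v * scale) + zero_offset)) for v in values]


def dequantize_s8(q_values, frac_bits, zero_offset):
    """Inverse of quantize_s8: real = (q - zero_offset) / 2**frac_bits."""
    scale = 2.0 ** frac_bits
    return [(q - zero_offset) / scale for q in q_values]


if __name__ == "__main__":
    relu6_range = [0.0, 6.0]  # a one-sided activation range
    # Without an offset, 5 fractional bits overflow for 6.0 and saturate.
    print(quantize_q(relu6_range, frac_bits=5))
    # With an offset the same 5 fractional bits fit into int8.
    print(quantize_s8(relu6_range, frac_bits=5, zero_offset=-96))
```

Under these assumptions, a one-sided range such as [0, 6] saturates in q8 at 5 fractional bits, while s8 with an offset of -96 can still represent both ends, which is where the extra parameter pays off.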
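
The buffer reuse rule for quantized activations can be checked on a model graph before deployment. The sketch below only illustrates that rule and is not lwnn's own code: the graph representation (a dict mapping each layer name to the list of its input layer names) and the function name `can_reuse_input_buffer` are assumptions made for this example.

```python
from collections import defaultdict


def can_reuse_input_buffer(layers, activation):
    """Return True if `activation` may safely overwrite its input's buffer.

    `layers` maps a layer name to the list of its input layer names
    (a hypothetical representation chosen for this sketch).
    """
    # Build the reverse mapping: producer -> list of consumers.
    consumers = defaultdict(list)
    for name, inputs in layers.items():
        for src in inputs:
            consumers[src].append(name)

    inputs = layers[activation]
    # The activation must have a single input, and that input layer
    # must be consumed by the activation only.
    return len(inputs) == 1 and consumers[inputs[0]] == [activation]


if __name__ == "__main__":
    chain = {"conv": [], "relu": ["conv"], "dense": ["relu"]}
    residual = {"conv": [], "relu": ["conv"], "add": ["relu", "conv"]}
    print(can_reuse_input_buffer(chain, "relu"))     # conv feeds only relu
    print(can_reuse_input_buffer(residual, "relu"))  # conv also feeds add
```

In the residual case the convolution output is still needed by the `add` layer, so letting the activation overwrite it in place would corrupt the second consumer's input.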

Supported Famous Models

Below is a list of commands to run the supported models on the OpenCL or CPU runtime.

# object detection
lwnn_gtest --gtest_filter=*CL*SSDFloat -i images/dog.jpg
lwnn_gtest --gtest_filter=*CPU*SSDFloat -i images/dog.jpg
lwnn_gtest --gtest_filter=*CL*YOLOV3Float -i images/dog.jpg
lwnn_gtest --gtest_filter=*CPU*YOLOV3Float -i images/dog.jpg
lwnn_gtest --gtest_filter=*CPU*MASKRCNNFloat -i images/dog.jpg
# semantic segmentation
lwnn_gtest --gtest_filter=*CL*ENETFloat -i ENet/example_image/munich_000000_000019_leftImg8bit.png
lwnn_gtest --gtest_filter=*CPU*ENETFloat -i ENet/example_image/munich_000000_000019_leftImg8bit.png
# speech to text
lwnn_gtest --gtest_filter=*CPU*DSFloat -i speech_dataset/bird/042ea76c_nohash_0.wav
stt 49/29:                                 b  irr  d

Note: These models show a large accuracy drop when quantized. I think quantization-aware training or something like TensorRT calibration is necessary.
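
As a rough illustration of what calibration could look like for a Q format runtime, the sketch below picks the number of fractional bits that minimizes the mean squared quantization error over a set of sample activations. This is a deliberately simple stand-in, not TensorRT's calibration algorithm and not something lwnn is known to implement; the function names and the error metric are assumptions for this example.

```python
def quantize_dequantize(value, frac_bits, bits=8):
    """Round-trip a float through Q format with saturation."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scale = 2.0 ** frac_bits
    q = max(lo, min(hi, round(value * scale)))
    return q / scale


def calibrate_frac_bits(samples, bits=8, candidates=range(0, 16)):
    """Pick the fractional bit count with the lowest mean squared error.

    `samples` are activation values collected by running the float model
    on representative inputs (how they are collected is left open here).
    """
    def mse(frac_bits):
        errors = [(v - quantize_dequantize(v, frac_bits, bits)) ** 2 for v in samples]
        return sum(errors) / len(errors)

    return min(candidates, key=mse)


if __name__ == "__main__":
    activations = [0.1, -0.2, 0.05, 0.3]
    print(calibrate_frac_bits(activations))
    # A single large outlier forces a coarser resolution for everything else.
    print(calibrate_frac_bits(activations + [3.0]))
```

The point of the example is the trade-off: more fractional bits give finer resolution but saturate large values, so the best choice depends on the real activation distribution rather than on the weights alone.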

Development

prepare environment

conda create -n lwnn python=3.6
source activate lwnn
conda install scons 
pip install tensorflow keras keras2onnx onnxruntime
sudo apt install nvidia-opencl-dev

build

scons
