mllib is a workspace for exploring embedded computer vision machine learning algorithms. Its scope is the full supervised machine learning workflow (acquire, annotate, train, test, optimize, deploy, validate). mllib employs a microservices architecture.
mllib image segmentation using the UNET network trained on the COCO dataset is shown below. The left image illustrates a human-segmented validation image; the right shows the results of TensorFlow training and conversion to TensorRT float16 inference on a Jetson NX. This project explores the process of creating and transforming models to perform machine-learned image processing on embedded hardware.
mllib is currently a sandbox to explore ideas and techniques, and a useful location to experiment with new approaches. It is not a stable repository with consistent interfaces.
The mllib toolset includes TensorFlow 2, Keras, Jupyter, TensorRT, TensorFlow Lite, Visual Studio Code, Docker, Airflow, and Kubernetes. The target embedded hardware includes Jetson AGX, Jetson NX, Google Coral, and Raspberry Pi.
mllib directories define specific steps to perform supervised machine learning. The README.md within each subdirectory describes how to perform that step. These include:
- datasets: dataset processing algorithms
- networks: convolutional neural networks (CNNs) used by other image processing algorithms
- classify: algorithms to train and test classification networks
- segment: algorithms to train and test segmentation networks
- target: scripts and instructions to prepare and target PC, Jetson, Coral, and Raspberry Pi boards
- serve: model inference on target platforms
- utils: shared utility libraries
Embedded machine-learned image processing is an emerging field. Although science fiction and its close cousin, press headlines, build the impression that the area is stable and mature, this is far from the case. Consequently, the process I recommend is very quick development cycles, moving algorithms rapidly from development to test on the target platform. In the past, moving algorithms from a development environment to embedded hardware would have involved a complete rewrite of the software in a new runtime environment. Today, all of these target systems run Linux and can execute Python algorithms in virtualized Docker environments with hardware access to powerful machine learning coprocessors. Consequently, my embedded machine learning process is to create an algorithm and, with little or no training or optimization, move it to the embedded environment and test it there. This verifies that the entire toolchain can handle the model structure. Once any targeting problems are rectified, model training and optimization are improved and target performance is verified.
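A minimal sketch of this smoke test, assuming TensorFlow 2 with Keras. The tiny encoder-decoder below (build_tiny_unet is an illustrative stand-in, not mllib's network) is exported untrained, so conversion and target inference can be exercised before any training time is spent:

```python
# Sketch of the "target first, train later" smoke test described above.
# build_tiny_unet is a hypothetical stand-in; any Keras model with the
# intended structure works.
import tensorflow as tf

def build_tiny_unet(height=480, width=512, classes=3):
    """A small encoder-decoder stand-in for the real segmentation network."""
    inputs = tf.keras.Input(shape=(height, width, 3))
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    skip = x
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.UpSampling2D()(x)
    x = tf.keras.layers.Concatenate()([x, skip])
    outputs = tf.keras.layers.Conv2D(classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

# Export the untrained model; conversion and target inference can then be
# verified end to end before any GPU hours are spent on training.
model = build_tiny_unet()
model.save("untrained_unet_savedmodel")
```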
To target Jetson boards, I am using the TensorFlow -> ONNX -> TensorRT path. For Coral and Raspberry Pi, I am following the TensorFlow -> TensorFlow Lite path.
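Both paths are sketched below under the same assumptions (the SavedModel directory and output file names are illustrative). The tf2onnx package performs the Keras-to-ONNX conversion; the ONNX file is then built into a TensorRT engine on the Jetson itself, e.g. with trtexec:

```python
# Sketch of both conversion paths; paths and file names are illustrative.
import tensorflow as tf
import tf2onnx  # pip install tf2onnx

model = tf.keras.models.load_model("untrained_unet_savedmodel")

# Path 1: TensorFlow -> ONNX. The resulting unet.onnx is copied to the
# Jetson and built into a float16 TensorRT engine there, e.g.:
#   trtexec --onnx=unet.onnx --fp16 --saveEngine=unet.plan
spec = (tf.TensorSpec((1, 480, 512, 3), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="unet.onnx")

# Path 2: TensorFlow -> TensorFlow Lite with float16 quantization for
# Coral and Raspberry Pi targets.
converter = tf.lite.TFLiteConverter.from_saved_model("untrained_unet_savedmodel")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
with open("unet.tflite", "wb") as f:
    f.write(converter.convert())
```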
The basics of an embedded ML imaging environment include a development workstation, a network, an embedded device, and a webcam.
- I prefer a deep learning workstation (e.g. Lambda workstation) rather than the cloud for development. It hosts your development environment and datasets, and provides object storage to share data between the development and embedded environments. It can be built up from a current workstation or purchased as a complete system.
- The key component of a deep learning workstation is the GPU, where training and inference are performed. Absent a global semiconductor shortage, graphics cards from gaming to professional grade are available at a wide range of price points. At this writing, a Titan RTX with 24 GB of memory for big models and batch sizes would be a good choice. This is a moving target and will take some research.
- After the GPU, system memory is my biggest bottleneck to keeping the GPU working efficiently. A rule of thumb I have followed is 2x system memory to GPU memory (e.g. 48 GB of RAM for a 24 GB Titan RTX).
- Next is storage for big datasets. My preference is a 10 TB 3.5" HDD for lots of storage at a moderate cost, in addition to an NVMe drive for runtime caching.
- What about CPUs? Choose one that enables the fastest PCIe and memory speeds and can keep up with the non-GPU pre- and post-processing. I typically choose a lower-cost CPU that maximizes my communication speeds.
- A USB webcam is a flexible and fun image source. I learn a lot by interacting live with ML algorithms that I wouldn't on saved image sets (low light, high contrast, saturation, etc.). I use the Logitech C920 because it is supported on Windows, Linux, Jetson, Coral, and Raspberry Pi through OpenCV (see the capture sketch after this list).
- Ubuntu Linux distribution
- Visual Studio Code is a great free development environment
- Python is the primary language for ML development and for this project
- Docker defines and runs the runtime environments for development and for targeting embedded devices. All code in this project runs within a Docker environment.
- MinIO S3 object storage stores and distributes machine learning data between embedded devices, servers, and development PCs.
- Jetson NX is a capable target platform for machine-learned image processing if you are choosing a target platform.
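As referenced in the webcam note above, a minimal live-capture loop is sketched below, assuming OpenCV's Python bindings and a UVC camera enumerated as device 0; the inference call is a placeholder:

```python
# Minimal live-webcam loop, assuming a UVC camera such as the Logitech C920.
import cv2

cap = cv2.VideoCapture(0)  # first enumerated camera
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # result = model.predict(frame)  # placeholder for an ML algorithm
        cv2.imshow("webcam", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```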
On the development workstation:
- Set up the Ubuntu desktop
- Install the current NVIDIA drivers:
```bash
sudo apt update
sudo apt upgrade -y
```
- Set up the ssh server:
```bash
sudo apt install openssh-server -y
sudo systemctl status ssh
sudo ufw allow ssh
sudo ufw enable && sudo ufw reload
```
- Configure ssh access key (Windows)
- On your development computer, generate a public/private key pair. Accept the default parameters.
```bash
ssh-keygen
```
- Open C:\Users\username\.ssh\id_rsa.pub
- Copy all file contents
- In the container, paste the contents of id_rsa.pub into ~/.ssh/authorized_keys in Linux:
```bash
mkdir ~/.ssh
nano ~/.ssh/authorized_keys
```
- Press ctrl+o to save authorized_keys once you have pasted in the new key.
- Press ctrl+x to close nano.
Problems with `sudo ubuntu-drivers autoinstall`? Install a specific driver version instead. (Apologies: this document will quickly become out of date.) On Ubuntu 22.04, ubuntu-drivers can fail with "UnboundLocalError: local variable 'version' referenced before assignment" when installing NVIDIA drivers. A fix is to edit the /usr/lib/python3/dist-packages/UbuntuDrivers/detect.py file and replace line 835 with this line:
```python
version = int(package_name.split('-')[-2])
```
```bash
ubuntu-drivers devices
sudo apt install nvidia-driver-525 -y
sudo reboot now
```
Once the computer has restarted, test that the NVIDIA driver is installed and running:
```bash
nvidia-smi
```
- Install Docker
```bash
sudo apt install ca-certificates curl gnupg lsb-release -y
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
docker run hello-world
```
- Set up nvidia-docker:
```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
```
Test:
```bash
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
```
The response depends on your GPUs:
```
Sat Feb 11 17:31:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01    Driver Version: 525.78.01    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 6000     Off  | 00000000:15:00.0  On |                  Off |
| 33%   24C    P8    18W / 260W |    522MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
- Install microk8s Kubernetes
You may encounter the following within a VS Code terminal: "/snap/microk8s/4565/bin/sed: couldn't flush stdout: Permission denied". If so, execute these instructions within a stand-alone bash terminal.
To install microk8s:
```bash
sudo snap remove microk8s --purge
sudo snap install microk8s --channel=1.22/stable --classic
sudo usermod -a -G microk8s $USER
newgrp microk8s
sudo chown -f -R $USER ~/.kube
su - $USER
microk8s status --wait-ready
microk8s enable gpu helm3 storage registry
#microk8s enable dns gpu helm3 storage registry rbac ingress metallb:10.64.140.43-10.64.140.143
#sudo snap install kubectl --classic
#cd $HOME
#mkdir .kube
#cd .kube
#microk8s config > config
```
Add convenience aliases:
```bash
echo 'alias py=python3' >> ~/.bashrc
echo 'alias kc=microk8s.kubectl' >> ~/.bashrc
echo 'alias helm=microk8s.helm3' >> ~/.bashrc
. ~/.bashrc
```
- Create a MinIO object storage
- Load the mllib project. From the command prompt:
```bash
sudo mkdir /data
sudo chown $USER /data
mkdir /data/git
cd /data/git
git clone https://github.com/bhlarson/mllib.git
```
- Let's Encrypt wildcard certificate
- Kubernetes secret: generate a TLS secret for Kubernetes
- Base64 encode the certificate:
```bash
cat cert.pem | base64 | awk 'BEGIN{ORS="";} {print}' > tls.crt
cat privkey.pem | base64 | awk 'BEGIN{ORS="";} {print}' > tls.key
```
- Create a credentials file mllib/creds.json defining S3 access credentials. It should have the structure below. Replace the "<>" values with the values for your object storage:
```json
{
"s3":[
{"name":"mllib-s3",
"type":"trainer", "address":"<s3 url>",
"access key":"<s3 access key>",
"secret key":"<s3 secret key>",
"tls":true,
"cert_verify":false,
"cert_path": null,
"sets":{
"dataset":{"bucket":"mllib","prefix":"data", "dataset_filter":"" },
"trainingset":{"bucket":"mllib","prefix":"training", "dataset_filter":"" },
"model":{"bucket":"mllib","prefix":"model", "dataset_filter":"dl3" },
"test":{"bucket":"mllib","prefix":"test", "dataset_filter":"" }
}
}
]
}
```
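mllib's utilities read creds.json themselves; the sketch below only illustrates the file's structure in use, assuming the address field holds host:port without a scheme and using boto3 in place of the project's own S3 helpers:

```python
# Minimal sketch of reading creds.json and opening an S3 connection.
# boto3 is used here only for illustration; mllib has its own S3 utilities.
import json
import boto3

with open("creds.json") as f:
    creds = json.load(f)["s3"][0]

scheme = "https" if creds["tls"] else "http"
s3 = boto3.client(
    "s3",
    endpoint_url=f"{scheme}://{creds['address']}",
    aws_access_key_id=creds["access key"],
    aws_secret_access_key=creds["secret key"],
    verify=creds["cert_verify"],
)

# List objects in the dataset bucket/prefix defined by the "sets" section.
dataset = creds["sets"]["dataset"]
response = s3.list_objects_v2(Bucket=dataset["bucket"], Prefix=dataset["prefix"])
for obj in response.get("Contents", []):
    print(obj["Key"])
```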
For 480-height, 512-width images, the following table shows the UNET accuracy, similarity, and inference time:

| Software | Hardware | Images | Accuracy | Similarity | Inference time (s) |
|---|---|---|---|---|---|
| TensorFlow Float32 | x86-64 RTX 6000 | 5000 | 0.947432 | 0.668267 | 0.076956 |
| ONNX Float32 | x86-64 RTX 6000 | 5000 | 0.947474 | 0.667503 | 0.153856 |
| TensorRT Float16 | x86-64 RTX 6000 | 5000 | 0.947155 | 0.667532 | 0.008323 |
| TensorFlow Float32 | Jetson AGX | 5000 | 0.945176 | 0.668743 | 0.231636 |
| TensorRT Float16 | Jetson AGX | 5000 | 0.945007 | 0.665993 | 0.029665 |
| TensorFlow Float32 | Jetson NX | 5000 | 0.941661 | 0.666916 | 0.370289 |
| TensorRT Float16 | Jetson NX | 5000 | 0.946283 | 0.668803 | 0.046575 |
- Import Embedded Classification to classify
- Instructions to setup development environment
- Instructions to use mllib
- Jupyter examples
- Setup Microk8s: snap install microk8s
- Setup Kubectl: snap install kubectl
- Working with kubectl
- Handwritten note from VS Code Draw Note