mllib is a workspace for exploring embedded computer vision machine learning algorithms. Its scope is the full supervised machine learning workflow (acquire, annotate, train, test, optimize, deploy, validate). mllib employs a microservices architecture.
mllib image segmentation using the UNET network trained on the COCO dataset is shown below. The left image illustrates a human-segmented validation image; the right shows the results of TensorFlow training and conversion to TensorRT float16 inference on a Jetson NX. This project explores the process of creating and transforming models to perform machine-learned image processing on embedded hardware.
mllib is currently a sandbox to explore ideas and techniques, and a useful location to experiment with new approaches. It is not a stable repository with consistent interfaces.
The mllib toolset includes TensorFlow 2, Keras, Jupyter, TensorRT, TensorFlow Lite, Visual Studio Code, Docker, Airflow, and Kubernetes. The target embedded hardware includes Jetson AGX, Jetson NX, Google Coral, and Raspberry Pi.
mllib directories define specific steps to perform supervised machine learning. The README.md within each subdirectory describes how to perform that step. These include:
- datasets: dataset processing algorithms
- networks: convolutional neural networks (CNNs) used by other image processing algorithms
- classify: algorithms to train and test classification networks
- segment: algorithms to train and test segmentation networks
- target: scripts and instructions to prepare and target PC, Jetson, Coral, and Raspberry Pi boards
- serve: model inference on target platforms
- utils: shared utility libraries
Embedded machine-learned image processing is an emerging field. Although science fiction and its close cousin, press headlines, build the impression that the area is stable and mature, this is far from the case. Consequently, the process I recommend is very quick development cycles, moving algorithms rapidly from development to test on the target platform. In the past, moving algorithms from a development environment to embedded hardware would have involved a complete rewrite of the software in a new runtime environment. Today, all of these target systems run Linux and can execute Python algorithms in virtualized Docker environments with hardware access to powerful machine learning coprocessors. Consequently, my embedded machine learning process is to create an algorithm and, with little or no training or optimization, move it to the embedded environment and test it there. This verifies that the entire toolchain can handle the model structure. Once any targeting problems are rectified, model training and optimization are improved and target performance is verified.
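A minimal sketch of this smoke test, assuming TensorFlow 2 with Keras. The tiny encoder-decoder below (build_tiny_unet is an illustrative stand-in, not mllib's network) is exported untrained, so conversion and target inference can be exercised before any training time is spent:

```python
# Sketch of the "target first, train later" smoke test described above.
# build_tiny_unet is a hypothetical stand-in; any Keras model with the
# intended structure works.
import tensorflow as tf

def build_tiny_unet(height=480, width=512, classes=3):
    """A small encoder-decoder stand-in for the real segmentation network."""
    inputs = tf.keras.Input(shape=(height, width, 3))
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    skip = x
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.UpSampling2D()(x)
    x = tf.keras.layers.Concatenate()([x, skip])
    outputs = tf.keras.layers.Conv2D(classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

# Export the untrained model; conversion and target inference can then be
# verified end to end before any GPU hours are spent on training.
model = build_tiny_unet()
model.save("untrained_unet_savedmodel")
```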
To target Jetson boards, I am using the TensorFlow -> ONNX -> TensorRT path. For Coral and Raspberry Pi, I am following the TensorFlow -> TensorFlow Lite path.
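Both paths are sketched below under the same assumptions (the SavedModel directory and output file names are illustrative). The tf2onnx package performs the Keras-to-ONNX conversion; the ONNX file is then built into a TensorRT engine on the Jetson itself, e.g. with trtexec:

```python
# Sketch of both conversion paths; paths and file names are illustrative.
import tensorflow as tf
import tf2onnx  # pip install tf2onnx

model = tf.keras.models.load_model("untrained_unet_savedmodel")

# Path 1: TensorFlow -> ONNX. The resulting unet.onnx is copied to the
# Jetson and built into a float16 TensorRT engine there, e.g.:
#   trtexec --onnx=unet.onnx --fp16 --saveEngine=unet.plan
spec = (tf.TensorSpec((1, 480, 512, 3), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="unet.onnx")

# Path 2: TensorFlow -> TensorFlow Lite with float16 quantization for
# Coral and Raspberry Pi targets.
converter = tf.lite.TFLiteConverter.from_saved_model("untrained_unet_savedmodel")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
with open("unet.tflite", "wb") as f:
    f.write(converter.convert())
```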
The basics of an embedded ML imaging environment include a development workstation, a network, an embedded device, and a webcam.
- I prefer a deep learning workstation (e.g. Lambda workstation) rather than the cloud for development. It hosts your development environment and datasets, and provides object storage to share data between the development and embedded environments. It can be built up from a current workstation or purchased as a complete system.
- The key component of a deep learning workstation is the GPU, where training and inference are performed. Absent a global semiconductor shortage, graphics cards from gaming to professional grade are available at a wide range of price points. At this writing, a Titan RTX with 24 GB of memory for big models and batch sizes would be a good choice. This is a moving target and will take some research.
- After the GPU, system memory is my biggest bottleneck to keeping the GPU working efficiently. A rule of thumb I have followed is 2x system memory to GPU memory (e.g. 48 GB of RAM for a 24 GB Titan RTX).
- Next is storage for big datasets. My preference is a 10 TB 3.5" HDD for lots of storage at a moderate cost, in addition to an NVMe drive for runtime caching.
- What about CPUs? Choose one that enables the fastest PCIe and memory speeds and can keep up with the non-GPU pre- and post-processing. I typically choose a lower-cost CPU that maximizes my communication speeds.
- A USB webcam is a flexible and fun image source. I learn a lot by interacting live with ML algorithms that I wouldn't on saved image sets (low light, high contrast, saturation, etc.). I use the Logitech C920 because it is supported on Windows, Linux, Jetson, Coral, and Raspberry Pi through OpenCV (see the capture sketch after this list).
- Ubuntu Linux distribution
- Visual Studio Code is a great free development environment
- Python is the primary language for ML development and for this project
- Docker defines and runs the runtime environments for development and for targeting embedded devices. All code in this project runs within a Docker environment.
- MinIO S3 object storage stores and distributes machine learning data between embedded devices, servers, and development PCs.
- Jetson NX is a capable target platform for machine-learned image processing if you are choosing a target platform.
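As referenced in the webcam note above, a minimal live-capture loop is sketched below, assuming OpenCV's Python bindings and a UVC camera enumerated as device 0; the inference call is a placeholder:

```python
# Minimal live-webcam loop, assuming a UVC camera such as the Logitech C920.
import cv2

cap = cv2.VideoCapture(0)  # first enumerated camera
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # result = model.predict(frame)  # placeholder for an ML algorithm
        cv2.imshow("webcam", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```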
On the development workstation:
- Set up the Ubuntu desktop
- Install the current NVIDIA drivers:
```bash
sudo apt update
sudo apt upgrade -y
```
- Set up the ssh server:
```bash
sudo apt install openssh-server -y
sudo systemctl status ssh
sudo ufw allow ssh
sudo ufw enable && sudo ufw reload
```
- Configure ssh access key (Windows)
- On your development computer, generate a public/private key pair. Accept the default parameters.
```bash
ssh-keygen
```
- Open C:\Users\username\.ssh\id_rsa.pub
- Copy all file contents
- In the container, paste the contents of id_rsa.pub into ~/.ssh/authorized_keys in Linux:
```bash
mkdir ~/.ssh
nano ~/.ssh/authorized_keys
```
- Press ctrl+o to save authorized_keys once you have pasted in the new key.
- Press ctrl+x to close nano.
Problems with `sudo ubuntu-drivers autoinstall`? Install a specific driver version instead. (Apologies: this document will quickly become out of date.) On Ubuntu 22.04, ubuntu-drivers can fail with "UnboundLocalError: local variable 'version' referenced before assignment" when installing NVIDIA drivers. A fix is to edit the /usr/lib/python3/dist-packages/UbuntuDrivers/detect.py file and replace line 835 with this line:
```python
version = int(package_name.split('-')[-2])
```
```bash
ubuntu-drivers devices
sudo apt install nvidia-driver-525 -y
sudo reboot now
```
Once the computer has restarted, test that the NVIDIA driver is installed and running:
```bash
nvidia-smi
```
- Install Docker
```bash
sudo apt install ca-certificates curl gnupg lsb-release -y
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
docker run hello-world
```
- Set up nvidia-docker:
```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
```
Test:
```bash
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
```
The response depends on your GPUs:
```
Sat Feb 11 17:31:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01    Driver Version: 525.78.01    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 6000     Off  | 00000000:15:00.0  On |                  Off |
| 33%   24C    P8    18W / 260W |    522MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
- Install microk8s Kubernetes
You may encounter the following within a VS Code terminal: "/snap/microk8s/4565/bin/sed: couldn't flush stdout: Permission denied". If so, execute these instructions within a stand-alone bash terminal.
To install microk8s:
```bash
sudo snap remove microk8s --purge
sudo snap install microk8s --channel=1.22/stable --classic
sudo usermod -a -G microk8s $USER
newgrp microk8s
sudo chown -f -R $USER ~/.kube
su - $USER
microk8s status --wait-ready
microk8s enable gpu helm3 storage registry
#microk8s enable dns gpu helm3 storage registry rbac ingress metallb:10.64.140.43-10.64.140.143
#sudo snap install kubectl --classic
#cd $HOME
#mkdir .kube
#cd .kube
#microk8s config > config
```
Add convenience aliases:
```bash
echo 'alias py=python3' >> ~/.bashrc
echo 'alias kc=microk8s.kubectl' >> ~/.bashrc
echo 'alias helm=microk8s.helm3' >> ~/.bashrc
. ~/.bashrc
```
- Create a MinIO object storage
- Load the mllib project. From the command prompt:
```bash
sudo mkdir /data
sudo chown $USER /data
mkdir /data/git
cd /data/git
git clone https://github.com/bhlarson/mllib.git
```
- Let's Encrypt wildcard certificate
- Kubernetes secret: generate a TLS secret for Kubernetes
- Base64 encode the certificate:
```bash
cat cert.pem | base64 | awk 'BEGIN{ORS="";} {print}' > tls.crt
cat privkey.pem | base64 | awk 'BEGIN{ORS="";} {print}' > tls.key
```
- Create a credentials file mllib/creds.json defining S3 access credentials. It should have the structure below. Replace the "<>" values with the values for your object storage:
```json
{
"s3":[
{"name":"mllib-s3",
"type":"trainer", "address":"<s3 url>",
"access key":"<s3 access key>",
"secret key":"<s3 secret key>",
"tls":true,
"cert_verify":false,
"cert_path": null,
"sets":{
"dataset":{"bucket":"mllib","prefix":"data", "dataset_filter":"" },
"trainingset":{"bucket":"mllib","prefix":"training", "dataset_filter":"" },
"model":{"bucket":"mllib","prefix":"model", "dataset_filter":"dl3" },
"test":{"bucket":"mllib","prefix":"test", "dataset_filter":"" }
}
}
]
}
```
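mllib's utilities read creds.json themselves; the sketch below only illustrates the file's structure in use, assuming the address field holds host:port without a scheme and using boto3 in place of the project's own S3 helpers:

```python
# Minimal sketch of reading creds.json and opening an S3 connection.
# boto3 is used here only for illustration; mllib has its own S3 utilities.
import json
import boto3

with open("creds.json") as f:
    creds = json.load(f)["s3"][0]

scheme = "https" if creds["tls"] else "http"
s3 = boto3.client(
    "s3",
    endpoint_url=f"{scheme}://{creds['address']}",
    aws_access_key_id=creds["access key"],
    aws_secret_access_key=creds["secret key"],
    verify=creds["cert_verify"],
)

# List objects in the dataset bucket/prefix defined by the "sets" section.
dataset = creds["sets"]["dataset"]
response = s3.list_objects_v2(Bucket=dataset["bucket"], Prefix=dataset["prefix"])
for obj in response.get("Contents", []):
    print(obj["Key"])
```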
For 480-height, 512-width images, the following table shows the UNET accuracy, similarity, and inference time:

| Software | Hardware | Images | Accuracy | Similarity | Inference time (s) |
|---|---|---|---|---|---|
| TensorFlow Float32 | x86-64 RTX 6000 | 5000 | 0.947432 | 0.668267 | 0.076956 |
| ONNX Float32 | x86-64 RTX 6000 | 5000 | 0.947474 | 0.667503 | 0.153856 |
| TensorRT Float16 | x86-64 RTX 6000 | 5000 | 0.947155 | 0.667532 | 0.008323 |
| TensorFlow Float32 | Jetson AGX | 5000 | 0.945176 | 0.668743 | 0.231636 |
| TensorRT Float16 | Jetson AGX | 5000 | 0.945007 | 0.665993 | 0.029665 |
| TensorFlow Float32 | Jetson NX | 5000 | 0.941661 | 0.666916 | 0.370289 |
| TensorRT Float16 | Jetson NX | 5000 | 0.946283 | 0.668803 | 0.046575 |
- Import Embedded Classification to classify
- Instructions to setup development environment
- Instructions to use mllib
- Jupyter examples
- Setup Microk8s: snap install microk8s
- Setup Kubectl: snap install kubectl
- Working with kubectl
- Handwritten note from VS Code Draw Note