UHD


Ultra-lightweight human detection. The number of parameters does not correlate with inference speed. For limited use cases, an input image resolution of 64x64 is sufficient. High-level object detection architectures such as YOLO are overkill.

Please note that the dataset used to train this model is a custom-built, ultra-high-quality dataset derived from MS-COCO, so a simple comparison with the Val mAP values of other object detection models is completely meaningless. In particular, note that the mAP values of other MS-COCO-based models are unnecessarily high and do not accurately reflect actual performance.

This model is an experimental implementation and is not suitable for real-time inference using a USB camera, etc.

camera_record_64x64.mp4
  • Variant-S / w ESE + IoU-aware + ReLU

(Image grid: 64x64 input/output detection sample pairs for Variant-S w ESE + IoU-aware + ReLU.)

Download all ONNX files at once

sudo apt update && sudo apt install -y gh
gh release download onnx -R PINTO0309/UHD

Models

  • Legacy models

    • w/o ESE

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.38 M | 0.18 G | 0.40343 | 0.93 ms | 5.6 MB | Download | Download |
| T | 3.10 M | 0.41 G | 0.44529 | 1.50 ms | 12.3 MB | Download | Download |
| S | 5.43 M | 0.71 G | 0.44945 | 2.23 ms | 21.8 MB | Download | Download |
| C | 8.46 M | 1.11 G | 0.45005 | 2.66 ms | 33.9 MB | Download | Download |
| M | 12.15 M | 1.60 G | 0.44875 | 4.07 ms | 48.7 MB | Download | Download |
| L | 21.54 M | 2.83 G | 0.44686 | 6.23 ms | 86.2 MB | Download | Download |
    • w ESE

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.45 M | 0.18 G | 0.41018 | 1.05 ms | 5.8 MB | Download | Download |
| T | 3.22 M | 0.41 G | 0.44130 | 1.27 ms | 12.9 MB | Download | Download |
| S | 5.69 M | 0.71 G | 0.46612 | 2.10 ms | 22.8 MB | Download | Download |
| C | 8.87 M | 1.11 G | 0.45095 | 2.86 ms | 35.5 MB | Download | Download |
| M | 12.74 M | 1.60 G | 0.46502 | 3.95 ms | 51.0 MB | Download | Download |
| L | 22.59 M | 2.83 G | 0.45787 | 6.52 ms | 90.4 MB | Download | Download |
    • ESE + IoU-aware + Swish

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.60 M | 0.20 G | 0.42806 | 1.25 ms | 6.5 MB | Download | Download |
| T | 3.56 M | 0.45 G | 0.46502 | 1.82 ms | 14.3 MB | Download | Download |
| S | 6.30 M | 0.79 G | 0.47473 | 2.78 ms | 25.2 MB | Download | Download |
| C | 9.81 M | 1.23 G | 0.46235 | 3.58 ms | 39.3 MB | Download | Download |
| M | 14.09 M | 1.77 G | 0.46562 | 5.05 ms | 56.4 MB | Download | Download |
| L | 24.98 M | 3.13 G | 0.47774 | 7.46 ms | 100 MB | Download | Download |
    • ESE + IoU-aware + ReLU

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.60 M | 0.20 G | 0.40910 | 0.63 ms | 6.4 MB | Download | Download |
| T | 3.56 M | 0.45 G | 0.44618 | 1.08 ms | 14.3 MB | Download | Download |
| S | 6.30 M | 0.79 G | 0.45776 | 1.71 ms | 25.2 MB | Download | Download |
| C | 9.81 M | 1.23 G | 0.45385 | 2.51 ms | 39.3 MB | Download | Download |
| M | 14.09 M | 1.77 G | 0.47468 | 3.54 ms | 56.4 MB | Download | Download |
| L | 24.98 M | 3.13 G | 0.46965 | 6.14 ms | 100 MB | Download | Download |
    • ESE + IoU-aware + large-object-branch + ReLU

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.40903 | 0.77 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.46170 | 1.40 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.45860 | 2.30 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.47518 | 2.83 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.45816 | 4.37 ms | 69.8 MB | Download | Download |
| L | 30.92 M | 3.44 G | 0.48243 | 7.40 ms | 123.7 MB | Download | Download |
    • [For long distances and extremely small objects] ESE + IoU-aware + ReLU + Distillation

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.60 M | 0.20 G | 0.55224 | 0.63 ms | 6.4 MB | Download | Download |
| T | 3.56 M | 0.45 G | 0.56040 | 1.08 ms | 14.3 MB | Download | Download |
| S | 6.30 M | 0.79 G | 0.57361 | 1.71 ms | 25.2 MB | Download | Download |
| C | 9.81 M | 1.23 G | 0.56183 | 2.51 ms | 39.3 MB | Download | Download |
| M | 14.09 M | 1.77 G | 0.57666 | 3.54 ms | 56.4 MB | Download | Download |
    • [For short/medium distance] ESE + IoU-aware + large-object-branch + ReLU + Distillation

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.54883 | 0.70 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.55663 | 1.18 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.57397 | 1.97 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.56768 | 2.74 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.57815 | 3.57 ms | 69.8 MB | Download | Download |
    • torch_bilinear_dynamic + No resizing required + Not suitable for quantization

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.55489 | 0.70 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.57824 | 1.18 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.58478 | 1.97 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.58459 | 2.74 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.59034 | 3.57 ms | 69.8 MB | Download | Download |
| L | 30.92 M | 3.44 G | 0.58929 | 7.16 ms | 123.7 MB | Download | Download |
    • torch_nearest_dynamic + No resizing required + Suitable for quantization

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.53376 | 0.70 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.55561 | 1.18 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.56396 | 1.97 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.56328 | 2.74 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.57075 | 3.57 ms | 69.8 MB | Download | Download |
| L | 30.92 M | 3.44 G | 0.56787 | 7.16 ms | 123.7 MB | Download | Download |
  • opencv_inter_nearest + Optimized for OpenCV RGB downsampling + Suitable for quantization

| Var | Params | FLOPs | mAP@0.5 | Core i9 CPU latency | ONNX size | static | static w/o post | dynamic | dynamic w/o post |
|---|---|---|---|---|---|---|---|---|---|
| R | 0.13 M | 0.01 G | 0.21230 | 0.24 ms | 863 KB | DL | DL | DL | DL |
| Y | 0.29 M | 0.03 G | 0.28664 | 0.28 ms | 1.2 MB | DL | DL | DL | DL |
| Z | 0.51 M | 0.05 G | 0.32722 | 0.32 ms | 2.1 MB | DL | DL | DL | DL |
| A | 0.78 M | 0.08 G | 0.43661 | 0.37 ms | 3.2 MB | DL | DL | DL | DL |
| F | 1.12 M | 0.12 G | 0.47942 | 0.44 ms | 4.5 MB | DL | DL | DL | DL |
| P | 1.52 M | 0.17 G | 0.51094 | 0.50 ms | 6.1 MB | DL | DL | DL | DL |
| N | 1.98 M | 0.22 G | 0.55003 | 0.60 ms | 8.0 MB | DL | DL | DL | DL |
| T | 2.49 M | 0.28 G | 0.56550 | 0.70 ms | 10.0 MB | DL | DL | DL | DL |
| S | 3.07 M | 0.34 G | 0.57015 | 0.81 ms | 12.3 MB | DL | DL | DL | DL |
| L | 30.92 M | 3.44 G | 0.58399 | 7.16 ms | 123.7 MB | DL | DL | DL | DL |
  • opencv_inter_nearest_yuv422 + Optimized for YUV422 + Suitable for quantization

    • Variants

R: ronto, Y: yocto, Z: zepto, A: atto
      F: femto, P: pico, N: nano, T: tiny
      S: small, C: compact, M: medium, L: large
      
    • YUV422

import cv2
import numpy as np

img_u8 = np.ones([64, 64, 3], dtype=np.uint8)  # RGB image
yuyv = cv2.cvtColor(img_u8, cv2.COLOR_RGB2YUV_YUYV)  # packed YUY2/YUYV
print(yuyv.shape)  # (64, 64, 2)
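The YUV422 models take this packed image as the [1, 2, 64, 64] float32 input_yuv422 tensor listed below. A minimal sketch of that conversion (whether the exported graph expects raw 0-255 values or a normalized range is an assumption to verify against demo_uhd.py):

```python
import numpy as np

# yuyv: (64, 64, 2) uint8 from the cv2.cvtColor call above
x = yuyv.transpose(2, 0, 1)[None].astype(np.float32)  # HWC -> (1, 2, 64, 64)
```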
    • With post-process model

      input_name.1: input_yuv422 shape: [1, 2, 64, 64] dtype: float32
      
      output_name.1: score_classid_cxcywh shape: [1, 100, 6] dtype: float32
      
    • Without post-process model

      input_name.1: input_yuv422 shape: [1, 2, 64, 64] dtype: float32
      
      output_name.1: txtywh_obj_quality_cls_x8 shape: [1, 56, 8, 8] dtype: float32
      output_name.2: anchors shape: [8, 2] dtype: float32
      output_name.3: wh_scale shape: [8, 2] dtype: float32
      
      https://github.com/PINTO0309/UHD/blob/e0bbfe69afa0da4f83cf1f09b530a500bcd2d685/demo_uhd.py#L203-L301
      
      score = sigmoid(obj) * (sigmoid(quality)) * sigmoid(cls)
      cx = (sigmoid(tx)+gx)/w
      cy = (sigmoid(ty)+gy)/h
      bw = anchor_w*softplus(tw)*wh_scale
      bh = anchor_h*softplus(th)*wh_scale
      boxes = (cx±bw/2, cy±bh/2)
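Restated as a minimal NumPy sketch (the per-anchor channel order tx, ty, tw, th, obj, quality, cls is an assumption inferred from the output name txtywh_obj_quality_cls_x8; the demo_uhd.py range linked above is the authoritative decode):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.log1p(np.exp(x))

def decode(raw, anchors, wh_scale, conf_thresh=0.90):
    # raw: (1, 56, 8, 8) head output; anchors, wh_scale: (8, 2)
    _, c, h, w = raw.shape
    a = anchors.shape[0]
    p = raw[0].reshape(a, c // a, h, w)  # assumed: 8 anchor blocks of 7 channels
    tx, ty, tw, th = p[:, 0], p[:, 1], p[:, 2], p[:, 3]
    obj, quality, cls = p[:, 4], p[:, 5], p[:, 6]
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    score = sigmoid(obj) * sigmoid(quality) * sigmoid(cls)  # (a, h, w)
    cx = (sigmoid(tx) + gx) / w  # normalized 0..1
    cy = (sigmoid(ty) + gy) / h
    bw = anchors[:, 0, None, None] * softplus(tw) * wh_scale[:, 0, None, None]
    bh = anchors[:, 1, None, None] * softplus(th) * wh_scale[:, 1, None, None]
    boxes = np.stack([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2], axis=-1)
    keep = score > conf_thresh
    return boxes[keep], score[keep]
```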
      
    • ONNX

| Var | Params | FLOPs | mAP@0.5 | Core i9 CPU latency | ONNX size | static | static w/o post | dynamic | dynamic w/o post |
|---|---|---|---|---|---|---|---|---|---|
| R | 0.13 M | 0.01 G | 0.22382 | 0.34 ms | 863 KB | DL | DL | DL | DL |
| Y | 0.29 M | 0.03 G | 0.29606 | 0.38 ms | 1.2 MB | DL | DL | DL | DL |
| Z | 0.51 M | 0.05 G | 0.36843 | 0.43 ms | 2.1 MB | DL | DL | DL | DL |
| A | 0.78 M | 0.08 G | 0.42872 | 0.48 ms | 3.2 MB | DL | DL | DL | DL |
| F | 1.12 M | 0.12 G | 0.49098 | 0.54 ms | 4.5 MB | DL | DL | DL | DL |
| P | 1.52 M | 0.17 G | 0.52665 | 0.63 ms | 6.1 MB | DL | DL | DL | DL |
| N | 1.98 M | 0.22 G | 0.54942 | 0.70 ms | 8.0 MB | DL | DL | DL | DL |
| T | 2.49 M | 0.28 G | 0.56300 | 0.83 ms | 10.0 MB | DL | DL | DL | DL |
| S | 3.07 M | 0.34 G | 0.57338 | 0.91 ms | 12.3 MB | DL | DL | DL | DL |
| L | 30.92 M | 3.44 G | 0.58642 | 7.16 ms | 123.7 MB | DL | DL | DL | DL |
    • Input image 480x360 -> OpenCV INTER_NEAREST -> 64x64 -> YUV422 (packed: YUY2/YUYV)

(Images: the 480x360 input after OpenCV INTER_NEAREST resize to 64x64, shown at 100% and 800% zoom.)
    • Detection samples for the Y, F, N, and S variants (image 00_000000019456)
    • ESPDL INT8 (.espdl, .info, .json, anchors.npy, wh_scale.npy)

      I don't own an ESP32, so I haven't checked its operation.

| Var | ESPDL size | static w/o post (s3) | static w/o post (p4) |
|---|---|---|---|
| R | 222.8 KB | DL | DL |
| Y | 389.0 KB | DL | DL |
| Z | 617.4 KB | DL | DL |
| A | 911.6 KB | DL | DL |
| F | 1.2 MB | DL | DL |
| P | 1.6 MB | DL | DL |
| N | 2.1 MB | DL | DL |
| T | 2.6 MB | DL | DL |
| S | 3.2 MB | DL | DL |
  • opencv_inter_nearest_y + Optimized for Y (Luminance) only + Suitable for quantization

    • ONNX

| Var | Params | FLOPs | mAP@0.5 | CPU latency | ONNX size | static | static w/o post | dynamic | dynamic w/o post |
|---|---|---|---|---|---|---|---|---|---|
| R | 0.13 M | 0.01 G | | 0.34 ms | 863 KB | | | | |
| Y | 0.29 M | 0.03 G | | 0.38 ms | 1.2 MB | | | | |
| Z | 0.51 M | 0.05 G | | 0.43 ms | 2.1 MB | | | | |
| A | 0.78 M | 0.08 G | | 0.48 ms | 3.2 MB | | | | |
| F | 1.12 M | 0.12 G | | 0.54 ms | 4.5 MB | | | | |
| P | 1.52 M | 0.17 G | | 0.63 ms | 6.1 MB | | | | |
| N | 1.98 M | 0.22 G | | 0.70 ms | 8.0 MB | | | | |
| T | 2.49 M | 0.28 G | | 0.83 ms | 10.0 MB | | | | |
| S | 3.07 M | 0.34 G | | 0.91 ms | 12.3 MB | | | | |
| L | 30.92 M | 3.44 G | 0.58164 | 7.16 ms | 123.7 MB | | | | |
    • ESPDL INT8 (.espdl, .info, .json, anchors.npy, wh_scale.npy)

| Var | ESPDL size | static w/o post (s3) | static w/o post (p4) |
|---|---|---|---|
| R | 222.8 KB | | |
| Y | 389.0 KB | | |
| Z | 617.4 KB | | |
| A | 911.6 KB | | |
| F | 1.2 MB | | |
| P | 1.6 MB | | |
| N | 2.1 MB | | |
| T | 2.6 MB | | |
| S | 3.2 MB | | |

Inference

Caution

If you preprocess your images by resizing them to 64x64 with OpenCV or similar, use nearest-neighbor (INTER_NEAREST) mode, as in the sketch below.
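A minimal sketch of that preprocessing (the file name is illustrative; the RGB channel order follows the opencv_inter_nearest model descriptions above):

```python
import cv2

frame = cv2.imread("input.jpg")  # BGR, any resolution
small = cv2.resize(frame, (64, 64), interpolation=cv2.INTER_NEAREST)  # nearest, not linear
rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
```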

usage: demo_uhd.py
[-h]
(--images IMAGES | --camera CAMERA)
--onnx ONNX
[--output OUTPUT]
[--img-size IMG_SIZE]
[--conf-thresh CONF_THRESH]
[--record RECORD]
[--actual-size]
[--use-nms]
[--nms-iou NMS_IOU]

UltraTinyOD ONNX demo (CPU).

options:
  -h, --help
   show this help message and exit
  --images IMAGES
   Directory with images to run batch inference.
  --camera CAMERA
   USB camera id for realtime inference.
  --onnx ONNX
   Path to ONNX model (CPU).
  --output OUTPUT
   Output directory for image mode.
  --img-size IMG_SIZE
   Input size HxW, e.g., 64x64.
  --conf-thresh CONF_THRESH
   Confidence threshold. Default: 0.90
  --record RECORD
   MP4 path for automatic recording when --camera is used.
  --actual-size
   Display and recording use the model input resolution instead of
   the original frame size.
  --use-nms
   Apply Non-Maximum Suppression on decoded boxes (default IoU=0.8).
  --nms-iou NMS_IOU
   IoU threshold for NMS (effective only when --use-nms is set).
  • ONNX with post-processing
    uv run demo_uhd.py \
    --onnx ultratinyod_res_anc8_w192_64x64_loese_distill.onnx \
    --camera 0 \
    --conf-thresh 0.90 \
    --use-nms \
    --actual-size
  • ONNX without post-processing
    uv run demo_uhd.py \
    --onnx ultratinyod_res_anc8_w192_64x64_loese_distill_nopost.onnx \
    --camera 0 \
    --conf-thresh 0.90 \
    --use-nms \
    --actual-size
  • ONNX with pre-processing (PIL equivalent of Resize) + post-processing
    uv run demo_uhd.py \
    --onnx ultratinyod_res_anc8_w256_64x64_torch_bilinear_dynamic.onnx \
    --camera 0 \
    --conf-thresh 0.90 \
    --use-nms \
    --actual-size
  • ONNX with pre-processing (PIL equivalent of Resize) + without post-processing
    uv run demo_uhd.py \
    --onnx ultratinyod_res_anc8_w256_64x64_torch_bilinear_dynamic_nopost.onnx \
    --camera 0 \
    --conf-thresh 0.90 \
    --use-nms \
    --actual-size
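For reference, a minimal onnxruntime sketch against the with-post-process model (the [1, 100, 6] score_classid_cxcywh output follows the I/O listing above and is assumed to apply to the RGB models as well; the raw 0-255 float input convention is an assumption — demo_uhd.py is authoritative):

```python
import cv2
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "ultratinyod_res_anc8_w192_64x64_loese_distill.onnx",
    providers=["CPUExecutionProvider"],
)
img = cv2.cvtColor(cv2.imread("person.jpg"), cv2.COLOR_BGR2RGB)
x = cv2.resize(img, (64, 64), interpolation=cv2.INTER_NEAREST)
x = x.transpose(2, 0, 1)[None].astype(np.float32)         # (1, 3, 64, 64)
(dets,) = sess.run(None, {sess.get_inputs()[0].name: x})  # (1, 100, 6)
for score, classid, cx, cy, w, h in dets[0]:
    if score >= 0.90:  # same default as --conf-thresh
        print(f"person {score:.2f} cx={cx:.3f} cy={cy:.3f} w={w:.3f} h={h:.3f}")
```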

Training Examples (full CLI)

UltraTinyOD (anchor-only, stride 8; --cnn-width controls stem width):

use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
use-improved-head + utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
use-improved-head + use-iou-aware-head + utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
use-improved-head + use-iou-aware-head + utod-head-ese + distillation
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w64_se_iou_64x64_quality_lr0.005_relu/best_utod_0299_map_0.40910.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width 64 \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine


SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/3/ultratinyod_res_anc8_w96_se_iou_64x64_quality_lr0.004/best_utod_0293_map_0.46502.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w128_se_iou_64x64_quality_lr0.003_relu/best_utod_0293_map_0.45776.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w160_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.45385.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w192_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.47468.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
use-improved-head + use-iou-aware-head + utod-head-ese + utod-large-obj-branch
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25
use-improved-head + use-iou-aware-head + utod-head-ese + utod-large-obj-branch + distillation
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w64_loese_64x64_quality_lr0.005/best_utod_0296_map_0.40903.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width 64 \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w96_loese_64x64_quality_lr0.004/best_utod_0279_map_0.46170.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w128_loese_64x64_quality_lr0.003/best_utod_0259_map_0.45860.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w160_loese_64x64_quality_lr0.001/best_utod_0210_map_0.47518.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w192_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.47468.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

Validation-only Example


Example of running only validation on a trained checkpoint:

uv run python train.py \
--arch ultratinyod \
--img-size 64x64 \
--cnn-width 256 \
--classes 0 \
--conf-thresh 0.15 \
--ckpt runs/ultratinyod_res_anc8_w256_64x64_lr0.0003/best_utod_0297_map_0.44299.pt \
--val-only \
--use-ema

CLI parameters

| Parameter | Description | Default |
|---|---|---|
| --arch | Model architecture: cnn, transformer, or anchor-only ultratinyod. | cnn |
| --image-dir | Directory containing images and YOLO txt labels. | data/wholebody34/obj_train_data |
| --train-split | Fraction of data used for training. | 0.8 |
| --val-split | Fraction of data used for validation. | 0.2 |
| --img-size | Input size HxW (e.g., 64x64). | 64x64 |
| --resize-mode | Resize mode for training preprocessing: torch_bilinear, torch_nearest, opencv_inter_linear, opencv_inter_nearest, opencv_inter_nearest_y, opencv_inter_nearest_yuv422. | torch_bilinear |
| --torch_bilinear | Shortcut for --resize-mode torch_bilinear. | False |
| --torch_nearest | Shortcut for --resize-mode torch_nearest. | False |
| --opencv_inter_linear | Shortcut for --resize-mode opencv_inter_linear. | False |
| --opencv_inter_nearest | Shortcut for --resize-mode opencv_inter_nearest. | False |
| --opencv_inter_nearest_y | Shortcut for --resize-mode opencv_inter_nearest_y. | False |
| --opencv_inter_nearest_yuv422 | Shortcut for --resize-mode opencv_inter_nearest_yuv422. | False |
| --exp-name | Experiment name; logs saved under runs/<exp-name>. | default |
| --batch-size | Batch size. | 64 |
| --epochs | Number of epochs. | 100 |
| --resume | Checkpoint to resume training (loads optimizer/scheduler). | None |
| --ckpt | Initialize weights from checkpoint (no optimizer state). | None |
| --ckpt-non-strict | Load --ckpt with strict=False (ignore missing/unexpected keys). | False |
| --val-only | Run validation only with --ckpt or --resume weights and exit. | False |
| --val-count | Limit number of validation images when using --val-only. | None |
| --use-improved-head | UltraTinyOD only: enable quality-aware head (IoU-aware obj, IoU score branch, learnable WH scale, extra context). | False |
| --use-iou-aware-head | UltraTinyOD head: task-aligned IoU-aware scoring (quality*cls) with split towers. | False |
| --quality-power | Exponent for quality score when using IoU-aware head scoring. | 1.0 |
| --teacher-ckpt | Teacher checkpoint path for distillation. | None |
| --teacher-arch | Teacher architecture override. | None |
| --teacher-num-queries | Teacher DETR queries. | None |
| --teacher-d-model | Teacher model dimension. | None |
| --teacher-heads | Teacher attention heads. | None |
| --teacher-layers | Teacher encoder/decoder layers. | None |
| --teacher-dim-feedforward | Teacher FFN dimension. | None |
| --teacher-use-skip | Force teacher skip connections on. | False |
| --teacher-activation | Teacher activation (relu/swish). | None |
| --teacher-use-fpn | Force teacher FPN on. | False |
| --teacher-backbone | Teacher backbone checkpoint for feature distillation. | None |
| --teacher-backbone-arch | Teacher backbone architecture hint. | None |
| --teacher-backbone-norm | Teacher backbone input normalization. | imagenet |
| --distill-kl | KL distillation weight (transformer). | 0.0 |
| --distill-box-l1 | Box L1 distillation weight (transformer). | 0.0 |
| --distill-cosine | Cosine ramp-up of distillation weights. | False |
| --distill-temperature | Teacher logits temperature. | 1.0 |
| --distill-feat | Feature-map distillation weight (CNN only). | 0.0 |
| --lr | Learning rate. | 0.001 |
| --weight-decay | Weight decay. | 0.0001 |
| --optimizer | Optimizer (adamw or sgd). | adamw |
| --grad-clip-norm | Global gradient norm clip; set 0 to disable. | 5.0 |
| --num-workers | DataLoader workers. | 8 |
| --device | Device: cuda or cpu. | cuda if available |
| --seed | Random seed. | 42 |
| --log-interval | Steps between logging to progress bar. | 10 |
| --eval-interval | Epoch interval for evaluation. | 1 |
| --conf-thresh | Confidence threshold for decoding. | 0.3 |
| --topk | Top-K for CNN decoding. | 50 |
| --use-amp | Enable automatic mixed precision. | False |
| --aug-config | YAML for augmentations (applied in listed order). | uhd/aug.yaml |
| --use-ema | Enable EMA of model weights for evaluation/checkpointing. | False |
| --ema-decay | EMA decay factor (ignored if EMA disabled). | 0.9998 |
| --coco-eval | Run COCO-style evaluation. | False |
| --coco-per-class | Log per-class COCO AP when COCO eval is enabled. | False |
| --classes | Comma-separated target class IDs. | 0 |
| --activation | Activation function (relu or swish). | swish |
| --cnn-width | Width multiplier for CNN backbone. | 32 |
| --backbone | Optional lightweight CNN backbone (microcspnet, ultratinyresnet, enhanced-shufflenet, or none). | None |
| --backbone-channels | Comma-separated channels for ultratinyresnet (e.g., 16,32,48,64). | None |
| --backbone-blocks | Comma-separated residual block counts per stage for ultratinyresnet (e.g., 1,2,2,1). | None |
| --backbone-se | Apply SE/eSE on backbone output (custom backbones only). | none |
| --backbone-skip | Add long skip fusion across custom backbone stages (ultratinyresnet). | False |
| --backbone-skip-cat | Use concat+1x1 fusion for long skips (ultratinyresnet); implies --backbone-skip. | False |
| --backbone-skip-shuffle-cat | Use stride+shuffle concat fusion for long skips (ultratinyresnet); implies --backbone-skip. | False |
| --backbone-skip-s2d-cat | Use space-to-depth concat fusion for long skips (ultratinyresnet); implies --backbone-skip. | False |
| --backbone-fpn | Enable a tiny FPN fusion inside custom backbones (ultratinyresnet). | False |
| --backbone-out-stride | Override custom backbone output stride (e.g., 8 or 16). | None |
| --use-skip | Enable skip-style fusion in the CNN head (sums pooled shallow features into the final stage). Stored in checkpoints and restored on resume. | False |
| --utod-residual | Enable residual skips inside the UltraTinyOD backbone. | False |
| --utod-head-ese | UltraTinyOD head: apply lightweight eSE on shared features. | False |
| --utod-context-rfb | UltraTinyOD head: add a receptive-field block (dilated + wide depthwise) before prediction layers. | False |
| --utod-context-dilation | Dilation used in UltraTinyOD receptive-field block (only when --utod-context-rfb). | 2 |
| --utod-large-obj-branch | UltraTinyOD head: add a downsampled large-object refinement branch (no FPN). | False |
| --utod-large-obj-depth | Number of depthwise blocks in the large-object branch (only when --utod-large-obj-branch). | 2 |
| --utod-large-obj-ch-scale | Channel scale for the large-object branch (relative to head channels). | 1.0 |
| --use-anchor | Use anchor-based head for CNN (YOLO-style). | False |
| --output-stride | Final CNN feature stride (downsample factor). Supported: 4, 8, 16. | 16 |
| --anchors | Anchor sizes as normalized w,h pairs (space separated). | "" |
| --auto-anchors | Compute anchors from training labels when using anchor head. | False |
| --num-anchors | Number of anchors to use when auto-computing. | 3 |
| --iou-loss | IoU loss type for anchor head (iou, giou, or ciou). | giou |
| --anchor-assigner | Anchor assigner strategy (legacy, simota). | legacy |
| --anchor-cls-loss | Anchor classification loss (bce, vfl). | bce |
| --simota-topk | Top-K IoUs for dynamic-k in SimOTA. | 10 |
| --last-se | Apply SE/eSE only on the last CNN block. | none |
| --use-batchnorm | Enable BatchNorm layers during training/export. | False |
| --last-width-scale | Channel scale for last CNN block (e.g., 1.25). | 1.0 |
| --num-queries | Transformer query count. | 10 |
| --d-model | Transformer model dimension. | 64 |
| --heads | Transformer attention heads. | 4 |
| --layers | Transformer encoder/decoder layers. | 3 |
| --dim-feedforward | Transformer feedforward dimension. | 128 |
| --use-fpn | Enable simple FPN for transformer backbone. | False |

Tiny CNN backbones (--backbone, optional; default keeps the original built-in CNN):

  • microcspnet: CSP-tiny style stem (16/32/64/128) compressed to 64ch, stride 8 output.
  • ultratinyresnet: 16→24→32→48 channel ResNet-like stack with three downsample steps (stride 8). Channel widths and blocks per stage can be overridden via --backbone-channels / --backbone-blocks; optional long skips across stages via --backbone-skip; optional lightweight FPN fusion via --backbone-fpn.
  • enhanced-shufflenet: Enhanced ShuffleNetV2+ inspired (arXiv:2111.00902) with progressive widening and doubled refinements, ending at ~128ch, stride 8. All custom backbones can optionally apply SE/eSE on the backbone output via --backbone-se {none,se,ese}.
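An illustrative train.py invocation wiring several of these backbone flags together (the channel/block values are placeholders taken from the flag documentation above, not a tuned recipe):

```bash
uv run python train.py \
--arch cnn \
--image-dir data/wholebody34/obj_train_data \
--img-size 64x64 \
--backbone ultratinyresnet \
--backbone-channels 16,24,32,48 \
--backbone-blocks 1,2,2,1 \
--backbone-se ese \
--backbone-skip \
--backbone-fpn
```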

Augmentation via YAML

  • Specify a YAML file with --aug-config to run the data_augment: entries in the listed order (e.g., --aug-config uhd/aug.yaml).
  • Supported ops (examples): Mosaic / MixUp / CopyPaste / HorizontalFlip (class_swap_map supported) / VerticalFlip / RandomScale / Translation / RandomCrop / RandomResizedCrop / RandomBrightness / RandomContrast / RandomSaturation / RandomHSV / RandomPhotometricDistort / Blur / MedianBlur / MotionBlur / GaussianBlur / GaussNoise / ImageCompression / ISONoise / RandomRain / RandomFog / RandomSunFlare / CLAHE / ToGray / RemoveOutliers.
  • If prob is provided, it is used as the apply probability; otherwise defaults are used (most are 0, RandomPhotometricDistort defaults to 0.5). Unknown keys are ignored.
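An illustrative sketch of such a config (the data_augment key, op names, and prob come from the list above; the exact per-op schema is an assumption — see the shipped uhd/aug.yaml for the real format):

```yaml
data_augment:
  - HorizontalFlip:
      prob: 0.5
  - RandomPhotometricDistort:
      prob: 0.5
  - Mosaic:
      prob: 0.2
```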

Loss terms (CNN / CenterNet)

  • loss: total loss (hm + off + wh)
  • hm: focal loss on center heatmap
  • off: L1 loss on center offsets (within-cell quantization correction)
  • wh: L1 loss on width/height (feature-map scale)

Loss terms (CNN / Anchor head, --use-anchor)

  • loss: total anchor loss (box + obj + cls [+ quality] when --use-improved-head)
  • obj: BCE on objectness for each anchor location (positive vs. background)
  • cls: BCE on per-class logits for positive anchors (one-hot over target classes)
  • box: (1 - IoU/GIoU/CIoU) on decoded boxes for positive anchors; IoU flavor set by --iou-loss
  • quality (improved head only): BCE on the IoU-linked quality logit; the obj target is also scaled by IoU

Loss terms (Transformer)

  • loss: total loss (cls + l1 + iou)
  • cls: cross-entropy for class vs. background
  • l1: L1 loss on box coordinates
  • iou: 1 - IoU for matched predictions

The impact of image downsampling methods


PyTorch's Resize is implemented with a downsampling method similar to PIL's, which differs significantly from OpenCV's implementation. When downsampling images during training preprocessing, be aware that the numerical characteristics of the images the model trains on will be completely different depending on whether you use PyTorch's Resize or OpenCV's. Below are pixel-level error measurements when downsampling an image to 64x64 pixels. If the diff value is greater than 1.0, the images are effectively completely different.

It is therefore easy to imagine that if the downsampling method used for preprocessing during training differs from the one used during inference, the inference results will be disastrous.

PyTorch's and PIL's downsampling are internally very similar but slightly different. When deploying for inference in Python or other environments, accuracy will be significantly degraded unless the model is deployed according to the following criteria. If you train using OpenCV's cv2.INTER_LINEAR, the model will never produce correct output when preprocessing is done with PyTorch, TensorFlow, or ONNX resize operators instead of OpenCV.

| Training | Deploy |
|---|---|
| Downsampling with PyTorch's Resize (InterpolationMode.BILINEAR) | Merge a Resize (Linear + half-pixel) at the input of the ONNX model. This yields the highest model accuracy, but deployment is limited to hardware, NPUs, and frameworks that support bilinear-interpolation resize. Not suitable for quantization. |
| Downsampling with PyTorch's Resize (InterpolationMode.NEAREST) | Merge a Resize (Nearest) at the input of the ONNX model. The most versatile for HW, NPU, and quantization deployment, but model accuracy is lower. |
| Downsampling with OpenCV's resize (cv2.INTER_NEAREST) | Merge a Resize (Nearest) at the input of the ONNX model, or use OpenCV INTER_NEAREST. Accuracy is low, but this is highly versatile because image downsampling can be written freely on the application side. However, downsampling must be implemented manually. |
  1. Error after PIL conversion when downsampling with PyTorch's Resize InterpolationMode.BILINEAR
    PyTorch(InterpolationMode.BILINEAR) -> Convert to PIL vs PyTorch Tensor(InterpolationMode.BILINEAR)
      max  diff : 1
      mean diff : 0.4949
      std  diff : 0.5000
    
  2. Error when downsampling with PyTorch's Resize InterpolationMode.BILINEAR (after PIL conversion) compared to downsampling with OpenCV's INTER_LINEAR
    PyTorch(InterpolationMode.BILINEAR) -> Convert to PIL vs OpenCV INTER_LINEAR
      max  diff : 104
      mean diff : 10.2930
      std  diff : 13.2792
    
  3. Error when downsampling with PyTorch's Resize InterpolationMode.BILINEAR (tensor path) compared to downsampling with OpenCV's INTER_LINEAR
    PyTorch Tensor(InterpolationMode.BILINEAR) vs OpenCV INTER_LINEAR
      max  diff : 104
      mean diff : 10.3336
      std  diff : 13.2463
    
  4. Accuracy and speed of each interpolation method when downsampling in OpenCV
    • Accuracy: INTER_NEAREST < INTER_LINEAR < INTER_AREA, Speed: INTER_NEAREST > INTER_LINEAR > INTER_AREA
    === Resize benchmark ===
    INTER_NEAREST : 0.0061 ms
    INTER_LINEAR  : 0.0143 ms
    INTER_AREA    : 0.3621 ms
    AREA / LINEAR ratio : 25.40x
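A minimal sketch reproducing this kind of comparison (the file name is illustrative; antialias=True on a tensor requires a recent torchvision and approximates the PIL-like path):

```python
import cv2
import numpy as np
import torch
import torchvision.transforms.functional as F
from torchvision.transforms import InterpolationMode

img = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB)

# PyTorch tensor path (PIL-like bilinear with antialiasing)
t = torch.from_numpy(img).permute(2, 0, 1)  # HWC uint8 -> CHW
pt = F.resize(t, [64, 64], interpolation=InterpolationMode.BILINEAR, antialias=True)
pt = pt.permute(1, 2, 0).numpy()

# OpenCV path
ocv = cv2.resize(img, (64, 64), interpolation=cv2.INTER_LINEAR)

diff = np.abs(pt.astype(np.int32) - ocv.astype(np.int32))
print("max  diff :", diff.max())
print("mean diff :", diff.mean())
print("std  diff :", diff.std())
```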
    

ONNX export

  • Export a checkpoint to ONNX (auto-detects arch from checkpoint unless overridden):
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=64
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w64_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=96
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w96_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=128
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w128_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=160
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w160_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=192
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w192_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=256
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w256_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
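
Since the six blocks above differ only in CNNWIDTH and the checkpoint path, and each emits four variants (static/dynamic input, with/without merged postprocessing), the exports can also be driven from a short loop. A minimal Python sketch, assuming the checkpoints exist at the placeholder paths shown above:

import subprocess

SIZE, ANCHOR, RESIZEMODE = "64x64", 8, "opencv_inter_nearest_y"
VARIANTS = {  # output suffix -> extra export_onnx.py flags
    "static": [],
    "dynamic": ["--dynamic-resize"],
    "static_nopost": ["--no-merge-postprocess"],
    "dynamic_nopost": ["--no-merge-postprocess", "--dynamic-resize"],
}

for width in (64, 96, 128, 160, 192, 256):
    ckpt = (
        f"runs/ultratinyod_res_anc{ANCHOR}_w{width}_loese_{SIZE}_lr0.005_impaug/"
        "best_utod_0001_map_0.00000.pt"
    )
    for suffix, extra in VARIANTS.items():
        output = f"ultratinyod_res_anc{ANCHOR}_w{width}_{SIZE}_{RESIZEMODE}_{suffix}.onnx"
        subprocess.run(
            ["uv", "run", "python", "export_onnx.py",
             "--checkpoint", ckpt, "--output", output, "--opset", "17", *extra],
            check=True,  # stop on the first failed export
        )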

LiteRT (TFLite) quantization

Click to expand
uv run onnx2tf \
-i ultratinyod_res_anc8_w64_64x64_quality_relu_nopost.onnx \
-cotof \
-oiqt
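
Here -cotof cross-checks the converted model's outputs element-wise against the source ONNX, and -oiqt additionally emits integer-quantized LiteRT models alongside the float ones.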

ESP-DL quantization

This repository includes a calibration/quantization script for ESP-DL: uhd/quantize_onnx_model_for_esp32.py.

Image-only calibration (default)

Click to expand
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target "esp32s3" \
--calib-algorithm kl \
--int16-op-pattern "/model/backbone/block1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w24_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w24_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w32_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w32_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/obj_conv/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w40_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w40_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/backbone/block1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_down/large_obj_down.0/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/obj_conv/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w48_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w48_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/backbone/block1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w56_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w56_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w64_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w64_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w72_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w72_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv"

Notes:

  • The YUV422 models expect an input of shape [1, 2, 64, 64] produced by the opencv_inter_nearest_yuv422 preprocessing (a layout sketch follows this list).
  • --dataset-type image is the default and ignores labels.
  • Adjust --calib-steps, --batch-size, --target, --num-of-bits, and --device as needed.
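
A hypothetical sketch of that [1, 2, 64, 64] layout, assuming channel 0 carries the Y plane and channel 1 interleaves U/V across columns (YUYV-style). The exact packing is defined by the repository's opencv_inter_nearest_yuv422 mode, so treat this as an illustration only:

import cv2
import numpy as np

# Nearest-neighbor downsample to the model's 64x64 input, then convert to YUV.
bgr = cv2.imread("sample.jpg")
bgr = cv2.resize(bgr, (64, 64), interpolation=cv2.INTER_NEAREST)
yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)  # H x W x (Y, U, V)

# Assumed chroma channel: U at even columns, V at odd columns (YUYV-style).
chroma = np.empty((64, 64), dtype=yuv.dtype)
chroma[:, 0::2] = yuv[:, 0::2, 1]
chroma[:, 1::2] = yuv[:, 1::2, 2]

tensor = np.stack([yuv[..., 0], chroma])[None]  # shape (1, 2, 64, 64)
print(tensor.shape)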

CLI options

Click to expand
  • --image-dir: Directory containing calibration images.
  • --dataset-type: Calibration dataset type (image or yolo, default image).
  • --list-path: Optional text file listing images to use.
  • --export-anchors-wh-scale-dir: Directory to save {onnx-model}_anchors.npy and {onnx-model}_wh_scale.npy (default: same directory as --espdl-model).
  • --expand-group-conv: Expand groups > 1 conv into group=1 (default: disabled).
  • --img-size: Square input size used for calibration (default 64).
  • --resize-mode: Resize mode (default opencv_inter_nearest_yuv422).
  • --class-ids: Comma-separated class IDs to keep (yolo only, default 0).
  • --split: Dataset split for calibration (train, val, all, default all).
  • --val-split: Validation split ratio (ignored when --split all, default 0.0).
  • --batch-size: Calibration batch size (default 1).
  • --calib-steps: Number of calibration steps (default 32).
  • --calib-algorithm: Calibration algorithm (default kl; examples: minmax, mse, percentile).
  • --int16-op-pattern: Regex pattern to force matched ops to int16 (repeatable; see the sketch after this list).
  • --onnx-model: Path to the input ONNX model (default ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx).
  • --espdl-model: Path to the output .espdl file (default ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl).
  • --target: Quantize target type (c, esp32s3, esp32p4, default esp32s3).
  • --num-of-bits: Quantization bits (default 8).
  • --device: Device for calibration (cpu or cuda, default cpu).
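
Because each --int16-op-pattern is a regex, a single pattern can cover several indexed blocks at once. A small sketch of the assumed matching semantics (re.search against ONNX node names; see uhd/quantize_onnx_model_for_esp32.py for the actual behavior):

import re

# One pattern covering every indexed large_obj_blocks depthwise conv.
patterns = [r"/model/head/large_obj_blocks/large_obj_blocks\.\d+/dw/conv/Conv"]
node_names = [
    "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv",
    "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv",
    "/model/head/box_tower/box_tower.1/dw/conv/Conv",
]
int16_ops = [n for n in node_names if any(re.search(p, n) for p in patterns)]
print(int16_ops)  # only the two large_obj_blocks convs match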

Arch

Click to expand
Architecture diagrams (rendered as images in the original README): ONNX (ultratinyod_res_anc8_w64_64x64_loese_distill) and LiteRT/TFLite (ultratinyod_res_anc8_w64_64x64_loese_distill_float32).

Ultra-lightweight classification model series

  1. VSDLM: Visual-only speech detection driven by lip movements - MIT License
  2. OCEC: Open/closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
  3. PGC: Ultrafast pointing gesture classification - MIT License
  4. SC: Ultrafast sitting classification - MIT License
  5. PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
  6. HSC: Happy smile classifier - MIT License
  7. WHC: Waving Hand Classification - MIT License
  8. UHD: Ultra-lightweight human detection - MIT License

Citation

If you find this project useful, please consider citing:

@software{hyodo2025uhd,
  author    = {Katsuya Hyodo},
  title     = {PINTO0309/UHD},
  month     = {12},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17790207},
  url       = {https://github.com/PINTO0309/uhd},
  abstract  = {Ultra-lightweight human detection. The number of parameters does not correlate to inference speed.},
}
