Ultra-lightweight human detection. Parameter count does not correlate directly with inference speed. For constrained use cases, a 64x64 input resolution is sufficient, and heavyweight object-detection architectures such as YOLO are overkill.
Please note that the dataset used to train this model is a custom-built, ultra-high-quality dataset derived from MS-COCO. A direct comparison with the Val mAP values of other object detection models is therefore meaningless. In particular, the mAP values of models evaluated against the original MS-COCO annotations tend to be inflated and do not reflect actual performance.
This model is an experimental implementation and is not suitable for real-time inference using a USB camera, etc.
camera_record_64x64.mp4
- Variant-S / w ESE + IoU-aware + ReLU

| Input 64x64 | Output | Input 64x64 | Output |
|---|---|---|---|
sudo apt update && sudo apt install -y gh
gh release download onnx -R PINTO0309/UHD
Legacy models
Click to expand
- w/o ESE

| Variant | Params | FLOPs | mAP@0.5 | Corei9 CPU inference latency | ONNX file size | ONNX | ONNX w/o post |
|---|---|---|---|---|---|---|---|
| N | 1.38 M | 0.18 G | 0.40343 | 0.93 ms | 5.6 MB | Download | Download |
| T | 3.10 M | 0.41 G | 0.44529 | 1.50 ms | 12.3 MB | Download | Download |
| S | 5.43 M | 0.71 G | 0.44945 | 2.23 ms | 21.8 MB | Download | Download |
| C | 8.46 M | 1.11 G | 0.45005 | 2.66 ms | 33.9 MB | Download | Download |
| M | 12.15 M | 1.60 G | 0.44875 | 4.07 ms | 48.7 MB | Download | Download |
| L | 21.54 M | 2.83 G | 0.44686 | 6.23 ms | 86.2 MB | Download | Download |
- w ESE

| Variant | Params | FLOPs | mAP@0.5 | Corei9 CPU inference latency | ONNX file size | ONNX | ONNX w/o post |
|---|---|---|---|---|---|---|---|
| N | 1.45 M | 0.18 G | 0.41018 | 1.05 ms | 5.8 MB | Download | Download |
| T | 3.22 M | 0.41 G | 0.44130 | 1.27 ms | 12.9 MB | Download | Download |
| S | 5.69 M | 0.71 G | 0.46612 | 2.10 ms | 22.8 MB | Download | Download |
| C | 8.87 M | 1.11 G | 0.45095 | 2.86 ms | 35.5 MB | Download | Download |
| M | 12.74 M | 1.60 G | 0.46502 | 3.95 ms | 51.0 MB | Download | Download |
| L | 22.59 M | 2.83 G | 0.45787 | 6.52 ms | 90.4 MB | Download | Download |
- ESE + IoU-aware + Swish

| Variant | Params | FLOPs | mAP@0.5 | Corei9 CPU inference latency | ONNX file size | ONNX | ONNX w/o post |
|---|---|---|---|---|---|---|---|
| N | 1.60 M | 0.20 G | 0.42806 | 1.25 ms | 6.5 MB | Download | Download |
| T | 3.56 M | 0.45 G | 0.46502 | 1.82 ms | 14.3 MB | Download | Download |
| S | 6.30 M | 0.79 G | 0.47473 | 2.78 ms | 25.2 MB | Download | Download |
| C | 9.81 M | 1.23 G | 0.46235 | 3.58 ms | 39.3 MB | Download | Download |
| M | 14.09 M | 1.77 G | 0.46562 | 5.05 ms | 56.4 MB | Download | Download |
| L | 24.98 M | 3.13 G | 0.47774 | 7.46 ms | 100 MB | Download | Download |
- ESE + IoU-aware + ReLU

| Variant | Params | FLOPs | mAP@0.5 | Corei9 CPU inference latency | ONNX file size | ONNX | ONNX w/o post |
|---|---|---|---|---|---|---|---|
| N | 1.60 M | 0.20 G | 0.40910 | 0.63 ms | 6.4 MB | Download | Download |
| T | 3.56 M | 0.45 G | 0.44618 | 1.08 ms | 14.3 MB | Download | Download |
| S | 6.30 M | 0.79 G | 0.45776 | 1.71 ms | 25.2 MB | Download | Download |
| C | 9.81 M | 1.23 G | 0.45385 | 2.51 ms | 39.3 MB | Download | Download |
| M | 14.09 M | 1.77 G | 0.47468 | 3.54 ms | 56.4 MB | Download | Download |
| L | 24.98 M | 3.13 G | 0.46965 | 6.14 ms | 100 MB | Download | Download |
- ESE + IoU-aware + large-object-branch + ReLU

| Variant | Params | FLOPs | mAP@0.5 | Corei9 CPU inference latency | ONNX file size | ONNX | ONNX w/o post |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.40903 | 0.77 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.46170 | 1.40 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.45860 | 2.30 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.47518 | 2.83 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.45816 | 4.37 ms | 69.8 MB | Download | Download |
| L | 30.92 M | 3.44 G | 0.48243 | 7.40 ms | 123.7 MB | Download | Download |
- [For long distances and extremely small objects] ESE + IoU-aware + ReLU + Distillation

| Variant | Params | FLOPs | mAP@0.5 | Corei9 CPU inference latency | ONNX file size | ONNX | ONNX w/o post |
|---|---|---|---|---|---|---|---|
| N | 1.60 M | 0.20 G | 0.55224 | 0.63 ms | 6.4 MB | Download | Download |
| T | 3.56 M | 0.45 G | 0.56040 | 1.08 ms | 14.3 MB | Download | Download |
| S | 6.30 M | 0.79 G | 0.57361 | 1.71 ms | 25.2 MB | Download | Download |
| C | 9.81 M | 1.23 G | 0.56183 | 2.51 ms | 39.3 MB | Download | Download |
| M | 14.09 M | 1.77 G | 0.57666 | 3.54 ms | 56.4 MB | Download | Download |
- [For short/medium distance] ESE + IoU-aware + large-object-branch + ReLU + Distillation

| Variant | Params | FLOPs | mAP@0.5 | Corei9 CPU inference latency | ONNX file size | ONNX | ONNX w/o post |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.54883 | 0.70 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.55663 | 1.18 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.57397 | 1.97 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.56768 | 2.74 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.57815 | 3.57 ms | 69.8 MB | Download | Download |
- torch_bilinear_dynamic (no resizing required; not suitable for quantization)

| Variant | Params | FLOPs | mAP@0.5 | Corei9 CPU inference latency | ONNX file size | ONNX | ONNX w/o post |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.55489 | 0.70 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.57824 | 1.18 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.58478 | 1.97 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.58459 | 2.74 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.59034 | 3.57 ms | 69.8 MB | Download | Download |
| L | 30.92 M | 3.44 G | 0.58929 | 7.16 ms | 123.7 MB | Download | Download |
- torch_nearest_dynamic (no resizing required; suitable for quantization)

| Variant | Params | FLOPs | mAP@0.5 | Corei9 CPU inference latency | ONNX file size | ONNX | ONNX w/o post |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.53376 | 0.70 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.55561 | 1.18 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.56396 | 1.97 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.56328 | 2.74 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.57075 | 3.57 ms | 69.8 MB | Download | Download |
| L | 30.92 M | 3.44 G | 0.56787 | 7.16 ms | 123.7 MB | Download | Download |
- opencv_inter_nearest (optimized for OpenCV RGB downsampling; suitable for quantization)

| Var | Param | FLOPs | mAP@0.5 | Corei9 CPU latency | ONNX size | static | static w/o post | dynamic | dynamic w/o post |
|---|---|---|---|---|---|---|---|---|---|
| R | 0.13 M | 0.01 G | 0.21230 | 0.24 ms | 863 KB | DL | DL | DL | DL |
| Y | 0.29 M | 0.03 G | 0.28664 | 0.28 ms | 1.2 MB | DL | DL | DL | DL |
| Z | 0.51 M | 0.05 G | 0.32722 | 0.32 ms | 2.1 MB | DL | DL | DL | DL |
| A | 0.78 M | 0.08 G | 0.43661 | 0.37 ms | 3.2 MB | DL | DL | DL | DL |
| F | 1.12 M | 0.12 G | 0.47942 | 0.44 ms | 4.5 MB | DL | DL | DL | DL |
| P | 1.52 M | 0.17 G | 0.51094 | 0.50 ms | 6.1 MB | DL | DL | DL | DL |
| N | 1.98 M | 0.22 G | 0.55003 | 0.60 ms | 8.0 MB | DL | DL | DL | DL |
| T | 2.49 M | 0.28 G | 0.56550 | 0.70 ms | 10.0 MB | DL | DL | DL | DL |
| S | 3.07 M | 0.34 G | 0.57015 | 0.81 ms | 12.3 MB | DL | DL | DL | DL |
| L | 30.92 M | 3.44 G | 0.58399 | 7.16 ms | 123.7 MB | DL | DL | DL | DL |
- opencv_inter_nearest_yuv422 (optimized for YUV422; suitable for quantization)
- Variants
  R: ronto, Y: yocto, Z: zepto, A: atto, F: femto, P: pico, N: nano, T: tiny, S: small, C: compact, M: medium, L: large
- YUV422
  img_u8 = np.ones([64,64,3], dtype=np.uint8)
  yuyv = cv2.cvtColor(img_u8, cv2.COLOR_RGB2YUV_YUYV)
  print(yuyv.shape)
  (64, 64, 2)
- With post-process model
  input_name.1: input_yuv422 shape: [1, 2, 64, 64] dtype: float32
  output_name.1: score_classid_cxcywh shape: [1, 100, 6] dtype: float32
- Without post-process model
  input_name.1: input_yuv422 shape: [1, 2, 64, 64] dtype: float32
  output_name.1: txtywh_obj_quality_cls_x8 shape: [1, 56, 8, 8] dtype: float32
  output_name.2: anchors shape: [8, 2] dtype: float32
  output_name.3: wh_scale shape: [8, 2] dtype: float32
  https://github.com/PINTO0309/UHD/blob/e0bbfe69afa0da4f83cf1f09b530a500bcd2d685/demo_uhd.py#L203-L301
  score = sigmoid(obj) * sigmoid(quality) * sigmoid(cls)
  cx = (sigmoid(tx) + gx) / w
  cy = (sigmoid(ty) + gy) / h
  bw = anchor_w * softplus(tw) * wh_scale
  bh = anchor_h * softplus(th) * wh_scale
  boxes = (cx ± bw/2, cy ± bh/2)
- ONNX

| Var | Param | FLOPs | mAP@0.5 | Corei9 CPU latency | ONNX size | static | static w/o post | dynamic | dynamic w/o post |
|---|---|---|---|---|---|---|---|---|---|
| R | 0.13 M | 0.01 G | 0.22382 | 0.34 ms | 863 KB | DL | DL | DL | DL |
| Y | 0.29 M | 0.03 G | 0.29606 | 0.38 ms | 1.2 MB | DL | DL | DL | DL |
| Z | 0.51 M | 0.05 G | 0.36843 | 0.43 ms | 2.1 MB | DL | DL | DL | DL |
| A | 0.78 M | 0.08 G | 0.42872 | 0.48 ms | 3.2 MB | DL | DL | DL | DL |
| F | 1.12 M | 0.12 G | 0.49098 | 0.54 ms | 4.5 MB | DL | DL | DL | DL |
| P | 1.52 M | 0.17 G | 0.52665 | 0.63 ms | 6.1 MB | DL | DL | DL | DL |
| N | 1.98 M | 0.22 G | 0.54942 | 0.70 ms | 8.0 MB | DL | DL | DL | DL |
| T | 2.49 M | 0.28 G | 0.56300 | 0.83 ms | 10.0 MB | DL | DL | DL | DL |
| S | 3.07 M | 0.34 G | 0.57338 | 0.91 ms | 12.3 MB | DL | DL | DL | DL |
| L | 30.92 M | 3.44 G | 0.58642 | 7.16 ms | 123.7 MB | DL | DL | DL | DL |
- Input image 480x360 -> OpenCV INTER_NEAREST -> 64x64 -> YUV422 (packed: YUY2/YUYV)
  (comparison images at 100% and 800% zoom)
- Y detection sample
- F detection sample
- N detection sample
- S detection sample
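The input pipeline described above (frame -> OpenCV INTER_NEAREST 64x64 -> packed YUY2/YUYV -> [1, 2, 64, 64] float32) can be sketched as follows. This is only an illustration: the resize and color-conversion calls come from this README, while the final value scaling is an assumption to verify against demo_uhd.py.

```python
import cv2
import numpy as np

def preprocess_yuv422(frame_rgb: np.ndarray) -> np.ndarray:
    """RGB uint8 frame (e.g., 480x360) -> [1, 2, 64, 64] float32 packed-YUY2 tensor."""
    resized = cv2.resize(frame_rgb, (64, 64), interpolation=cv2.INTER_NEAREST)  # match training resize mode
    yuyv = cv2.cvtColor(resized, cv2.COLOR_RGB2YUV_YUYV)                         # packed YUY2, shape (64, 64, 2)
    # NOTE: whether the model expects raw 0-255 values or normalized input is not stated here;
    # check demo_uhd.py before relying on this.
    return yuyv.astype(np.float32).transpose(2, 0, 1)[np.newaxis]                # NCHW: [1, 2, 64, 64]
```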
ESPDL INT8 (.espdl, .info, .json, anchors.npy, wh_scale.npy)
I don't own an ESP32, so I haven't checked its operation.
| Var | ESPDL size | static w/o post (s3) | static w/o post (p4) |
|---|---|---|---|
| R | 222.8 KB | DL | DL |
| Y | 389.0 KB | DL | DL |
| Z | 617.4 KB | DL | DL |
| A | 911.6 KB | DL | DL |
| F | 1.2 MB | DL | DL |
| P | 1.6 MB | DL | DL |
| N | 2.1 MB | DL | DL |
| T | 2.6 MB | DL | DL |
| S | 3.2 MB | DL | DL |
- opencv_inter_nearest_y (optimized for Y (luminance) only; suitable for quantization)
- ONNX

| Var | Param | FLOPs | mAP@0.5 | CPU latency | ONNX size | static | static w/o post | dynamic | dynamic w/o post |
|---|---|---|---|---|---|---|---|---|---|
| R | 0.13 M | 0.01 G | | 0.34 ms | 863 KB | | | | |
| Y | 0.29 M | 0.03 G | | 0.38 ms | 1.2 MB | | | | |
| Z | 0.51 M | 0.05 G | | 0.43 ms | 2.1 MB | | | | |
| A | 0.78 M | 0.08 G | | 0.48 ms | 3.2 MB | | | | |
| F | 1.12 M | 0.12 G | | 0.54 ms | 4.5 MB | | | | |
| P | 1.52 M | 0.17 G | | 0.63 ms | 6.1 MB | | | | |
| N | 1.98 M | 0.22 G | | 0.70 ms | 8.0 MB | | | | |
| T | 2.49 M | 0.28 G | | 0.83 ms | 10.0 MB | | | | |
| S | 3.07 M | 0.34 G | | 0.91 ms | 12.3 MB | | | | |
| L | 30.92 M | 3.44 G | 0.58164 | 7.16 ms | 123.7 MB | | | | |
ESPDL INT8 (.espdl, .info, .json, anchors.npy, wh_scale.npy)
| Var | ESPDL size | static w/o post (s3) | static w/o post (p4) |
|---|---|---|---|
| R | 222.8 KB | | |
| Y | 389.0 KB | | |
| Z | 617.4 KB | | |
| A | 911.6 KB | | |
| F | 1.2 MB | | |
| P | 1.6 MB | | |
| N | 2.1 MB | | |
| T | 2.6 MB | | |
| S | 3.2 MB | | |
Caution

If you preprocess images yourself and resize them to 64x64 with OpenCV or a similar library, use Nearest (INTER_NEAREST) mode.
Click to expand
usage: demo_uhd.py
[-h]
(--images IMAGES | --camera CAMERA)
--onnx ONNX
[--output OUTPUT]
[--img-size IMG_SIZE]
[--conf-thresh CONF_THRESH]
[--record RECORD]
[--actual-size]
[--use-nms]
[--nms-iou NMS_IOU]
UltraTinyOD ONNX demo (CPU).
options:
-h, --help
show this help message and exit
--images IMAGES
Directory with images to run batch inference.
--camera CAMERA
USB camera id for realtime inference.
--onnx ONNX
Path to ONNX model (CPU).
--output OUTPUT
Output directory for image mode.
--img-size IMG_SIZE
Input size HxW, e.g., 64x64.
--conf-thresh CONF_THRESH
Confidence threshold. Default: 0.90
--record RECORD
MP4 path for automatic recording when --camera is used.
--actual-size
Display and recording use the model input resolution instead of
the original frame size.
--use-nms
Apply Non-Maximum Suppression on decoded boxes (default IoU=0.8).
--nms-iou NMS_IOU
IoU threshold for NMS (effective only when --use-nms is set).

- ONNX with post-processing
uv run demo_uhd.py \
--onnx ultratinyod_res_anc8_w192_64x64_loese_distill.onnx \
--camera 0 \
--conf-thresh 0.90 \
--use-nms \
--actual-size
- ONNX without post-processing
uv run demo_uhd.py \
--onnx ultratinyod_res_anc8_w192_64x64_loese_distill_nopost.onnx \
--camera 0 \
--conf-thresh 0.90 \
--use-nms \
--actual-size
- ONNX with pre-processing (PIL equivalent of Resize) + post-processing
uv run demo_uhd.py \
--onnx ultratinyod_res_anc8_w256_64x64_torch_bilinear_dynamic.onnx \
--camera 0 \
--conf-thresh 0.90 \
--use-nms \
--actual-size
- ONNX with pre-processing (PIL equivalent of Resize) + without post-processing
uv run demo_uhd.py \
--onnx ultratinyod_res_anc8_w256_64x64_torch_bilinear_dynamic_nopost.onnx \
--camera 0 \
--conf-thresh 0.90 \
--use-nms \
--actual-size
UltraTinyOD (anchor-only, stride 8; --cnn-width controls stem width):
use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
use-improved-head + utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
use-improved-head + use-iou-aware-head + utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
use-improved-head + use-iou-aware-head + utod-head-ese + distillation
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w64_se_iou_64x64_quality_lr0.005_relu/best_utod_0299_map_0.40910.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width 64 \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/3/ultratinyod_res_anc8_w96_se_iou_64x64_quality_lr0.004/best_utod_0293_map_0.46502.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w128_se_iou_64x64_quality_lr0.003_relu/best_utod_0293_map_0.45776.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w160_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.45385.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w192_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.47468.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
use-improved-head + use-iou-aware-head + utod-head-ese + utod-large-obj-branch
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25
use-improved-head + use-iou-aware-head + utod-head-ese + utod-large-obj-branch + distillation
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w64_loese_64x64_quality_lr0.005/best_utod_0296_map_0.40903.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width 64 \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w96_loese_64x64_quality_lr0.004/best_utod_0279_map_0.46170.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w128_loese_64x64_quality_lr0.003/best_utod_0259_map_0.45860.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w160_loese_64x64_quality_lr0.001/best_utod_0210_map_0.47518.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w192_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.47468.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
Click to expand
Example of running only validation on a trained checkpoint:
uv run python train.py \
--arch ultratinyod \
--img-size 64x64 \
--cnn-width 256 \
--classes 0 \
--conf-thresh 0.15 \
--ckpt runs/ultratinyod_res_anc8_w256_64x64_lr0.0003/best_utod_0297_map_0.44299.pt \
--val-only \
--use-ema
Click to expand
| Parameter | Description | Default |
|---|---|---|
| --arch | Model architecture: cnn, transformer, or anchor-only ultratinyod. | cnn |
| --image-dir | Directory containing images and YOLO txt labels. | data/wholebody34/obj_train_data |
| --train-split | Fraction of data used for training. | 0.8 |
| --val-split | Fraction of data used for validation. | 0.2 |
| --img-size | Input size HxW (e.g., 64x64). | 64x64 |
| --resize-mode | Resize mode for training preprocessing: torch_bilinear, torch_nearest, opencv_inter_linear, opencv_inter_nearest, opencv_inter_nearest_y, opencv_inter_nearest_yuv422. | torch_bilinear |
| --torch_bilinear | Shortcut for --resize-mode torch_bilinear. | False |
| --torch_nearest | Shortcut for --resize-mode torch_nearest. | False |
| --opencv_inter_linear | Shortcut for --resize-mode opencv_inter_linear. | False |
| --opencv_inter_nearest | Shortcut for --resize-mode opencv_inter_nearest. | False |
| --opencv_inter_nearest_y | Shortcut for --resize-mode opencv_inter_nearest_y. | False |
| --opencv_inter_nearest_yuv422 | Shortcut for --resize-mode opencv_inter_nearest_yuv422. | False |
| --exp-name | Experiment name; logs saved under runs/<exp-name>. | default |
| --batch-size | Batch size. | 64 |
| --epochs | Number of epochs. | 100 |
| --resume | Checkpoint to resume training (loads optimizer/scheduler). | None |
| --ckpt | Initialize weights from checkpoint (no optimizer state). | None |
| --ckpt-non-strict | Load --ckpt with strict=False (ignore missing/unexpected keys). | False |
| --val-only | Run validation only with --ckpt or --resume weights and exit. | False |
| --val-count | Limit number of validation images when using --val-only. | None |
| --use-improved-head | UltraTinyOD only: enable quality-aware head (IoU-aware obj, IoU score branch, learnable WH scale, extra context). | False |
| --use-iou-aware-head | UltraTinyOD head: task-aligned IoU-aware scoring (quality*cls) with split towers. | False |
| --quality-power | Exponent for quality score when using IoU-aware head scoring. | 1.0 |
| --teacher-ckpt | Teacher checkpoint path for distillation. | None |
| --teacher-arch | Teacher architecture override. | None |
| --teacher-num-queries | Teacher DETR queries. | None |
| --teacher-d-model | Teacher model dimension. | None |
| --teacher-heads | Teacher attention heads. | None |
| --teacher-layers | Teacher encoder/decoder layers. | None |
| --teacher-dim-feedforward | Teacher FFN dimension. | None |
| --teacher-use-skip | Force teacher skip connections on. | False |
| --teacher-activation | Teacher activation (relu/swish). | None |
| --teacher-use-fpn | Force teacher FPN on. | False |
| --teacher-backbone | Teacher backbone checkpoint for feature distillation. | None |
| --teacher-backbone-arch | Teacher backbone architecture hint. | None |
| --teacher-backbone-norm | Teacher backbone input normalization. | imagenet |
| --distill-kl | KL distillation weight (transformer). | 0.0 |
| --distill-box-l1 | Box L1 distillation weight (transformer). | 0.0 |
| --distill-cosine | Cosine ramp-up of distillation weights. | False |
| --distill-temperature | Teacher logits temperature. | 1.0 |
| --distill-feat | Feature-map distillation weight (CNN only). | 0.0 |
| --lr | Learning rate. | 0.001 |
| --weight-decay | Weight decay. | 0.0001 |
| --optimizer | Optimizer (adamw or sgd). | adamw |
| --grad-clip-norm | Global gradient norm clip; set 0 to disable. | 5.0 |
| --num-workers | DataLoader workers. | 8 |
| --device | Device: cuda or cpu. | cuda if available |
| --seed | Random seed. | 42 |
| --log-interval | Steps between logging to progress bar. | 10 |
| --eval-interval | Epoch interval for evaluation. | 1 |
| --conf-thresh | Confidence threshold for decoding. | 0.3 |
| --topk | Top-K for CNN decoding. | 50 |
| --use-amp | Enable automatic mixed precision. | False |
| --aug-config | YAML for augmentations (applied in listed order). | uhd/aug.yaml |
| --use-ema | Enable EMA of model weights for evaluation/checkpointing. | False |
| --ema-decay | EMA decay factor (ignored if EMA disabled). | 0.9998 |
| --coco-eval | Run COCO-style evaluation. | False |
| --coco-per-class | Log per-class COCO AP when COCO eval is enabled. | False |
| --classes | Comma-separated target class IDs. | 0 |
| --activation | Activation function (relu or swish). | swish |
| --cnn-width | Width multiplier for CNN backbone. | 32 |
| --backbone | Optional lightweight CNN backbone (microcspnet, ultratinyresnet, enhanced-shufflenet, or none). | None |
| --backbone-channels | Comma-separated channels for ultratinyresnet (e.g., 16,32,48,64). | None |
| --backbone-blocks | Comma-separated residual block counts per stage for ultratinyresnet (e.g., 1,2,2,1). | None |
| --backbone-se | Apply SE/eSE on backbone output (custom backbones only). | none |
| --backbone-skip | Add long skip fusion across custom backbone stages (ultratinyresnet). | False |
| --backbone-skip-cat | Use concat+1x1 fusion for long skips (ultratinyresnet); implies --backbone-skip. | False |
| --backbone-skip-shuffle-cat | Use stride+shuffle concat fusion for long skips (ultratinyresnet); implies --backbone-skip. | False |
| --backbone-skip-s2d-cat | Use space-to-depth concat fusion for long skips (ultratinyresnet); implies --backbone-skip. | False |
| --backbone-fpn | Enable a tiny FPN fusion inside custom backbones (ultratinyresnet). | False |
| --backbone-out-stride | Override custom backbone output stride (e.g., 8 or 16). | None |
| --use-skip | Enable skip-style fusion in the CNN head (sums pooled shallow features into the final stage). Stored in checkpoints and restored on resume. | False |
| --utod-residual | Enable residual skips inside the UltraTinyOD backbone. | False |
| --utod-head-ese | UltraTinyOD head: apply lightweight eSE on shared features. | False |
| --utod-context-rfb | UltraTinyOD head: add a receptive-field block (dilated + wide depthwise) before prediction layers. | False |
| --utod-context-dilation | Dilation used in UltraTinyOD receptive-field block (only when --utod-context-rfb). | 2 |
| --utod-large-obj-branch | UltraTinyOD head: add a downsampled large-object refinement branch (no FPN). | False |
| --utod-large-obj-depth | Number of depthwise blocks in the large-object branch (only when --utod-large-obj-branch). | 2 |
| --utod-large-obj-ch-scale | Channel scale for the large-object branch (relative to head channels). | 1.0 |
| --use-anchor | Use anchor-based head for CNN (YOLO-style). | False |
| --output-stride | Final CNN feature stride (downsample factor). Supported: 4, 8, 16. | 16 |
| --anchors | Anchor sizes as normalized w,h pairs (space separated). | "" |
| --auto-anchors | Compute anchors from training labels when using anchor head. | False |
| --num-anchors | Number of anchors to use when auto-computing. | 3 |
| --iou-loss | IoU loss type for anchor head (iou, giou, or ciou). | giou |
| --anchor-assigner | Anchor assigner strategy (legacy, simota). | legacy |
| --anchor-cls-loss | Anchor classification loss (bce, vfl). | bce |
| --simota-topk | Top-K IoUs for dynamic-k in SimOTA. | 10 |
| --last-se | Apply SE/eSE only on the last CNN block. | none |
| --use-batchnorm | Enable BatchNorm layers during training/export. | False |
| --last-width-scale | Channel scale for last CNN block (e.g., 1.25). | 1.0 |
| --num-queries | Transformer query count. | 10 |
| --d-model | Transformer model dimension. | 64 |
| --heads | Transformer attention heads. | 4 |
| --layers | Transformer encoder/decoder layers. | 3 |
| --dim-feedforward | Transformer feedforward dimension. | 128 |
| --use-fpn | Enable simple FPN for transformer backbone. | False |
Tiny CNN backbones (--backbone, optional; default keeps the original built-in CNN):
- microcspnet: CSP-tiny style stem (16/32/64/128) compressed to 64ch, stride 8 output.
- ultratinyresnet: 16→24→32→48 channel ResNet-like stack with three downsample steps (stride 8). Channel widths and blocks per stage can be overridden via --backbone-channels / --backbone-blocks; optional long skips across stages via --backbone-skip; optional lightweight FPN fusion via --backbone-fpn.
- enhanced-shufflenet: Enhanced ShuffleNetV2+ inspired (arXiv:2111.00902) with progressive widening and doubled refinements, ending at ~128ch, stride 8.

All custom backbones can optionally apply SE/eSE on the backbone output via --backbone-se {none,se,ese}.
Click to expand
- Specify a YAML file with --aug-config to run the data_augment: entries in the listed order (e.g., --aug-config uhd/aug.yaml).
- Supported ops (examples): Mosaic / MixUp / CopyPaste / HorizontalFlip (class_swap_map supported) / VerticalFlip / RandomScale / Translation / RandomCrop / RandomResizedCrop / RandomBrightness / RandomContrast / RandomSaturation / RandomHSV / RandomPhotometricDistort / Blur / MedianBlur / MotionBlur / GaussianBlur / GaussNoise / ImageCompression / ISONoise / RandomRain / RandomFog / RandomSunFlare / CLAHE / ToGray / RemoveOutliers.
- If prob is provided, it is used as the apply probability; otherwise defaults are used (most are 0; RandomPhotometricDistort defaults to 0.5). Unknown keys are ignored.
Click to expand
- loss: total loss (hm + off + wh)
- hm: focal loss on center heatmap
- off: L1 loss on center offsets (within-cell quantization correction)
- wh: L1 loss on width/height (feature-map scale)
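For reference, the hm term in CenterNet-style heads is usually the penalty-reduced focal loss below. This is a generic sketch of that standard formulation (assuming a Gaussian-splatted target heatmap in [0, 1] and post-sigmoid predictions), not a copy of this repository's loss code.

```python
import torch

def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss on a center heatmap (CenterNet-style).

    pred, gt: [B, C, H, W]; gt is 1.0 at object centers and Gaussian-decayed elsewhere.
    pred is assumed to already be passed through a sigmoid.
    """
    pred = pred.clamp(eps, 1.0 - eps)
    pos = gt.eq(1.0).float()
    neg = 1.0 - pos
    pos_loss = -((1.0 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = -((1.0 - gt) ** beta) * (pred ** alpha) * torch.log(1.0 - pred) * neg
    num_pos = pos.sum().clamp(min=1.0)
    return (pos_loss.sum() + neg_loss.sum()) / num_pos
```

The off and wh terms are then plain L1 losses evaluated only at the positive center locations.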
Click to expand
- loss: total anchor loss (box + obj + cls [+ quality] when --use-improved-head)
- obj: BCE on objectness for each anchor location (positive vs. background)
- cls: BCE on per-class logits for positive anchors (one-hot over target classes)
- box: (1 - IoU/GIoU/CIoU) on decoded boxes for positive anchors; IoU flavor set by --iou-loss
- quality (improved head only): BCE on an IoU-linked quality logit; the obj target is also scaled by IoU
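A compact sketch of how these terms could be combined is shown below. It is illustrative only: the assigner, per-term loss weights, and batching used in train.py are not reproduced, and it assumes xyxy boxes, raw logits for obj/cls/quality, and a boolean pos_mask from the anchor assigner; the CIoU flavor corresponds to --iou-loss ciou.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import box_iou, complete_box_iou_loss

def anchor_loss(pred_boxes, obj_logit, cls_logit, quality_logit,
                tgt_boxes, tgt_cls, pos_mask):
    """Illustrative combination of the anchor-loss terms (improved-head variant).

    pred_boxes/tgt_boxes: [N, 4] xyxy; obj_logit: [N]; cls_logit/tgt_cls: [N, C]; pos_mask: [N] bool.
    """
    # box: (1 - CIoU) on decoded boxes for positive anchors
    box = complete_box_iou_loss(pred_boxes[pos_mask], tgt_boxes[pos_mask], reduction="mean")

    # IoU of positive predictions, used as the target for obj and quality
    with torch.no_grad():
        iou = box_iou(pred_boxes[pos_mask], tgt_boxes[pos_mask]).diagonal().clamp(0.0, 1.0)

    obj_target = torch.zeros_like(obj_logit)
    obj_target[pos_mask] = iou                                   # obj target scaled by IoU
    obj = F.binary_cross_entropy_with_logits(obj_logit, obj_target)

    cls = F.binary_cross_entropy_with_logits(cls_logit[pos_mask], tgt_cls[pos_mask])
    quality = F.binary_cross_entropy_with_logits(quality_logit[pos_mask], iou)

    # real training almost certainly applies per-term weights; shown unweighted here
    return box + obj + cls + quality
```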
Click to expand
- loss: total loss (cls + l1 + iou)
- cls: cross-entropy for class vs. background
- l1: L1 loss on box coordinates
- iou: 1 - IoU for matched predictions
Click to expand
PyTorch's Resize is implemented with a downsampling method similar to PIL's, which differs significantly from OpenCV's implementation. When downsampling images during training preprocessing, be aware that the numerical characteristics of the images the model trains on will be completely different depending on whether you use PyTorch's Resize or OpenCV's resize. Below is the pixel-level error when downsampling an image to 64x64; a diff value greater than 1.0 means the images are effectively different.

It follows that if the downsampling method used for preprocessing during training differs from the one used at inference time, the inference results will be severely degraded.

PyTorch's downsampling and PIL's downsampling are very similar internally but not identical. When deploying and running inference in Python or other environments, accuracy will degrade significantly unless the model is deployed according to the criteria below. If you train using OpenCV's cv2.INTER_LINEAR, the model will never produce correct output when preprocessing is done with PyTorch, TensorFlow, or ONNX rather than OpenCV.
| Training | Deploy |
|---|---|
| When training while downsampling using PyTorch's Resize (InterpolationMode.BILINEAR) | Merge Resize Linear + half-pixel at the input of the ONNX model. This will result in the highest model accuracy. However, it will be limited to deployment on hardware, NPUs, and frameworks that support the resize operation of bilinear interpolation. It is not suitable for quantization. |
| When training while downsampling using PyTorch's Resize (InterpolationMode.NEAREST) | Merge Resize Nearest at the input of the ONNX model. It is the most versatile in terms of HW, NPU, and quantization deployment, but the accuracy of the model will be lower. |
| When training while downsampling using OpenCV's Resize (cv2.INTER_NEAREST) | Merge Resize Nearest at the input of the ONNX model or use OpenCV INTER_NEAREST. Although the accuracy is low, it is highly versatile because the downsampling of images can be freely written on the program side. However, downsampling must be implemented manually. |
- Error after PIL conversion when downsampling with PyTorch's Resize InterpolationMode.BILINEAR
  PyTorch (InterpolationMode.BILINEAR) -> Convert to PIL vs PyTorch Tensor (InterpolationMode.BILINEAR)
  max diff : 1
  mean diff : 0.4949
  std diff : 0.5000
- Error when downsampling with PyTorch's Resize InterpolationMode.BILINEAR (converted to PIL) compared to downsampling with OpenCV's INTER_LINEAR
  PyTorch (InterpolationMode.BILINEAR) -> Convert to PIL vs OpenCV INTER_LINEAR
  max diff : 104
  mean diff : 10.2930
  std diff : 13.2792
- Error when downsampling with PyTorch Tensor's Resize InterpolationMode.BILINEAR compared to downsampling with OpenCV's INTER_LINEAR
  PyTorch Tensor (InterpolationMode.BILINEAR) vs OpenCV INTER_LINEAR
  max diff : 104
  mean diff : 10.3336
  std diff : 13.2463
- Accuracy and speed of each interpolation method when downsampling in OpenCV
  - Accuracy: INTER_NEAREST < INTER_LINEAR < INTER_AREA
  - Speed: INTER_NEAREST > INTER_LINEAR > INTER_AREA
  === Resize benchmark ===
  INTER_NEAREST : 0.0061 ms
  INTER_LINEAR  : 0.0143 ms
  INTER_AREA    : 0.3621 ms
  AREA / LINEAR ratio : 25.40x
Click to expand
- Export a checkpoint to ONNX (auto-detects arch from checkpoint unless overridden):
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
RESIZEMODE=opencv_inter_nearest_y
CKPT=runs/ultratinyod_res_anc8_w64_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
--opset 17
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
--opset 17 \
--dynamic-resize
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
--opset 17 \
--no-merge-postprocess
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
--opset 17 \
--no-merge-postprocess \
--dynamic-resize
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
RESIZEMODE=opencv_inter_nearest_y
CKPT=runs/ultratinyod_res_anc8_w96_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
--opset 17
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
--opset 17 \
--dynamic-resize
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
--opset 17 \
--no-merge-postprocess
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
--opset 17 \
--no-merge-postprocess \
--dynamic-resize
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
RESIZEMODE=opencv_inter_nearest_y
CKPT=runs/ultratinyod_res_anc8_w128_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
--opset 17
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
--opset 17 \
--dynamic-resize
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
--opset 17 \
--no-merge-postprocess
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
--opset 17 \
--no-merge-postprocess \
--dynamic-resize
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
RESIZEMODE=opencv_inter_nearest_y
CKPT=runs/ultratinyod_res_anc8_w160_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
--opset 17
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
--opset 17 \
--dynamic-resize
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
--opset 17 \
--no-merge-postprocess
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
--opset 17 \
--no-merge-postprocess \
--dynamic-resize
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
RESIZEMODE=opencv_inter_nearest_y
CKPT=runs/ultratinyod_res_anc8_w192_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
--opset 17
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
--opset 17 \
--dynamic-resize
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
--opset 17 \
--no-merge-postprocess
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
--opset 17 \
--no-merge-postprocess \
--dynamic-resize
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
RESIZEMODE=opencv_inter_nearest_y
CKPT=runs/ultratinyod_res_anc8_w256_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
--opset 17
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
--opset 17 \
--dynamic-resize
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
--opset 17 \
--no-merge-postprocess
uv run python export_onnx.py \
--checkpoint ${CKPT} \
--output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
--opset 17 \
--no-merge-postprocess \
--dynamic-resize
Click to expand
uv run onnx2tf \
-i ultratinyod_res_anc8_w64_64x64_quality_relu_nopost.onnx \
-cotof \
-oiqt

This repository includes a calibration/quantization script for ESP-DL: uhd/quantize_onnx_model_for_esp32.py.
Click to expand
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target "esp32s3" \
--calib-algorithm kl \
--int16-op-pattern "/model/backbone/block1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv"
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w24_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w24_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv"
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w32_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w32_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/obj_conv/dw/conv/Conv"
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w40_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w40_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/backbone/block1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_down/large_obj_down.0/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/obj_conv/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv"
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w48_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w48_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/backbone/block1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv"
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w56_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w56_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv"
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w64_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w64_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv"
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w72_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w72_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv"Notes:
- The YUV422 models expect input shape
[1, 2, 64, 64]withopencv_inter_nearest_yuv422preprocessing. --dataset-type imageis the default and ignores labels.- Adjust
--calib-steps,--batch-size,--target,--num-of-bits, and--deviceas needed.
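The sketch below illustrates one plausible way to build the 2-channel `[1, 2, 64, 64]` YUV422 input. The interleaved U/V packing and the unscaled float32 values are assumptions; the repository's `opencv_inter_nearest_yuv422` resize mode is the authoritative definition and should be checked before relying on this layout.

```python
# Hedged sketch of a possible YUV422 preprocessing for the [1, 2, 64, 64] input.
# ASSUMPTIONS: channel 0 = Y plane, channel 1 = U/V interleaved by column,
# values kept in the 0-255 range as float32. Verify against the repository's
# opencv_inter_nearest_yuv422 implementation before use.
import cv2
import numpy as np

def to_yuv422_tensor(bgr: np.ndarray, size: int = 64) -> np.ndarray:
    resized = cv2.resize(bgr, (size, size), interpolation=cv2.INTER_NEAREST)
    yuv = cv2.cvtColor(resized, cv2.COLOR_BGR2YUV)  # H x W x (Y, U, V), uint8
    y = yuv[:, :, 0].astype(np.float32)
    uv = np.empty((size, size), dtype=np.float32)
    uv[:, 0::2] = yuv[:, 0::2, 1]  # U samples on even columns (assumed packing)
    uv[:, 1::2] = yuv[:, 1::2, 2]  # V samples on odd columns (assumed packing)
    return np.stack([y, uv], axis=0)[None]  # shape [1, 2, size, size]
```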
Click to expand
- `--image-dir`: Directory containing calibration images.
- `--dataset-type`: Calibration dataset type (`image` or `yolo`, default `image`).
- `--list-path`: Optional text file listing images to use.
- `--export-anchors-wh-scale-dir`: Directory to save `{onnx-model}_anchors.npy` and `{onnx-model}_wh_scale.npy` (default: same directory as `--espdl-model`; see the inspection snippet after this list).
- `--expand-group-conv`: Expand `groups > 1` conv into group=1 (default: disabled).
- `--img-size`: Square input size used for calibration (default `64`).
- `--resize-mode`: Resize mode (default `opencv_inter_nearest_yuv422`).
- `--class-ids`: Comma-separated class IDs to keep (`yolo` only, default `0`).
- `--split`: Dataset split for calibration (`train`, `val`, `all`, default `all`).
- `--val-split`: Validation split ratio (ignored when `--split all`, default `0.0`).
- `--batch-size`: Calibration batch size (default `1`).
- `--calib-steps`: Number of calibration steps (default `32`).
- `--calib-algorithm`: Calibration algorithm (default `kl`; examples: `minmax`, `mse`, `percentile`).
- `--int16-op-pattern`: Regex pattern to force matched ops to int16 (repeatable).
- `--onnx-model`: Path to the input ONNX model (default `ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx`).
- `--espdl-model`: Path to the output `.espdl` file (default `ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl`).
- `--target`: Quantize target type (`c`, `esp32s3`, `esp32p4`, default `esp32s3`).
- `--num-of-bits`: Quantization bits (default `8`).
- `--device`: Device for calibration (`cpu` or `cuda`, default `cpu`).
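If `--export-anchors-wh-scale-dir` is used, the exported arrays can be inspected quickly with NumPy. This is only a sketch: the file name prefix follows the `{onnx-model}_anchors.npy` / `{onnx-model}_wh_scale.npy` pattern documented above, so confirm the exact names actually written to disk.

```python
# Sketch: inspect the arrays exported by --export-anchors-wh-scale-dir.
# The prefix below assumes the {onnx-model}_anchors.npy / _wh_scale.npy naming
# described in the option list; verify against the files actually produced.
import numpy as np

prefix = "ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx"
anchors = np.load(f"{prefix}_anchors.npy")
wh_scale = np.load(f"{prefix}_wh_scale.npy")
print("anchors :", anchors.shape, anchors.dtype)
print("wh_scale:", wh_scale.shape, wh_scale.dtype)
```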
Click to expand
| ONNX | LiteRT(TFLite) |
|---|---|
- VSDLM: Visual-only speech detection driven by lip movements - MIT License
- OCEC: Open closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
- PGC: Ultrafast pointing gesture classification - MIT License
- SC: Ultrafast sitting classification - MIT License
- PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
- HSC: Happy smile classifier - MIT License
- WHC: Waving Hand Classification - MIT License
- UHD: Ultra-lightweight human detection - MIT License
If you find this project useful, please consider citing:
@software{hyodo2025uhd,
author = {Katsuya Hyodo},
title = {PINTO0309/UHD},
month = {12},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.17790207},
url = {https://github.com/PINTO0309/uhd},
abstract = {Ultra-lightweight human detection. The number of parameters does not correlate to inference speed.},
}