UHD


Ultra-lightweight human detection. The number of parameters does not correlate with inference speed. For limited use cases, an input image resolution of 64x64 is sufficient. High-level object detection architectures such as YOLO are overkill.

Please note that the dataset used to train this model is a custom-built, ultra-high-quality dataset derived from MS-COCO, so a simple comparison with the Val mAP values of other object detection models is completely meaningless. In particular, note that the mAP values of other MS-COCO-based models are unnecessarily high and do not accurately reflect actual performance.

This model is an experimental implementation and is not suitable for real-time inference using a USB camera, etc.

camera_record_64x64.mp4
  • Variant-S / w ESE + IoU-aware + ReLU

(Image grid: 64x64 input/output detection sample pairs for Variant-S w ESE + IoU-aware + ReLU.)

Download all ONNX files at once

sudo apt update && sudo apt install -y gh
gh release download onnx -R PINTO0309/UHD

Models

  • Legacy models

    • w/o ESE

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.38 M | 0.18 G | 0.40343 | 0.93 ms | 5.6 MB | Download | Download |
| T | 3.10 M | 0.41 G | 0.44529 | 1.50 ms | 12.3 MB | Download | Download |
| S | 5.43 M | 0.71 G | 0.44945 | 2.23 ms | 21.8 MB | Download | Download |
| C | 8.46 M | 1.11 G | 0.45005 | 2.66 ms | 33.9 MB | Download | Download |
| M | 12.15 M | 1.60 G | 0.44875 | 4.07 ms | 48.7 MB | Download | Download |
| L | 21.54 M | 2.83 G | 0.44686 | 6.23 ms | 86.2 MB | Download | Download |
    • w ESE

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.45 M | 0.18 G | 0.41018 | 1.05 ms | 5.8 MB | Download | Download |
| T | 3.22 M | 0.41 G | 0.44130 | 1.27 ms | 12.9 MB | Download | Download |
| S | 5.69 M | 0.71 G | 0.46612 | 2.10 ms | 22.8 MB | Download | Download |
| C | 8.87 M | 1.11 G | 0.45095 | 2.86 ms | 35.5 MB | Download | Download |
| M | 12.74 M | 1.60 G | 0.46502 | 3.95 ms | 51.0 MB | Download | Download |
| L | 22.59 M | 2.83 G | 0.45787 | 6.52 ms | 90.4 MB | Download | Download |
    • ESE + IoU-aware + Swish

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.60 M | 0.20 G | 0.42806 | 1.25 ms | 6.5 MB | Download | Download |
| T | 3.56 M | 0.45 G | 0.46502 | 1.82 ms | 14.3 MB | Download | Download |
| S | 6.30 M | 0.79 G | 0.47473 | 2.78 ms | 25.2 MB | Download | Download |
| C | 9.81 M | 1.23 G | 0.46235 | 3.58 ms | 39.3 MB | Download | Download |
| M | 14.09 M | 1.77 G | 0.46562 | 5.05 ms | 56.4 MB | Download | Download |
| L | 24.98 M | 3.13 G | 0.47774 | 7.46 ms | 100 MB | Download | Download |
    • ESE + IoU-aware + ReLU

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.60 M | 0.20 G | 0.40910 | 0.63 ms | 6.4 MB | Download | Download |
| T | 3.56 M | 0.45 G | 0.44618 | 1.08 ms | 14.3 MB | Download | Download |
| S | 6.30 M | 0.79 G | 0.45776 | 1.71 ms | 25.2 MB | Download | Download |
| C | 9.81 M | 1.23 G | 0.45385 | 2.51 ms | 39.3 MB | Download | Download |
| M | 14.09 M | 1.77 G | 0.47468 | 3.54 ms | 56.4 MB | Download | Download |
| L | 24.98 M | 3.13 G | 0.46965 | 6.14 ms | 100 MB | Download | Download |
    • ESE + IoU-aware + large-object-branch + ReLU

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.40903 | 0.77 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.46170 | 1.40 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.45860 | 2.30 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.47518 | 2.83 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.45816 | 4.37 ms | 69.8 MB | Download | Download |
| L | 30.92 M | 3.44 G | 0.48243 | 7.40 ms | 123.7 MB | Download | Download |
    • [For long distances and extremely small objects] ESE + IoU-aware + ReLU + Distillation

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.60 M | 0.20 G | 0.55224 | 0.63 ms | 6.4 MB | Download | Download |
| T | 3.56 M | 0.45 G | 0.56040 | 1.08 ms | 14.3 MB | Download | Download |
| S | 6.30 M | 0.79 G | 0.57361 | 1.71 ms | 25.2 MB | Download | Download |
| C | 9.81 M | 1.23 G | 0.56183 | 2.51 ms | 39.3 MB | Download | Download |
| M | 14.09 M | 1.77 G | 0.57666 | 3.54 ms | 56.4 MB | Download | Download |
    • [For short/medium distance] ESE + IoU-aware + large-object-branch + ReLU + Distillation

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.54883 | 0.70 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.55663 | 1.18 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.57397 | 1.97 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.56768 | 2.74 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.57815 | 3.57 ms | 69.8 MB | Download | Download |
    • torch_bilinear_dynamic + No resizing required + Not suitable for quantization

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.55489 | 0.70 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.57824 | 1.18 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.58478 | 1.97 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.58459 | 2.74 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.59034 | 3.57 ms | 69.8 MB | Download | Download |
| L | 30.92 M | 3.44 G | 0.58929 | 7.16 ms | 123.7 MB | Download | Download |
    • torch_nearest_dynamic + No resizing required + Suitable for quantization

| Variant | Params | FLOPs | mAP@0.5 | Core i9 CPU inference latency | ONNX file size | ONNX w/o post | ONNX |
|---|---|---|---|---|---|---|---|
| N | 1.98 M | 0.22 G | 0.53376 | 0.70 ms | 8.0 MB | Download | Download |
| T | 4.40 M | 0.49 G | 0.55561 | 1.18 ms | 17.7 MB | Download | Download |
| S | 7.79 M | 0.87 G | 0.56396 | 1.97 ms | 31.2 MB | Download | Download |
| C | 12.13 M | 1.35 G | 0.56328 | 2.74 ms | 48.6 MB | Download | Download |
| M | 17.44 M | 1.94 G | 0.57075 | 3.57 ms | 69.8 MB | Download | Download |
| L | 30.92 M | 3.44 G | 0.56787 | 7.16 ms | 123.7 MB | Download | Download |
  • opencv_inter_nearest + Optimized for OpenCV RGB downsampling + Suitable for quantization

| Var | Params | FLOPs | mAP@0.5 | Core i9 CPU latency | ONNX size | static | static w/o post | dynamic | dynamic w/o post |
|---|---|---|---|---|---|---|---|---|---|
| R | 0.13 M | 0.01 G | 0.21230 | 0.24 ms | 863 KB | DL | DL | DL | DL |
| Y | 0.29 M | 0.03 G | 0.28664 | 0.28 ms | 1.2 MB | DL | DL | DL | DL |
| Z | 0.51 M | 0.05 G | 0.32722 | 0.32 ms | 2.1 MB | DL | DL | DL | DL |
| A | 0.78 M | 0.08 G | 0.43661 | 0.37 ms | 3.2 MB | DL | DL | DL | DL |
| F | 1.12 M | 0.12 G | 0.47942 | 0.44 ms | 4.5 MB | DL | DL | DL | DL |
| P | 1.52 M | 0.17 G | 0.51094 | 0.50 ms | 6.1 MB | DL | DL | DL | DL |
| N | 1.98 M | 0.22 G | 0.55003 | 0.60 ms | 8.0 MB | DL | DL | DL | DL |
| T | 2.49 M | 0.28 G | 0.56550 | 0.70 ms | 10.0 MB | DL | DL | DL | DL |
| S | 3.07 M | 0.34 G | 0.57015 | 0.81 ms | 12.3 MB | DL | DL | DL | DL |
| L | 30.92 M | 3.44 G | 0.58399 | 7.16 ms | 123.7 MB | DL | DL | DL | DL |
  • opencv_inter_nearest_yuv422 + Optimized for YUV422 + Suitable for quantization

    • Variants

R: ronto, Y: yocto, Z: zepto, A: atto
      F: femto, P: pico, N: nano, T: tiny
      S: small, C: compact, M: medium, L: large
      
    • YUV422

import cv2
import numpy as np

img_u8 = np.ones([64, 64, 3], dtype=np.uint8)  # RGB image
yuyv = cv2.cvtColor(img_u8, cv2.COLOR_RGB2YUV_YUYV)  # packed YUY2/YUYV
print(yuyv.shape)  # (64, 64, 2)
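The YUV422 models take this packed image as the [1, 2, 64, 64] float32 input_yuv422 tensor listed below. A minimal sketch of that conversion (whether the exported graph expects raw 0-255 values or a normalized range is an assumption to verify against demo_uhd.py):

```python
import numpy as np

# yuyv: (64, 64, 2) uint8 from the cv2.cvtColor call above
x = yuyv.transpose(2, 0, 1)[None].astype(np.float32)  # HWC -> (1, 2, 64, 64)
```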
    • With post-process model

      input_name.1: input_yuv422 shape: [1, 2, 64, 64] dtype: float32
      
      output_name.1: score_classid_cxcywh shape: [1, 100, 6] dtype: float32
      
    • Without post-process model

      input_name.1: input_yuv422 shape: [1, 2, 64, 64] dtype: float32
      
      output_name.1: txtywh_obj_quality_cls_x8 shape: [1, 56, 8, 8] dtype: float32
      output_name.2: anchors shape: [8, 2] dtype: float32
      output_name.3: wh_scale shape: [8, 2] dtype: float32
      
      https://github.com/PINTO0309/UHD/blob/e0bbfe69afa0da4f83cf1f09b530a500bcd2d685/demo_uhd.py#L203-L301
      
      score = sigmoid(obj) * (sigmoid(quality)) * sigmoid(cls)
      cx = (sigmoid(tx)+gx)/w
      cy = (sigmoid(ty)+gy)/h
      bw = anchor_w*softplus(tw)*wh_scale
      bh = anchor_h*softplus(th)*wh_scale
      boxes = (cx±bw/2, cy±bh/2)
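Restated as a minimal NumPy sketch (the per-anchor channel order tx, ty, tw, th, obj, quality, cls is an assumption inferred from the output name txtywh_obj_quality_cls_x8; the demo_uhd.py range linked above is the authoritative decode):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.log1p(np.exp(x))

def decode(raw, anchors, wh_scale, conf_thresh=0.90):
    # raw: (1, 56, 8, 8) head output; anchors, wh_scale: (8, 2)
    _, c, h, w = raw.shape
    a = anchors.shape[0]
    p = raw[0].reshape(a, c // a, h, w)  # assumed: 8 anchor blocks of 7 channels
    tx, ty, tw, th = p[:, 0], p[:, 1], p[:, 2], p[:, 3]
    obj, quality, cls = p[:, 4], p[:, 5], p[:, 6]
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    score = sigmoid(obj) * sigmoid(quality) * sigmoid(cls)  # (a, h, w)
    cx = (sigmoid(tx) + gx) / w  # normalized 0..1
    cy = (sigmoid(ty) + gy) / h
    bw = anchors[:, 0, None, None] * softplus(tw) * wh_scale[:, 0, None, None]
    bh = anchors[:, 1, None, None] * softplus(th) * wh_scale[:, 1, None, None]
    boxes = np.stack([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2], axis=-1)
    keep = score > conf_thresh
    return boxes[keep], score[keep]
```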
      
    • ONNX

| Var | Params | FLOPs | mAP@0.5 | Core i9 CPU latency | ONNX size | static | static w/o post | dynamic | dynamic w/o post |
|---|---|---|---|---|---|---|---|---|---|
| R | 0.13 M | 0.01 G | 0.22382 | 0.34 ms | 863 KB | DL | DL | DL | DL |
| Y | 0.29 M | 0.03 G | 0.29606 | 0.38 ms | 1.2 MB | DL | DL | DL | DL |
| Z | 0.51 M | 0.05 G | 0.36843 | 0.43 ms | 2.1 MB | DL | DL | DL | DL |
| A | 0.78 M | 0.08 G | 0.42872 | 0.48 ms | 3.2 MB | DL | DL | DL | DL |
| F | 1.12 M | 0.12 G | 0.49098 | 0.54 ms | 4.5 MB | DL | DL | DL | DL |
| P | 1.52 M | 0.17 G | 0.52665 | 0.63 ms | 6.1 MB | DL | DL | DL | DL |
| N | 1.98 M | 0.22 G | 0.54942 | 0.70 ms | 8.0 MB | DL | DL | DL | DL |
| T | 2.49 M | 0.28 G | 0.56300 | 0.83 ms | 10.0 MB | DL | DL | DL | DL |
| S | 3.07 M | 0.34 G | 0.57338 | 0.91 ms | 12.3 MB | DL | DL | DL | DL |
| L | 30.92 M | 3.44 G | 0.58642 | 7.16 ms | 123.7 MB | DL | DL | DL | DL |
    • Input image 480x360 -> OpenCV INTER_NEAREST -> 64x64 -> YUV422 (packed: YUY2/YUYV)

(Images: the 480x360 input after OpenCV INTER_NEAREST resize to 64x64, shown at 100% and 800% zoom.)
    • Detection samples for the Y, F, N, and S variants (image 00_000000019456)
    • ESPDL INT8 (.espdl, .info, .json, anchors.npy, wh_scale.npy)

      I don't own an ESP32, so I haven't checked its operation.

| Var | ESPDL size | static w/o post (s3) | static w/o post (p4) |
|---|---|---|---|
| R | 222.8 KB | DL | DL |
| Y | 389.0 KB | DL | DL |
| Z | 617.4 KB | DL | DL |
| A | 911.6 KB | DL | DL |
| F | 1.2 MB | DL | DL |
| P | 1.6 MB | DL | DL |
| N | 2.1 MB | DL | DL |
| T | 2.6 MB | DL | DL |
| S | 3.2 MB | DL | DL |
  • opencv_inter_nearest_y + Optimized for Y (Luminance) only + Suitable for quantization

    • ONNX

| Var | Params | FLOPs | mAP@0.5 | CPU latency | ONNX size | static | static w/o post | dynamic | dynamic w/o post |
|---|---|---|---|---|---|---|---|---|---|
| R | 0.13 M | 0.01 G | | 0.34 ms | 863 KB | | | | |
| Y | 0.29 M | 0.03 G | | 0.38 ms | 1.2 MB | | | | |
| Z | 0.51 M | 0.05 G | | 0.43 ms | 2.1 MB | | | | |
| A | 0.78 M | 0.08 G | | 0.48 ms | 3.2 MB | | | | |
| F | 1.12 M | 0.12 G | | 0.54 ms | 4.5 MB | | | | |
| P | 1.52 M | 0.17 G | | 0.63 ms | 6.1 MB | | | | |
| N | 1.98 M | 0.22 G | | 0.70 ms | 8.0 MB | | | | |
| T | 2.49 M | 0.28 G | | 0.83 ms | 10.0 MB | | | | |
| S | 3.07 M | 0.34 G | | 0.91 ms | 12.3 MB | | | | |
| L | 30.92 M | 3.44 G | 0.58164 | 7.16 ms | 123.7 MB | | | | |
    • ESPDL INT8 (.espdl, .info, .json, anchors.npy, wh_scale.npy)

| Var | ESPDL size | static w/o post (s3) | static w/o post (p4) |
|---|---|---|---|
| R | 222.8 KB | | |
| Y | 389.0 KB | | |
| Z | 617.4 KB | | |
| A | 911.6 KB | | |
| F | 1.2 MB | | |
| P | 1.6 MB | | |
| N | 2.1 MB | | |
| T | 2.6 MB | | |
| S | 3.2 MB | | |

Inference

Caution

If you preprocess your images by resizing them to 64x64 with OpenCV or similar, use nearest-neighbor (INTER_NEAREST) mode, as in the sketch below.
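A minimal sketch of that preprocessing (the file name is illustrative; the RGB channel order follows the opencv_inter_nearest model descriptions above):

```python
import cv2

frame = cv2.imread("input.jpg")  # BGR, any resolution
small = cv2.resize(frame, (64, 64), interpolation=cv2.INTER_NEAREST)  # nearest, not linear
rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
```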

usage: demo_uhd.py
[-h]
(--images IMAGES | --camera CAMERA)
--onnx ONNX
[--output OUTPUT]
[--img-size IMG_SIZE]
[--conf-thresh CONF_THRESH]
[--record RECORD]
[--actual-size]
[--use-nms]
[--nms-iou NMS_IOU]

UltraTinyOD ONNX demo (CPU).

options:
  -h, --help
   show this help message and exit
  --images IMAGES
   Directory with images to run batch inference.
  --camera CAMERA
   USB camera id for realtime inference.
  --onnx ONNX
   Path to ONNX model (CPU).
  --output OUTPUT
   Output directory for image mode.
  --img-size IMG_SIZE
   Input size HxW, e.g., 64x64.
  --conf-thresh CONF_THRESH
   Confidence threshold. Default: 0.90
  --record RECORD
   MP4 path for automatic recording when --camera is used.
  --actual-size
   Display and recording use the model input resolution instead of
   the original frame size.
  --use-nms
   Apply Non-Maximum Suppression on decoded boxes (default IoU=0.8).
  --nms-iou NMS_IOU
   IoU threshold for NMS (effective only when --use-nms is set).
  • ONNX with post-processing
    uv run demo_uhd.py \
    --onnx ultratinyod_res_anc8_w192_64x64_loese_distill.onnx \
    --camera 0 \
    --conf-thresh 0.90 \
    --use-nms \
    --actual-size
  • ONNX without post-processing
    uv run demo_uhd.py \
    --onnx ultratinyod_res_anc8_w192_64x64_loese_distill_nopost.onnx \
    --camera 0 \
    --conf-thresh 0.90 \
    --use-nms \
    --actual-size
  • ONNX with pre-processing (PIL equivalent of Resize) + post-processing
    uv run demo_uhd.py \
    --onnx ultratinyod_res_anc8_w256_64x64_torch_bilinear_dynamic.onnx \
    --camera 0 \
    --conf-thresh 0.90 \
    --use-nms \
    --actual-size
  • ONNX with pre-processing (PIL equivalent of Resize) + without post-processing
    uv run demo_uhd.py \
    --onnx ultratinyod_res_anc8_w256_64x64_torch_bilinear_dynamic_nopost.onnx \
    --camera 0 \
    --conf-thresh 0.90 \
    --use-nms \
    --actual-size
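For reference, a minimal onnxruntime sketch against the with-post-process model (the [1, 100, 6] score_classid_cxcywh output follows the I/O listing above and is assumed to apply to the RGB models as well; the raw 0-255 float input convention is an assumption — demo_uhd.py is authoritative):

```python
import cv2
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "ultratinyod_res_anc8_w192_64x64_loese_distill.onnx",
    providers=["CPUExecutionProvider"],
)
img = cv2.cvtColor(cv2.imread("person.jpg"), cv2.COLOR_BGR2RGB)
x = cv2.resize(img, (64, 64), interpolation=cv2.INTER_NEAREST)
x = x.transpose(2, 0, 1)[None].astype(np.float32)         # (1, 3, 64, 64)
(dets,) = sess.run(None, {sess.get_inputs()[0].name: x})  # (1, 100, 6)
for score, classid, cx, cy, w, h in dets[0]:
    if score >= 0.90:  # same default as --conf-thresh
        print(f"person {score:.2f} cx={cx:.3f} cy={cy:.3f} w={w:.3f} h={h:.3f}")
```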

Training Examples (full CLI)

UltraTinyOD (anchor-only, stride 8; --cnn-width controls stem width):

use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head
use-improved-head + utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--utod-residual \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--use-improved-head \
--utod-head-ese
use-improved-head + use-iou-aware-head + utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_se_iou_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese
use-improved-head + use-iou-aware-head + utod-head-ese + distillation
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w64_se_iou_64x64_quality_lr0.005_relu/best_utod_0299_map_0.40910.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width 64 \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine


SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/3/ultratinyod_res_anc8_w96_se_iou_64x64_quality_lr0.004/best_utod_0293_map_0.46502.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w128_se_iou_64x64_quality_lr0.003_relu/best_utod_0293_map_0.45776.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w160_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.45385.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w192_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.47468.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine
use-improved-head + use-iou-aware-head + utod-head-ese + utod-large-obj-branch
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.005
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.004
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25

SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_loese_${SIZE}_${IMPHEAD}_lr${LR} \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--use-amp \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors ${ANCHOR} \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema \
--ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--utod-large-obj-branch \
--utod-large-obj-depth 2 \
--utod-large-obj-ch-scale 1.25
use-improved-head + use-iou-aware-head + utod-head-ese + utod-large-obj-branch + distillation
SIZE=64x64
ANCHOR=8
CNNWIDTH=64
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w64_loese_64x64_quality_lr0.005/best_utod_0296_map_0.40903.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width 64 \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=96
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w96_loese_64x64_quality_lr0.004/best_utod_0279_map_0.46170.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=128
LR=0.0003
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w128_loese_64x64_quality_lr0.003/best_utod_0259_map_0.45860.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=160
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/6/ultratinyod_res_anc8_w160_loese_64x64_quality_lr0.001/best_utod_0210_map_0.47518.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=192
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w192_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.47468.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

SIZE=64x64
ANCHOR=8
CNNWIDTH=256
LR=0.0001
IMPHEAD=quality
uv run python train.py \
--arch ultratinyod \
--image-dir data/wholebody34/obj_train_data \
--img-size ${SIZE} \
--exp-name ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${IMPHEAD}_lr${LR}_relu_distill \
--ckpt runs/4/ultratinyod_res_anc8_w256_se_iou_64x64_quality_lr0.001_relu/best_utod_0300_map_0.46965.pt \
--batch-size 64 \
--epochs 300 \
--lr ${LR} \
--weight-decay 0.0001 \
--num-workers 12 \
--device cuda \
--classes 0 \
--cnn-width ${CNNWIDTH} \
--auto-anchors \
--num-anchors 8 \
--iou-loss ciou \
--conf-thresh 0.15 \
--use-ema --ema-decay 0.9999 \
--grad-clip-norm 10.0 \
--use-batchnorm \
--utod-residual \
--use-improved-head \
--use-iou-aware-head \
--utod-head-ese \
--activation relu \
--teacher-ckpt runs/6/ultratinyod_res_anc8_w256_loese_64x64_quality_lr0.001/best_utod_0179_map_0.48243.pt \
--teacher-arch ultratinyod \
--distill-kl 1.0 \
--distill-box-l1 1.0 \
--distill-feat 0.5 \
--distill-temperature 2.0 \
--distill-cosine

Validation-only Example


Example of running only validation on a trained checkpoint:

uv run python train.py \
--arch ultratinyod \
--img-size 64x64 \
--cnn-width 256 \
--classes 0 \
--conf-thresh 0.15 \
--ckpt runs/ultratinyod_res_anc8_w256_64x64_lr0.0003/best_utod_0297_map_0.44299.pt \
--val-only \
--use-ema

CLI parameters

| Parameter | Description | Default |
|---|---|---|
| --arch | Model architecture: cnn, transformer, or anchor-only ultratinyod. | cnn |
| --image-dir | Directory containing images and YOLO txt labels. | data/wholebody34/obj_train_data |
| --train-split | Fraction of data used for training. | 0.8 |
| --val-split | Fraction of data used for validation. | 0.2 |
| --img-size | Input size HxW (e.g., 64x64). | 64x64 |
| --resize-mode | Resize mode for training preprocessing: torch_bilinear, torch_nearest, opencv_inter_linear, opencv_inter_nearest, opencv_inter_nearest_y, opencv_inter_nearest_yuv422. | torch_bilinear |
| --torch_bilinear | Shortcut for --resize-mode torch_bilinear. | False |
| --torch_nearest | Shortcut for --resize-mode torch_nearest. | False |
| --opencv_inter_linear | Shortcut for --resize-mode opencv_inter_linear. | False |
| --opencv_inter_nearest | Shortcut for --resize-mode opencv_inter_nearest. | False |
| --opencv_inter_nearest_y | Shortcut for --resize-mode opencv_inter_nearest_y. | False |
| --opencv_inter_nearest_yuv422 | Shortcut for --resize-mode opencv_inter_nearest_yuv422. | False |
| --exp-name | Experiment name; logs saved under runs/<exp-name>. | default |
| --batch-size | Batch size. | 64 |
| --epochs | Number of epochs. | 100 |
| --resume | Checkpoint to resume training (loads optimizer/scheduler). | None |
| --ckpt | Initialize weights from checkpoint (no optimizer state). | None |
| --ckpt-non-strict | Load --ckpt with strict=False (ignore missing/unexpected keys). | False |
| --val-only | Run validation only with --ckpt or --resume weights and exit. | False |
| --val-count | Limit number of validation images when using --val-only. | None |
| --use-improved-head | UltraTinyOD only: enable quality-aware head (IoU-aware obj, IoU score branch, learnable WH scale, extra context). | False |
| --use-iou-aware-head | UltraTinyOD head: task-aligned IoU-aware scoring (quality*cls) with split towers. | False |
| --quality-power | Exponent for quality score when using IoU-aware head scoring. | 1.0 |
| --teacher-ckpt | Teacher checkpoint path for distillation. | None |
| --teacher-arch | Teacher architecture override. | None |
| --teacher-num-queries | Teacher DETR queries. | None |
| --teacher-d-model | Teacher model dimension. | None |
| --teacher-heads | Teacher attention heads. | None |
| --teacher-layers | Teacher encoder/decoder layers. | None |
| --teacher-dim-feedforward | Teacher FFN dimension. | None |
| --teacher-use-skip | Force teacher skip connections on. | False |
| --teacher-activation | Teacher activation (relu/swish). | None |
| --teacher-use-fpn | Force teacher FPN on. | False |
| --teacher-backbone | Teacher backbone checkpoint for feature distillation. | None |
| --teacher-backbone-arch | Teacher backbone architecture hint. | None |
| --teacher-backbone-norm | Teacher backbone input normalization. | imagenet |
| --distill-kl | KL distillation weight (transformer). | 0.0 |
| --distill-box-l1 | Box L1 distillation weight (transformer). | 0.0 |
| --distill-cosine | Cosine ramp-up of distillation weights. | False |
| --distill-temperature | Teacher logits temperature. | 1.0 |
| --distill-feat | Feature-map distillation weight (CNN only). | 0.0 |
| --lr | Learning rate. | 0.001 |
| --weight-decay | Weight decay. | 0.0001 |
| --optimizer | Optimizer (adamw or sgd). | adamw |
| --grad-clip-norm | Global gradient norm clip; set 0 to disable. | 5.0 |
| --num-workers | DataLoader workers. | 8 |
| --device | Device: cuda or cpu. | cuda if available |
| --seed | Random seed. | 42 |
| --log-interval | Steps between logging to progress bar. | 10 |
| --eval-interval | Epoch interval for evaluation. | 1 |
| --conf-thresh | Confidence threshold for decoding. | 0.3 |
| --topk | Top-K for CNN decoding. | 50 |
| --use-amp | Enable automatic mixed precision. | False |
| --aug-config | YAML for augmentations (applied in listed order). | uhd/aug.yaml |
| --use-ema | Enable EMA of model weights for evaluation/checkpointing. | False |
| --ema-decay | EMA decay factor (ignored if EMA disabled). | 0.9998 |
| --coco-eval | Run COCO-style evaluation. | False |
| --coco-per-class | Log per-class COCO AP when COCO eval is enabled. | False |
| --classes | Comma-separated target class IDs. | 0 |
| --activation | Activation function (relu or swish). | swish |
| --cnn-width | Width multiplier for CNN backbone. | 32 |
| --backbone | Optional lightweight CNN backbone (microcspnet, ultratinyresnet, enhanced-shufflenet, or none). | None |
| --backbone-channels | Comma-separated channels for ultratinyresnet (e.g., 16,32,48,64). | None |
| --backbone-blocks | Comma-separated residual block counts per stage for ultratinyresnet (e.g., 1,2,2,1). | None |
| --backbone-se | Apply SE/eSE on backbone output (custom backbones only). | none |
| --backbone-skip | Add long skip fusion across custom backbone stages (ultratinyresnet). | False |
| --backbone-skip-cat | Use concat+1x1 fusion for long skips (ultratinyresnet); implies --backbone-skip. | False |
| --backbone-skip-shuffle-cat | Use stride+shuffle concat fusion for long skips (ultratinyresnet); implies --backbone-skip. | False |
| --backbone-skip-s2d-cat | Use space-to-depth concat fusion for long skips (ultratinyresnet); implies --backbone-skip. | False |
| --backbone-fpn | Enable a tiny FPN fusion inside custom backbones (ultratinyresnet). | False |
| --backbone-out-stride | Override custom backbone output stride (e.g., 8 or 16). | None |
| --use-skip | Enable skip-style fusion in the CNN head (sums pooled shallow features into the final stage). Stored in checkpoints and restored on resume. | False |
| --utod-residual | Enable residual skips inside the UltraTinyOD backbone. | False |
| --utod-head-ese | UltraTinyOD head: apply lightweight eSE on shared features. | False |
| --utod-context-rfb | UltraTinyOD head: add a receptive-field block (dilated + wide depthwise) before prediction layers. | False |
| --utod-context-dilation | Dilation used in UltraTinyOD receptive-field block (only when --utod-context-rfb). | 2 |
| --utod-large-obj-branch | UltraTinyOD head: add a downsampled large-object refinement branch (no FPN). | False |
| --utod-large-obj-depth | Number of depthwise blocks in the large-object branch (only when --utod-large-obj-branch). | 2 |
| --utod-large-obj-ch-scale | Channel scale for the large-object branch (relative to head channels). | 1.0 |
| --use-anchor | Use anchor-based head for CNN (YOLO-style). | False |
| --output-stride | Final CNN feature stride (downsample factor). Supported: 4, 8, 16. | 16 |
| --anchors | Anchor sizes as normalized w,h pairs (space separated). | "" |
| --auto-anchors | Compute anchors from training labels when using anchor head. | False |
| --num-anchors | Number of anchors to use when auto-computing. | 3 |
| --iou-loss | IoU loss type for anchor head (iou, giou, or ciou). | giou |
| --anchor-assigner | Anchor assigner strategy (legacy, simota). | legacy |
| --anchor-cls-loss | Anchor classification loss (bce, vfl). | bce |
| --simota-topk | Top-K IoUs for dynamic-k in SimOTA. | 10 |
| --last-se | Apply SE/eSE only on the last CNN block. | none |
| --use-batchnorm | Enable BatchNorm layers during training/export. | False |
| --last-width-scale | Channel scale for last CNN block (e.g., 1.25). | 1.0 |
| --num-queries | Transformer query count. | 10 |
| --d-model | Transformer model dimension. | 64 |
| --heads | Transformer attention heads. | 4 |
| --layers | Transformer encoder/decoder layers. | 3 |
| --dim-feedforward | Transformer feedforward dimension. | 128 |
| --use-fpn | Enable simple FPN for transformer backbone. | False |

Tiny CNN backbones (--backbone, optional; default keeps the original built-in CNN):

  • microcspnet: CSP-tiny style stem (16/32/64/128) compressed to 64ch, stride 8 output.
  • ultratinyresnet: 16→24→32→48 channel ResNet-like stack with three downsample steps (stride 8). Channel widths and blocks per stage can be overridden via --backbone-channels / --backbone-blocks; optional long skips across stages via --backbone-skip; optional lightweight FPN fusion via --backbone-fpn.
  • enhanced-shufflenet: Enhanced ShuffleNetV2+ inspired (arXiv:2111.00902) with progressive widening and doubled refinements, ending at ~128ch, stride 8. All custom backbones can optionally apply SE/eSE on the backbone output via --backbone-se {none,se,ese}.
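An illustrative train.py invocation wiring several of these backbone flags together (the channel/block values are placeholders taken from the flag documentation above, not a tuned recipe):

```bash
uv run python train.py \
--arch cnn \
--image-dir data/wholebody34/obj_train_data \
--img-size 64x64 \
--backbone ultratinyresnet \
--backbone-channels 16,24,32,48 \
--backbone-blocks 1,2,2,1 \
--backbone-se ese \
--backbone-skip \
--backbone-fpn
```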

Augmentation via YAML

  • Specify a YAML file with --aug-config to run the data_augment: entries in the listed order (e.g., --aug-config uhd/aug.yaml).
  • Supported ops (examples): Mosaic / MixUp / CopyPaste / HorizontalFlip (class_swap_map supported) / VerticalFlip / RandomScale / Translation / RandomCrop / RandomResizedCrop / RandomBrightness / RandomContrast / RandomSaturation / RandomHSV / RandomPhotometricDistort / Blur / MedianBlur / MotionBlur / GaussianBlur / GaussNoise / ImageCompression / ISONoise / RandomRain / RandomFog / RandomSunFlare / CLAHE / ToGray / RemoveOutliers.
  • If prob is provided, it is used as the apply probability; otherwise defaults are used (most are 0, RandomPhotometricDistort defaults to 0.5). Unknown keys are ignored.
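An illustrative sketch of such a config (the data_augment key, op names, and prob come from the list above; the exact per-op schema is an assumption — see the shipped uhd/aug.yaml for the real format):

```yaml
data_augment:
  - HorizontalFlip:
      prob: 0.5
  - RandomPhotometricDistort:
      prob: 0.5
  - Mosaic:
      prob: 0.2
```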

Loss terms (CNN / CenterNet)

  • loss: total loss (hm + off + wh)
  • hm: focal loss on center heatmap
  • off: L1 loss on center offsets (within-cell quantization correction)
  • wh: L1 loss on width/height (feature-map scale)

Loss terms (CNN / Anchor head, --use-anchor)

  • loss: total anchor loss (box + obj + cls [+ quality] when --use-improved-head)
  • obj: BCE on objectness for each anchor location (positive vs. background)
  • cls: BCE on per-class logits for positive anchors (one-hot over target classes)
  • box: (1 - IoU/GIoU/CIoU) on decoded boxes for positive anchors; IoU flavor set by --iou-loss
  • quality (improved head only): BCE on the IoU-linked quality logit; the obj target is also scaled by IoU

Loss terms (Transformer)

  • loss: total loss (cls + l1 + iou)
  • cls: cross-entropy for class vs. background
  • l1: L1 loss on box coordinates
  • iou: 1 - IoU for matched predictions

The impact of image downsampling methods


PyTorch's Resize is implemented with a downsampling method similar to PIL's, which differs significantly from OpenCV's implementation. When downsampling images during training preprocessing, be aware that the numerical characteristics of the images the model trains on will be completely different depending on whether you use PyTorch's Resize or OpenCV's. Below are pixel-level error measurements when downsampling an image to 64x64 pixels. If the diff value is greater than 1.0, the images are effectively completely different.

It is therefore easy to imagine that if the downsampling method used for preprocessing during training differs from the one used during inference, the inference results will be disastrous.

PyTorch's and PIL's downsampling are internally very similar but slightly different. When deploying for inference in Python or other environments, accuracy will be significantly degraded unless the model is deployed according to the following criteria. If you train using OpenCV's cv2.INTER_LINEAR, the model will never produce correct output when preprocessing is done with PyTorch, TensorFlow, or ONNX resize operators instead of OpenCV.

| Training | Deploy |
|---|---|
| Downsampling with PyTorch's Resize (InterpolationMode.BILINEAR) | Merge a Resize (Linear + half-pixel) at the input of the ONNX model. This yields the highest model accuracy, but deployment is limited to hardware, NPUs, and frameworks that support bilinear-interpolation resize. Not suitable for quantization. |
| Downsampling with PyTorch's Resize (InterpolationMode.NEAREST) | Merge a Resize (Nearest) at the input of the ONNX model. The most versatile for HW, NPU, and quantization deployment, but model accuracy is lower. |
| Downsampling with OpenCV's resize (cv2.INTER_NEAREST) | Merge a Resize (Nearest) at the input of the ONNX model, or use OpenCV INTER_NEAREST. Accuracy is low, but this is highly versatile because image downsampling can be written freely on the application side. However, downsampling must be implemented manually. |
  1. Error after PIL conversion when downsampling with PyTorch's Resize InterpolationMode.BILINEAR
    PyTorch(InterpolationMode.BILINEAR) -> Convert to PIL vs PyTorch Tensor(InterpolationMode.BILINEAR)
      max  diff : 1
      mean diff : 0.4949
      std  diff : 0.5000
    
  2. Error when downsampling with PyTorch's Resize InterpolationMode.BILINEAR (after PIL conversion) compared to downsampling with OpenCV's INTER_LINEAR
    PyTorch(InterpolationMode.BILINEAR) -> Convert to PIL vs OpenCV INTER_LINEAR
      max  diff : 104
      mean diff : 10.2930
      std  diff : 13.2792
    
  3. Error when downsampling with PyTorch's Resize InterpolationMode.BILINEAR (tensor path) compared to downsampling with OpenCV's INTER_LINEAR
    PyTorch Tensor(InterpolationMode.BILINEAR) vs OpenCV INTER_LINEAR
      max  diff : 104
      mean diff : 10.3336
      std  diff : 13.2463
    
  4. Accuracy and speed of each interpolation method when downsampling in OpenCV
    • Accuracy: INTER_NEAREST < INTER_LINEAR < INTER_AREA, Speed: INTER_NEAREST > INTER_LINEAR > INTER_AREA
    === Resize benchmark ===
    INTER_NEAREST : 0.0061 ms
    INTER_LINEAR  : 0.0143 ms
    INTER_AREA    : 0.3621 ms
    AREA / LINEAR ratio : 25.40x
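A minimal sketch reproducing this kind of comparison (the file name is illustrative; antialias=True on a tensor requires a recent torchvision and approximates the PIL-like path):

```python
import cv2
import numpy as np
import torch
import torchvision.transforms.functional as F
from torchvision.transforms import InterpolationMode

img = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB)

# PyTorch tensor path (PIL-like bilinear with antialiasing)
t = torch.from_numpy(img).permute(2, 0, 1)  # HWC uint8 -> CHW
pt = F.resize(t, [64, 64], interpolation=InterpolationMode.BILINEAR, antialias=True)
pt = pt.permute(1, 2, 0).numpy()

# OpenCV path
ocv = cv2.resize(img, (64, 64), interpolation=cv2.INTER_LINEAR)

diff = np.abs(pt.astype(np.int32) - ocv.astype(np.int32))
print("max  diff :", diff.max())
print("mean diff :", diff.mean())
print("std  diff :", diff.std())
```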
    

ONNX export

  • Export a checkpoint to ONNX (auto-detects arch from checkpoint unless overridden):
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=64
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w64_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=96
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w96_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=128
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w128_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=160
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w160_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=192
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w192_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
    
    SIZE=64x64
    ANCHOR=8
    CNNWIDTH=256
    RESIZEMODE=opencv_inter_nearest_y
    CKPT=runs/ultratinyod_res_anc8_w256_loese_64x64_lr0.005_impaug/best_utod_0001_map_0.00000.pt
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static.onnx \
    --opset 17
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic.onnx \
    --opset 17 \
    --dynamic-resize
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_static_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess
    uv run python export_onnx.py \
    --checkpoint ${CKPT} \
    --output ultratinyod_res_anc${ANCHOR}_w${CNNWIDTH}_${SIZE}_${RESIZEMODE}_dynamic_nopost.onnx \
    --opset 17 \
    --no-merge-postprocess \
    --dynamic-resize
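
Since the six blocks above differ only in CNNWIDTH and the checkpoint path, and each emits four variants (static/dynamic input, with/without merged postprocessing), the exports can also be driven from a short loop. A minimal Python sketch, assuming the checkpoints exist at the placeholder paths shown above:

import subprocess

SIZE, ANCHOR, RESIZEMODE = "64x64", 8, "opencv_inter_nearest_y"
VARIANTS = {  # output suffix -> extra export_onnx.py flags
    "static": [],
    "dynamic": ["--dynamic-resize"],
    "static_nopost": ["--no-merge-postprocess"],
    "dynamic_nopost": ["--no-merge-postprocess", "--dynamic-resize"],
}

for width in (64, 96, 128, 160, 192, 256):
    ckpt = (
        f"runs/ultratinyod_res_anc{ANCHOR}_w{width}_loese_{SIZE}_lr0.005_impaug/"
        "best_utod_0001_map_0.00000.pt"
    )
    for suffix, extra in VARIANTS.items():
        output = f"ultratinyod_res_anc{ANCHOR}_w{width}_{SIZE}_{RESIZEMODE}_{suffix}.onnx"
        subprocess.run(
            ["uv", "run", "python", "export_onnx.py",
             "--checkpoint", ckpt, "--output", output, "--opset", "17", *extra],
            check=True,  # stop on the first failed export
        )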

LiteRT (TFLite) quantization

Click to expand
uv run onnx2tf \
-i ultratinyod_res_anc8_w64_64x64_quality_relu_nopost.onnx \
-cotof \
-oiqt
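
Here -cotof cross-checks the converted model's outputs element-wise against the source ONNX, and -oiqt additionally emits integer-quantized LiteRT models alongside the float ones.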

ESP-DL quantization

This repository includes a calibration/quantization script for ESP-DL: uhd/quantize_onnx_model_for_esp32.py.

Image-only calibration (default)

Click to expand
uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target "esp32s3" \
--calib-algorithm kl \
--int16-op-pattern "/model/backbone/block1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w24_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w24_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w32_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w32_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/obj_conv/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w40_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w40_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/backbone/block1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_down/large_obj_down.0/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/obj_conv/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w48_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w48_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/backbone/block1/dw/conv/Conv" \
--int16-op-pattern "/model/head/context/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w56_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w56_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w64_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w64_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/quality_tower/quality_tower.1/dw/conv/Conv"

uv run python uhd/quantize_onnx_model_for_esp32.py \
--dataset-type image \
--image-dir data/wholebody34/obj_train_data \
--resize-mode opencv_inter_nearest_yuv422 \
--onnx-model ultratinyod_res_anc8_w72_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx \
--espdl-model ultratinyod_res_anc8_w72_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl \
--target esp32s3 \
--calib-algorithm kl \
--int16-op-pattern "/model/head/context_res/context_res.2/dw/conv/Conv" \
--int16-op-pattern "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv" \
--int16-op-pattern "/model/head/box_tower/box_tower.1/dw/conv/Conv"

Notes:

  • The YUV422 models expect an input of shape [1, 2, 64, 64] produced by the opencv_inter_nearest_yuv422 preprocessing (a layout sketch follows this list).
  • --dataset-type image is the default and ignores labels.
  • Adjust --calib-steps, --batch-size, --target, --num-of-bits, and --device as needed.
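
A hypothetical sketch of that [1, 2, 64, 64] layout, assuming channel 0 carries the Y plane and channel 1 interleaves U/V across columns (YUYV-style). The exact packing is defined by the repository's opencv_inter_nearest_yuv422 mode, so treat this as an illustration only:

import cv2
import numpy as np

# Nearest-neighbor downsample to the model's 64x64 input, then convert to YUV.
bgr = cv2.imread("sample.jpg")
bgr = cv2.resize(bgr, (64, 64), interpolation=cv2.INTER_NEAREST)
yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)  # H x W x (Y, U, V)

# Assumed chroma channel: U at even columns, V at odd columns (YUYV-style).
chroma = np.empty((64, 64), dtype=yuv.dtype)
chroma[:, 0::2] = yuv[:, 0::2, 1]
chroma[:, 1::2] = yuv[:, 1::2, 2]

tensor = np.stack([yuv[..., 0], chroma])[None]  # shape (1, 2, 64, 64)
print(tensor.shape)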

CLI options

Click to expand
  • --image-dir: Directory containing calibration images.
  • --dataset-type: Calibration dataset type (image or yolo, default image).
  • --list-path: Optional text file listing images to use.
  • --export-anchors-wh-scale-dir: Directory to save {onnx-model}_anchors.npy and {onnx-model}_wh_scale.npy (default: same directory as --espdl-model).
  • --expand-group-conv: Expand groups > 1 conv into group=1 (default: disabled).
  • --img-size: Square input size used for calibration (default 64).
  • --resize-mode: Resize mode (default opencv_inter_nearest_yuv422).
  • --class-ids: Comma-separated class IDs to keep (yolo only, default 0).
  • --split: Dataset split for calibration (train, val, all, default all).
  • --val-split: Validation split ratio (ignored when --split all, default 0.0).
  • --batch-size: Calibration batch size (default 1).
  • --calib-steps: Number of calibration steps (default 32).
  • --calib-algorithm: Calibration algorithm (default kl; examples: minmax, mse, percentile).
  • --int16-op-pattern: Regex pattern to force matched ops to int16 (repeatable; see the sketch after this list).
  • --onnx-model: Path to the input ONNX model (default ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.onnx).
  • --espdl-model: Path to the output .espdl file (default ultratinyod_res_anc8_w16_64x64_opencv_inter_nearest_yuv422_distill_static_nopost.espdl).
  • --target: Quantize target type (c, esp32s3, esp32p4, default esp32s3).
  • --num-of-bits: Quantization bits (default 8).
  • --device: Device for calibration (cpu or cuda, default cpu).
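
Because each --int16-op-pattern is a regex, a single pattern can cover several indexed blocks at once. A small sketch of the assumed matching semantics (re.search against ONNX node names; see uhd/quantize_onnx_model_for_esp32.py for the actual behavior):

import re

# One pattern covering every indexed large_obj_blocks depthwise conv.
patterns = [r"/model/head/large_obj_blocks/large_obj_blocks\.\d+/dw/conv/Conv"]
node_names = [
    "/model/head/large_obj_blocks/large_obj_blocks.0/dw/conv/Conv",
    "/model/head/large_obj_blocks/large_obj_blocks.1/dw/conv/Conv",
    "/model/head/box_tower/box_tower.1/dw/conv/Conv",
]
int16_ops = [n for n in node_names if any(re.search(p, n) for p in patterns)]
print(int16_ops)  # only the two large_obj_blocks convs match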

Arch

Click to expand
Architecture diagrams (rendered as images in the original README): ONNX (ultratinyod_res_anc8_w64_64x64_loese_distill) and LiteRT/TFLite (ultratinyod_res_anc8_w64_64x64_loese_distill_float32).

Ultra-lightweight classification model series

  1. VSDLM: Visual-only speech detection driven by lip movements - MIT License
  2. OCEC: Open/closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
  3. PGC: Ultrafast pointing gesture classification - MIT License
  4. SC: Ultrafast sitting classification - MIT License
  5. PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
  6. HSC: Happy smile classifier - MIT License
  7. WHC: Waving Hand Classification - MIT License
  8. UHD: Ultra-lightweight human detection - MIT License

Citation

If you find this project useful, please consider citing:

@software{hyodo2025uhd,
  author    = {Katsuya Hyodo},
  title     = {PINTO0309/UHD},
  month     = {12},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17790207},
  url       = {https://github.com/PINTO0309/uhd},
  abstract  = {Ultra-lightweight human detection. The number of parameters does not correlate to inference speed.},
}
