Waving Hand Classification. Ultrafast 1x3x4x32x32 3DConv gesture estimation.

WHC



Demo videos: output_1.mp4, output_2.mp4
| Variant | Size | Seq | F1 | CPU inference latency | ONNX (static seq) | ONNX (dynamic seq) |
|---|---|---|---|---|---|---|
| S | 1.1 MB | 4 | 0.9821 | 0.31 ms | Download | Download |
| M | 1.1 MB | 6 | 0.9916 | 0.46 ms | Download | Download |
| L | 1.1 MB | 8 | 0.9940 | 0.37 ms | Download | Download |
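
The 1x3x4x32x32 in the project description presumably reads as batch x channels x sequence x height x width, which is the usual input layout for 3D convolutions. The sketch below shows how a clip of frames could be packed into that layout without any dependencies. The axis order, the RGB channel order, the scaling to [0, 1], and the name `pack_clip` are all assumptions for illustration; the repository's own preprocessing may differ.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]        # assumed (R, G, B) order, values 0..255
Frame = Sequence[Sequence[Pixel]]   # H rows, each holding W pixels


def pack_clip(frames: Sequence[Frame]) -> Tuple[Tuple[int, ...], List[float]]:
    """Pack T frames into a flat, channel-first buffer (hypothetical helper).

    Returns the assumed shape (1, 3, T, H, W) and the flattened values,
    scaled to [0, 1] (the scaling is an assumption, not taken from the repo).
    """
    t, h, w = len(frames), len(frames[0]), len(frames[0][0])
    flat = [0.0] * (3 * t * h * w)
    for ti, frame in enumerate(frames):
        for y, row in enumerate(frame):
            for x, pixel in enumerate(row):
                for c, value in enumerate(pixel):
                    # Row-major offset for index (0, c, ti, y, x).
                    flat[((c * t + ti) * h + y) * w + x] = value / 255.0
    return (1, 3, t, h, w), flat


# Four pure-red 32x32 frames, matching the S variant's sequence length.
red_clip = [[[(255, 0, 0)] * 32 for _ in range(32)] for _ in range(4)]
shape, values = pack_clip(red_clip)
```

With this layout, one S-variant input holds 3 x 4 x 32 x 32 = 12288 values.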

Data sample

[Images: four sample frames, numbered 1 to 4]

Setup

git clone https://github.com/PINTO0309/WHC.git && cd WHC
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate

Inference

With the CUDA execution provider:

uv run python demo_whc.py \
-wm whc_seq_3dcnn_4x32x32.onnx \
-v 0 \
-ep cuda \
-dlr -dnm -dgm -dhm -dhd

With the TensorRT execution provider:

uv run python demo_whc.py \
-wm whc_seq_3dcnn_4x32x32.onnx \
-v 0 \
-ep tensorrt \
-dlr -dnm -dgm -dhm -dhd
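
Because the models classify a fixed-length sequence (4, 6, or 8 frames depending on the variant), a live demo has to hold on to the most recent crops before it can run the classifier. The following is a minimal sketch of such a sliding window. `ClipBuffer` is a hypothetical name and is not taken from demo_whc.py, which may buffer frames differently.

```python
from collections import deque
from typing import Deque, Generic, List, Optional, TypeVar

T = TypeVar("T")


class ClipBuffer(Generic[T]):
    """Keep the most recent `seq_len` items and expose them as a clip."""

    def __init__(self, seq_len: int = 4) -> None:
        if seq_len < 1:
            raise ValueError("seq_len must be positive")
        self.seq_len = seq_len
        # deque(maxlen=...) silently drops the oldest item when full.
        self._frames: Deque[T] = deque(maxlen=seq_len)

    def push(self, frame: T) -> Optional[List[T]]:
        """Add a frame; return a full clip (oldest first) once one is available."""
        self._frames.append(frame)
        if len(self._frames) < self.seq_len:
            return None
        return list(self._frames)


# Strings stand in for cropped hand images here.
buffer: ClipBuffer[str] = ClipBuffer(seq_len=4)
clips = [buffer.push(f"crop_{i}") for i in range(6)]
```

The first three pushes return `None`; from the fourth push onward, every new frame yields a clip that overlaps the previous one by three frames.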

Dataset Preparation

uv run python 01_data_prep_realdata.py
[Figure: class distribution]

Training Pipeline

3DCNN:

SEQ=4
SIZE=32x32
uv run python -m whc train \
--data_root data/dataset.parquet \
--output_dir runs/whc_seq_3dcnn_${SEQ}x${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--seed 42 \
--device auto \
--use_amp \
--use_sequence 3dcnn \
--sequence_len ${SEQ}
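
The `--train_resampling balanced` option is not described further here. One common reading of "balanced" is inverse-frequency sampling, in which every class is drawn with roughly equal probability regardless of how many samples it has. The sketch below illustrates that idea only; the function names are hypothetical, and the repository's actual resampling strategy may differ.

```python
import random
from collections import Counter
from typing import List, Sequence


def balanced_weights(labels: Sequence[int]) -> List[float]:
    """Weight each sample by 1 / (size of its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_balanced(labels: Sequence[int], num_samples: int, seed: int = 42) -> List[int]:
    """Draw sample indices with replacement so that classes are roughly balanced."""
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=balanced_weights(labels), k=num_samples)


# Toy dataset: 90 samples of class 0 and 10 samples of class 1.
labels = [0] * 90 + [1] * 10
indices = sample_balanced(labels, num_samples=10_000)
minority_share = sum(labels[i] for i in indices) / len(indices)
```

In the toy dataset the minority class makes up 10% of the data but close to half of the drawn samples.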

LSTM:

SEQ=4
SIZE=32x32
uv run python -m whc train \
--data_root data/dataset.parquet \
--output_dir runs/whc_seq_lstm_${SEQ}x${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--seed 42 \
--device auto \
--use_amp \
--use_sequence lstm \
--sequence_len ${SEQ}
  • Outputs include the 10 most recent whc_epoch_*.pt checkpoints, the 10 most recent whc_best_epochXXXX_f1_YYYY.pt checkpoints (selected by validation F1, or by training F1 when there is no validation split), history.json, summary.json, an optional test_predictions.csv, and train.log.
  • After every epoch, a confusion matrix and an ROC curve are saved under runs/whc/diagnostics/<split>/ as confusion_<split>_epochXXXX.png and roc_<split>_epochXXXX.png.
  • --image_size accepts either a single integer for square crops (e.g. --image_size 32) or HEIGHTxWIDTH to resize non-square frames (e.g. --image_size 64x48).
  • Add --resume <checkpoint> to continue from an earlier epoch. Note that --epochs is the desired total epoch count, not the number of additional epochs (e.g. resuming with --epochs 40 from a checkpoint trained to epoch 30 runs 10 more epochs).
  • Launch TensorBoard with:
    tensorboard --logdir runs/whc
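
Two of the notes above come down to small calculations: how an `--image_size` value maps to a height and width, and how many epochs a resumed run still has to execute. The sketch below mirrors the documented behavior; `parse_image_size` and `remaining_epochs` are hypothetical names, not functions from the whc package.

```python
from typing import Tuple


def parse_image_size(value: str) -> Tuple[int, int]:
    """Return (height, width) from either '32' or 'HEIGHTxWIDTH' such as '64x48'."""
    parts = value.lower().split("x")
    if len(parts) == 1:
        height = width = int(parts[0])
    elif len(parts) == 2:
        height, width = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"expected N or HEIGHTxWIDTH, got {value!r}")
    if height <= 0 or width <= 0:
        raise ValueError(f"image size must be positive, got {value!r}")
    return height, width


def remaining_epochs(total_epochs: int, completed_epochs: int) -> int:
    """Epochs still to run when --epochs is the total target, not an increment."""
    return max(total_epochs - completed_epochs, 0)
```

For example, `parse_image_size("64x48")` gives a height of 64 and a width of 48, and `remaining_epochs(40, 30)` gives 10, matching the resume example in the list.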

ONNX Export

uv run python -m whc exportonnx \
--checkpoint runs/whc_seq_3dcnn_${SEQ}x${SIZE}/whc_best_epoch0049_f1_0.9939.pt \
--output whc_seq_3dcnn_4x32x32.onnx \
--opset 17
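
The export command takes a specific whc_best_epochXXXX_f1_YYYY.pt file. When a run directory holds several of these, the one with the highest F1 can be picked from the filenames alone. This is a sketch that assumes the filename pattern shown in the command above; `pick_best_checkpoint` is a hypothetical helper, and breaking ties in favor of the later epoch is an arbitrary choice.

```python
import re
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Matches names such as whc_best_epoch0049_f1_0.9939.pt
_BEST_RE = re.compile(r"whc_best_epoch(\d+)_f1_(\d+(?:\.\d+)?)\.pt")


def pick_best_checkpoint(paths: Iterable[Path]) -> Optional[Path]:
    """Return the path with the highest F1 in its name, or None if none match."""
    best: Optional[Path] = None
    best_key: Optional[Tuple[float, int]] = None
    for path in paths:
        match = _BEST_RE.fullmatch(path.name)
        if match is None:
            continue  # skip whc_epoch_*.pt, history.json, and other files
        key = (float(match.group(2)), int(match.group(1)))  # (F1, epoch)
        if best_key is None or key > best_key:
            best, best_key = path, key
    return best
```

A typical call would be `pick_best_checkpoint(Path("runs/whc_seq_3dcnn_4x32x32").glob("*.pt"))`, with the result passed to `--checkpoint`.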

Arch

[Figure: whc_seq_3dcnn_4x32x32 architecture diagram]

Ultra-lightweight classification model series

  1. VSDLM: Visual-only speech detection driven by lip movements - MIT License
  2. OCEC: Open closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
  3. PGC: Ultrafast pointing gesture classification - MIT License
  4. SC: Ultrafast sitting classification - MIT License
  5. PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
  6. HSC: Happy smile classifier - MIT License
  7. WHC: Waving Hand Classification - MIT License
  8. UHD: Ultra-lightweight human detection - MIT License

Citation

If you find this project useful, please consider citing:

@software{hyodo2025whc,
  author    = {Katsuya Hyodo},
  title     = {PINTO0309/WHC},
  month     = {11},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17690769},
  url       = {https://github.com/PINTO0309/whc},
  abstract  = {Waving Hand Classification.},
}

Acknowledgments

  • https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34: Apache 2.0 License
    @software{DEIMv2-Wholebody34,
      author={Katsuya Hyodo},
      title={Lightweight human detection models generated on high-quality human data sets. It can detect objects with high accuracy and speed in a total of 28 classes: body, adult, child, male, female, body_with_wheelchair, body_with_crutches, head, front, right-front, right-side, right-back, back, left-back, left-side, left-front, face, eye, nose, mouth, ear, collarbone, shoulder, solar_plexus, elbow, wrist, hand, hand_left, hand_right, abdomen, hip_joint, knee, ankle, foot.},
      url={https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34},
      year={2025},
      month={10},
      doi={10.5281/zenodo.17625710}
    }
