CNN-based hand gesture recognition from webcam input. The project combines MediaPipe hand detection with a PyTorch classifier to recognize five classes: thumbs_up, thumbs_down, peace, open_palm, and no_hand.
This was completed as a university Visual Computing group project by Tim Dedie and Nick Somitsch. The original raw webcam dataset is not included in this public repository because it contains personal image data; use collect_data.py to recreate it locally.
See AUTHORS.md for contributor attribution.
- Real-time webcam inference with OpenCV and MediaPipe
- Custom CNN implemented in PyTorch
- Local dataset collection workflow with hand-region cropping
- Hand-written augmentation functions using OpenCV
- Experiments comparing augmentation strategies and dataset sizes
- Best reported test accuracy: 100% for the flip + rotation augmentation setup

Project files:

- `collect_data.py` - capture labeled training images from a webcam
- `train.py` - run augmentation and dataset-size experiments
- `evaluate.py` - evaluate a trained model and generate a confusion matrix
- `live_demo.py` - run real-time gesture recognition from a webcam
- `model.py` - CNN architecture
- `augmentations.py` - custom OpenCV augmentations
- `results.json` - recorded experiment results
- `models/model_flip_rotate.pth` - trained checkpoint used by the live demo
- `docs/` - project report, presentation, and generated figures

Install dependencies with uv:

    uv sync

The project targets Python 3.11, configured through `.python-version`.
Run the data collection script:

    uv run collect_data.py

Controls:

- `1` - save thumbs up
- `2` - save thumbs down
- `3` - save peace sign
- `4` - save open palm
- `5` - save no hand/background
- `q` - quit

Captured images are written to dataset/<class_name>/. Aim for at least 100 to 200 images per class.
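
Before training, it can help to check that every class folder has enough images. The sketch below counts files under `dataset/<class_name>/`. It assumes the folder names match the five class names and that images are stored as JPEG or PNG files; the helper names are hypothetical and not part of the repository.

```python
from pathlib import Path

# Assumed to match the folder names under dataset/.
CLASS_NAMES = ["thumbs_up", "thumbs_down", "peace", "open_palm", "no_hand"]
# Assumed image formats; adjust to whatever collect_data.py writes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(dataset_dir: str) -> dict[str, int]:
    """Return the number of image files found in each class folder."""
    root = Path(dataset_dir)
    counts = {}
    for name in CLASS_NAMES:
        class_dir = root / name
        if class_dir.is_dir():
            counts[name] = sum(
                1 for path in class_dir.iterdir()
                if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            counts[name] = 0  # a missing folder counts as empty
    return counts


def underfilled_classes(counts: dict[str, int], minimum: int = 100) -> list[str]:
    """List the classes that have fewer than `minimum` images."""
    return [name for name, n in counts.items() if n < minimum]
```

Calling `underfilled_classes(count_images("dataset"))` returns the classes that still need more captures under the 100-image guideline.
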
Run all experiments:

    uv run train.py

This trains models for multiple augmentation configurations and dataset sizes. Results are written to `results.json`; model checkpoints are written to `models/`.
Evaluate the default checkpoint:

    uv run evaluate.py

Or evaluate a specific checkpoint:

    uv run evaluate.py models/model_flip_rotate.pth

Evaluation requires a local `dataset/` directory with the expected class subfolders.
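
For readers unfamiliar with confusion matrices, the sketch below shows how one is built from true and predicted class indices, and how accuracy follows from its diagonal. It is a plain-Python illustration with hypothetical function names; evaluate.py may compute and plot the matrix differently.

```python
CLASS_NAMES = ["thumbs_up", "thumbs_down", "peace", "open_palm", "no_hand"]


def confusion_matrix(y_true: list[int], y_pred: list[int],
                     num_classes: int) -> list[list[int]]:
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_idx, pred_idx in zip(y_true, y_pred):
        matrix[true_idx][pred_idx] += 1
    return matrix


def accuracy(matrix: list[list[int]]) -> float:
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up labels for illustration only, not project results.
    y_true = [0, 0, 1, 2, 3, 4]
    y_pred = [0, 1, 1, 2, 3, 4]
    matrix = confusion_matrix(y_true, y_pred, len(CLASS_NAMES))
    for name, row in zip(CLASS_NAMES, matrix):
        print(f"{name:>12}: {row}")
    print(f"accuracy: {accuracy(matrix):.2f}")
```

Off-diagonal entries show which gestures are confused with each other, which a single accuracy figure hides.
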
Run real-time inference with the included checkpoint:

    uv run live_demo.py

The demo opens the webcam, detects the hand region with MediaPipe, classifies the crop, and displays the predicted gesture with a confidence score.

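
Two steps of that pipeline can be illustrated without a webcam: turning hand landmarks into a crop box, and turning classifier outputs into a label with a confidence score. MediaPipe reports landmark coordinates normalized to the range 0 to 1, so they must be scaled to pixels first. The 25% padding, the use of softmax for the confidence, and all function names below are assumptions, not a description of live_demo.py.

```python
import math

CLASS_NAMES = ["thumbs_up", "thumbs_down", "peace", "open_palm", "no_hand"]


def landmarks_to_box(landmarks: list[tuple[float, float]], frame_width: int,
                     frame_height: int, margin: float = 0.25) -> tuple[int, int, int, int]:
    """Convert normalized (x, y) landmarks to a padded pixel box (left, top, right, bottom)."""
    xs = [x * frame_width for x, _ in landmarks]
    ys = [y * frame_height for _, y in landmarks]
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    # Clamp to the frame so the crop never indexes outside the image.
    left = max(0, int(min(xs) - pad_x))
    top = max(0, int(min(ys) - pad_y))
    right = min(frame_width, int(max(xs) + pad_x))
    bottom = min(frame_height, int(max(ys) + pad_y))
    return left, top, right, bottom


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over raw classifier outputs."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_prediction(logits: list[float], class_names: list[str]) -> tuple[str, float]:
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]
```

With a NumPy frame, the crop would then be `frame[top:bottom, left:right]` before resizing to the classifier's input size.
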
| Augmentation setup | Test accuracy |
|---|---|
| None | 98.33% |
| Flip only | 96.33% |
| Rotate only | 97.67% |
| Brightness only | 98.00% |
| Blur only | 98.67% |
| Flip + rotate | 100.00% |
| All augmentations | 99.67% |
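
To compare the setups at a glance, the accuracies from the table can be ranked and expressed relative to the no-augmentation baseline. The values below are copied from the table rather than read from results.json, whose schema is not described here; the function names are hypothetical.

```python
# Test accuracy in percent, copied from the table above.
RESULTS = {
    "None": 98.33,
    "Flip only": 96.33,
    "Rotate only": 97.67,
    "Brightness only": 98.00,
    "Blur only": 98.67,
    "Flip + rotate": 100.00,
    "All augmentations": 99.67,
}


def rank_setups(results: dict[str, float]) -> list[tuple[str, float]]:
    """Sort setups from highest to lowest test accuracy."""
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


def delta_vs_baseline(results: dict[str, float], baseline: str = "None") -> dict[str, float]:
    """Difference in percentage points between each setup and the baseline."""
    reference = results[baseline]
    return {name: round(acc - reference, 2) for name, acc in results.items()}


if __name__ == "__main__":
    deltas = delta_vs_baseline(RESULTS)
    for name, acc in rank_setups(RESULTS):
        print(f"{name:<18} {acc:6.2f}%  ({deltas[name]:+.2f} pp vs. no augmentation)")
```
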
More details are available in docs/visual_computing_report.pdf and TECHNICAL.md.
This project is licensed under the MIT License. See LICENSE for details.