NickSomitsch/hand-gesture-recognition

Hand Gesture Recognition

CNN-based hand gesture recognition from webcam input. The project combines MediaPipe hand detection with a PyTorch classifier to recognize five classes: thumbs_up, thumbs_down, peace, open_palm, and no_hand.

This was completed as a university Visual Computing group project by Tim Dedie and Nick Somitsch. The original raw webcam dataset is not included in this public repository because it contains personal image data; use collect_data.py to recreate it locally.

See AUTHORS.md for contributor attribution.

Highlights

  • Real-time webcam inference with OpenCV and MediaPipe
  • Custom CNN implemented in PyTorch
  • Local dataset collection workflow with hand-region cropping
  • Hand-written augmentation functions using OpenCV
  • Experiments comparing augmentation strategies and dataset sizes
  • Best reported test accuracy: 100% for the flip + rotation augmentation setup

Repository Contents

  • collect_data.py - capture labeled training images from a webcam
  • train.py - run augmentation and dataset-size experiments
  • evaluate.py - evaluate a trained model and generate a confusion matrix
  • live_demo.py - run real-time gesture recognition from a webcam
  • model.py - CNN architecture
  • augmentations.py - custom OpenCV augmentations
  • results.json - recorded experiment results
  • models/model_flip_rotate.pth - trained checkpoint used by the live demo
  • docs/ - project report, presentation, and generated figures

Setup

Install dependencies with uv:

uv sync

The project targets Python 3.11, configured through .python-version.

Collect A Dataset

Run:

uv run collect_data.py

Controls:

  • 1 - save thumbs up
  • 2 - save thumbs down
  • 3 - save peace sign
  • 4 - save open palm
  • 5 - save no hand/background
  • q - quit

Captured images are written to dataset/<class_name>/. Aim for at least 100 images per class, ideally 200 or more.
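The key bindings and output layout above can be sketched as a small lookup. This is an illustration only, not the code in collect_data.py: the `target_path` helper and the zero-padded file naming scheme are assumptions; only the key-to-class mapping and the dataset/<class_name>/ layout come from this README.

```python
from __future__ import annotations

from pathlib import Path

# Key bindings from the controls list; folder names match the five classes.
KEY_TO_CLASS = {
    "1": "thumbs_up",
    "2": "thumbs_down",
    "3": "peace",
    "4": "open_palm",
    "5": "no_hand",
}


def target_path(key: str, index: int, root: str = "dataset") -> Path | None:
    """Return where a capture would be saved, or None for an unbound key."""
    class_name = KEY_TO_CLASS.get(key)
    if class_name is None:
        return None
    # Hypothetical naming scheme: class name plus a zero-padded running index.
    return Path(root) / class_name / f"{class_name}_{index:04d}.jpg"
```

For example, `target_path("3", 7)` points into dataset/peace/, while `target_path("q", 0)` returns None because q quits instead of saving.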

Train

Run all experiments:

uv run train.py

This trains models for multiple augmentation configurations and dataset sizes. Results are written to results.json; model checkpoints are written to models/.
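One way to picture the experiment sweep is as a list of run configurations. The sketch below is an assumption about how such a sweep could be organized, not a description of train.py: the setup names mirror the rows of the results table, while `DATASET_FRACTIONS`, `experiment_grid`, and the choice to cross every setup with every size are hypothetical (the README does not say whether the two factors are crossed or varied one at a time).

```python
from itertools import product

# Setup names follow the results table; each maps to the augmentations it enables.
AUGMENTATION_SETUPS = {
    "none": [],
    "flip": ["flip"],
    "rotate": ["rotate"],
    "brightness": ["brightness"],
    "blur": ["blur"],
    "flip_rotate": ["flip", "rotate"],
    "all": ["flip", "rotate", "brightness", "blur"],
}

# Hypothetical dataset-size settings, expressed as fractions of the training set.
DATASET_FRACTIONS = [0.25, 0.5, 1.0]


def experiment_grid():
    """Yield one run configuration per (augmentation setup, dataset fraction) pair."""
    for (name, augs), fraction in product(AUGMENTATION_SETUPS.items(), DATASET_FRACTIONS):
        yield {"setup": name, "augmentations": list(augs), "fraction": fraction}
```

Iterating over `experiment_grid()` gives each run a self-describing dictionary, which keeps the training loop itself free of per-experiment branching.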

Evaluate

Evaluate the default checkpoint:

uv run evaluate.py

Or evaluate a specific checkpoint:

uv run evaluate.py models/model_flip_rotate.pth

Evaluation requires a local dataset/ directory with the expected class subfolders.
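For readers unfamiliar with confusion matrices, here is a minimal pure-Python sketch of how one is tallied from true and predicted labels. It is not taken from evaluate.py; the function names and the class ordering are assumptions made for illustration.

```python
# Class names from the README; the ordering here is an assumption.
CLASSES = ["thumbs_up", "thumbs_down", "peace", "open_palm", "no_hand"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal cells show which gestures get mistaken for which, something a single accuracy number hides.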

Live Demo

Run real-time inference with the included checkpoint:

uv run live_demo.py

The demo opens the webcam, detects the hand region with MediaPipe, classifies the crop, and displays the predicted gesture with a confidence score.
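The crop-and-classify steps can be sketched without a camera. The example below assumes the hand region is derived from MediaPipe's normalized landmark coordinates (x and y in the range 0 to 1) and that the confidence score is the softmax probability of the top class; neither detail is confirmed by this README, and `landmarks_to_bbox`, `predict`, the padding value, and the class order are hypothetical.

```python
import math

# Class names from the README; the output order of the model is an assumption.
CLASSES = ["thumbs_up", "thumbs_down", "peace", "open_palm", "no_hand"]


def landmarks_to_bbox(landmarks, width, height, pad=0.2):
    """Turn normalized (x, y) landmarks into a padded pixel box (left, top, right, bottom)."""
    xs = [x * width for x, _ in landmarks]
    ys = [y * height for _, y in landmarks]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    # Expand the tight box by a fraction of its size (pad is an assumed value).
    dx, dy = (x1 - x0) * pad, (y1 - y0) * pad
    left, top = max(0, int(x0 - dx)), max(0, int(y0 - dy))
    right, bottom = min(width, int(x1 + dx)), min(height, int(y1 + dy))
    return left, top, right, bottom


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, classes=CLASSES):
    """Return the top class name and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]
```

The padding keeps fingertips inside the crop, and clamping to the frame size avoids out-of-bounds slices when the hand is near an edge.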

Results Summary

Augmentation setup    Test accuracy
None                  98.33%
Flip only             96.33%
Rotate only           97.67%
Brightness only       98.00%
Blur only             98.67%
Flip + rotate         100.00%
All augmentations     99.67%
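To compare setups against the unaugmented baseline, the table can be processed in a few lines. The accuracies below are copied from the table; they are hard-coded because the schema of results.json is not documented here, and the helper names are hypothetical.

```python
# Test accuracies in percent, copied from the results table.
RESULTS = {
    "None": 98.33,
    "Flip only": 96.33,
    "Rotate only": 97.67,
    "Brightness only": 98.00,
    "Blur only": 98.67,
    "Flip + rotate": 100.00,
    "All augmentations": 99.67,
}


def best_setup(results):
    """Name of the setup with the highest test accuracy."""
    return max(results, key=results.get)


def gains_over_baseline(results, baseline="None"):
    """Percentage-point difference of each setup relative to the baseline."""
    base = results[baseline]
    return {name: round(acc - base, 2) for name, acc in results.items()}
```

Read this way, flipping or rotating alone scores below the baseline, while combining the two gives the largest gain.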

More details are available in docs/visual_computing_report.pdf and TECHNICAL.md.

License

This project is licensed under the MIT License. See LICENSE for details.

About

Built a real-time webcam gesture classifier for 5 hand classes using a CNN and MediaPipe hand detection. Collected a custom dataset, compared augmentation and dataset-size experiments, and achieved ~98% accuracy on a held-out test split.
