RPSVision - Real-Time Rock Paper Scissors Recognition

An intelligent computer vision system that recognizes Rock-Paper-Scissors hand gestures in real-time using deep learning

Features • Installation • Usage • How It Works • Training • Results

Overview

RPSVision combines state-of-the-art computer vision and deep learning to create an accurate, real-time gesture recognition system. Whether you're building an interactive game, a gesture-based interface, or exploring computer vision, RPSVision provides a solid foundation.

Core Technologies

MediaPipe Hands - Real-time hand landmark detection (21 keypoints)
Custom CNN - Lightweight PyTorch model optimized for gesture classification
ONNX Runtime - Efficient model deployment with cross-platform support
OpenCV - Real-time video processing and visualization

Features

Core Capabilities

✅ Real-Time Detection - Process webcam feed at 30+ FPS
✅ High Accuracy - 95.43% test accuracy on gesture classification
✅ Lightweight Model - Fast inference suitable for edge devices
✅ Easy Deployment - ONNX model for production environments
✅ Extensible - Simple architecture for adding new gestures
✅ GPU Accelerated - Automatic GPU utilization when available

Development Features

Comprehensive training pipeline with validation
Real-time training metrics and visualization
Clean, modular codebase
Detailed logging and error handling
Data augmentation support

Project Structure

RPSVision/
│
├── .gitignore                 # Ignore file
├── main.py                    # Application entry point
├── requirements.txt           # Python dependencies
├── README.md                  # Project documentation
│
├── checkpoint/                # Saved models
│   ├── best_model.pth           # PyTorch checkpoint
│   └── best_model.onnx          # ONNX deployment model
│
├── data/                      # Training dataset
│   ├── rock/                    # Rock gesture images
│   ├── paper/                   # Paper gesture images
│   └── scissors/                # Scissors gesture images
│
├── src/                       # Source code
│   ├── constant/
│   │   └── constant.py          # Configuration & constants
│   │
│   ├── model/
│   │   └── classifier.py        # CNN architecture
│   │
│   ├── gestures/
│   │   └── recognizer.py        # Gesture recognition engine
│   │
│   ├── data/
│   │   ├── custom_data.py       # Custom dataset class
│   │   ├── dataloader.py        # Data loading utilities
│   │   └── metadata.py          # Dataset metadata
│   │
│   ├── training/
│   │   └── train.py             # Training pipeline
│   │
│   ├── pipeline/
│   │   ├── main.py              # Pipeline orchestration
│   │   └── pipeline.py          # Inference pipeline
│   │
│   └── utils/
│       └── helper.py            # Utility functions

Installation

Prerequisites

Python 3.8 or higher
Webcam (built-in or external)
pip package manager

Step 1: Clone the Repository

git clone https://github.com/A-Ahmed-I/RPSVision.git
cd RPSVision

Step 2: Create Virtual Environment (Recommended)

# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Verify Installation

python -c "import torch; import cv2; import mediapipe; print('✓ All dependencies installed successfully')"

Usage

Quick Start

Launch the application with a single command:

python main.py

Controls

Key	Action
`Q`	Quit the application
`ESC`	Exit fullscreen (if applicable)

Tips for Best Results

Lighting - Ensure good, even lighting on your hand
Distance - Keep your hand 30-50cm from the camera
Positioning - Center your hand in the frame
Background - Use a plain background for better detection
Visibility - Keep all fingers visible to the camera

Expected Output

The application displays:

Live webcam feed
Hand detection bounding box
Predicted gesture (Rock/Paper/Scissors)
Confidence score (0-100%)

How It Works

Pipeline Architecture

┌─────────────┐        ┌───────────────┐       ┌───────────────┐         ┌───────────────┐
│   Webcam    │ ───▶  │   MediaPipe    │ ───▶ │    Crop &     │   ───▶ │      CNN      │
│   Capture   │        │  Hand Detect  │       │  Preprocess   │         │  Classifier   │
└─────────────┘        └───────┬───────┘       └───────────────┘         └───────┬───────┘
                               │                                                 │
                               ▼                                                 ▼
                         ┌───────────────┐                                ┌───────────────┐
                         │  21 Landmark  │                                │  Prediction   │
                         │    Points     │                                │  + Confidence │
                         └───────────────┘                                └───────────────┘

Detailed Process

1️⃣ Hand Detection

MediaPipe Hands detects hands in the video frame and extracts 21 landmark points:

# Key landmarks detected:
- Wrist (0)
- Thumb (1-4)
- Index finger (5-8)
- Middle finger (9-12)
- Ring finger (13-16)
- Pinky (17-20)

2️⃣ Region Extraction

The system:

Calculates a bounding box around detected landmarks
Adds padding for context
Crops the region containing the hand
Handles edge cases (frame boundaries)

3️⃣ Preprocessing

Images are standardized for the CNN:

Transform Pipeline:
- Resize: 224×224 pixels
- Normalize: Divide by 255.0
- Convert: HWC → CHW format
- Batch: Add batch dimension

4️⃣ Classification

The custom CNN processes the image:

Input: [1, 3, 224, 224]
   ↓
Conv Block 1: 16 filters
   ↓
Conv Block 2: 32 filters
   ↓
Conv Block 3: 64 filters
   ↓
Flatten
   ↓
Fully Connected: 3 classes
   ↓
Softmax: [Rock, Paper, Scissors]

5️⃣ Display

Visual feedback includes:

Green bounding box around hand
Gesture label with confidence
FPS counter (optional)
Hand landmarks (optional)

Training

Dataset Preparation

Organize your data as follows:

data/
├── rock/
│   ├── img_001.jpg
│   ├── img_002.jpg
│   └── ...
├── paper/
│   ├── img_001.jpg
│   └── ...
└── scissors/
    ├── img_001.jpg
    └── ...

Training Your Own Model

from src.training.train import ModelTrainer
from src.model.classifier import RPSClassifier
from src.data.dataloader import create_dataloaders
import torch.optim as optim
import torch.nn as nn

# Initialize model
model = RPSClassifier(
    in_channels=3,
    base_filters=32,
    num_classes=3
)

# Create data loaders
train_loader, val_loader, test_loader = create_dataloaders(
    data_dir='data/',
    batch_size=32,
    val_split=0.15,
    test_split=0.15
)

# Setup training
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Create trainer
trainer = ModelTrainer(
    model=model,
    optimizer=optimizer,
    criterion=criterion,
    train_loader=train_loader,
    val_loader=val_loader,
    epochs=30,
    device='cuda'  # or 'cpu'
)

# Train
history = trainer.train()

# Export to ONNX
trainer.export_to_onnx(output_path='checkpoint/best_model.onnx')

Training Configuration

# Hyperparameters
BATCH_SIZE = 32
LEARNING_RATE = 0.001
EPOCHS = 30
OPTIMIZER = 'Adam'
WEIGHT_DECAY = 1e-4

Results

Model Performance

Metric	Value
Test Accuracy	95.43%
Test F1 Score	95.42%
Best Val F1	97.71%
Training Time	~15 minutes (GPU)
Model Size	2.3 MB
Inference Speed	~5ms per frame

Training Progress

Epoch	Train Loss	Train Acc	Val Loss	Val Acc	Val F1	Status
1	1.0841	42.13%	1.0338	61.19%	60.40%	⭐ Best
2	0.8391	72.31%	0.6529	78.08%	78.09%	⭐ Best
5	0.4403	84.52%	0.4277	85.39%	85.40%	⭐ Best
9	0.2216	91.84%	0.2033	92.69%	92.67%	⭐ Best
12	0.1266	95.89%	0.1317	95.43%	95.41%	⭐ Best
15	0.0784	97.98%	0.0905	97.72%	97.71%	⭐ Best
18	0.0414	99.09%	0.0720	97.72%	97.71%	⭐ Best
30	0.0054	100.00%	0.0799	96.80%	96.80%	Final

Visualization

Training History

Key Observations:

Smooth convergence with minimal overfitting
Validation metrics closely track training metrics
Best performance achieved at epoch 15-18
Stable performance after epoch 20

Confusion Matrix

Performance Breakdown:

Rock: High precision, minimal confusion
Paper: Excellent classification accuracy
Scissors: Strong performance across all metrics
Overall: Balanced performance across all three classes

Per-Class Metrics

Class	Precision	Recall	F1-Score	Support
Rock	96.2%	95.8%	96.0%	340
Paper	94.9%	95.1%	95.0%	326
Scissors	95.1%	95.3%	95.2%	334
Avg	95.4%	95.4%	95.4%	1000

Configuration

Adjusting Settings

Edit src/constant/constant.py to customize:

# Model settings
MODEL_PATH = 'checkpoint/best_model.onnx'
INPUT_SIZE = (224, 224)
NUM_CLASSES = 3

# Detection settings
CONFIDENCE_THRESHOLD = 0.5
HAND_DETECTION_CONFIDENCE = 0.7
HAND_TRACKING_CONFIDENCE = 0.5

# Display settings
SHOW_FPS = True
SHOW_LANDMARKS = False
BBOX_COLOR = (0, 255, 0)  # Green
TEXT_COLOR = (255, 255, 255)  # White

Troubleshooting

Common Issues

Camera Not Detected

# Test camera access
python -c "import cv2; cap = cv2.VideoCapture(0); print('Camera:', cap.isOpened())"

# Try different camera indices
# In main.py, change: cv2.VideoCapture(0) to cv2.VideoCapture(1)

Low FPS

Solutions:

Close other applications using the camera
Reduce input resolution in the code
Use ONNX model instead of PyTorch
Enable GPU acceleration

Poor Recognition Accuracy

Solutions:

Improve lighting conditions
Ensure hand is properly centered
Retrain with more diverse data
Adjust confidence threshold

Module Import Errors

# Reinstall dependencies
pip install --upgrade -r requirements.txt

# Verify installation
pip list | grep -E "torch|opencv|mediapipe"

Use Cases

Interactive Gaming - Build gesture-controlled games
Mobile Applications - Touchless interfaces
Robotics - Hand gesture control systems
Education - Teaching computer vision concepts
Accessibility - Alternative input methods
Creative Applications - Gesture-based art tools

Roadmap

Support for multiple simultaneous hands
Additional gestures (numbers, letters)
Mobile deployment (iOS/Android)
Web-based demo using ONNX.js
Gesture customization interface
Performance benchmarking suite
Docker containerization

Contributing

We welcome contributions! Here's how you can help:

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Contribution Areas

Bug fixes
New features
Documentation improvements
Test coverage
UI/UX enhancements
Performance optimizations

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

MediaPipe Team - Hand detection framework
PyTorch Team - Deep learning framework
OpenCV Community - Computer vision tools
ONNX Runtime - Model deployment solution

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
checkpoint		checkpoint
data		data
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

RPSVision - Real-Time Rock Paper Scissors Recognition

Overview

Core Technologies

Features

Core Capabilities

Development Features

Project Structure

Installation

Prerequisites

Step 1: Clone the Repository

Step 2: Create Virtual Environment (Recommended)

Step 3: Install Dependencies

Step 4: Verify Installation

Usage

Quick Start

Controls

Tips for Best Results

Expected Output

How It Works

Pipeline Architecture

Detailed Process

1️⃣ Hand Detection

2️⃣ Region Extraction

3️⃣ Preprocessing

4️⃣ Classification

5️⃣ Display

Training

Dataset Preparation

Training Your Own Model

Training Configuration

Results

Model Performance

Training Progress

Visualization

Training History

Confusion Matrix

Per-Class Metrics

Configuration

Adjusting Settings

Troubleshooting

Common Issues

Camera Not Detected

Low FPS

Poor Recognition Accuracy

Module Import Errors

Use Cases

Roadmap

Contributing

Contribution Areas

License

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages