This project focuses on translating sign language gestures into text using deep learning and computer vision. By processing video input, the system predicts the corresponding sign language meaning in real time, making communication more accessible for individuals with hearing or speech impairments.
The approach involves a deep learning pipeline trained on a dataset of sign language gestures. The process consists of:
- Dataset Collection: Capturing sign images using OpenCV from a webcam feed.
- Data Augmentation: Enhancing dataset quality through transformations such as rotation, flipping, brightness changes, and noise addition.
- Model Architecture: Utilizing a pre-trained MobileNetV3 model fine-tuned with additional custom layers for classification.
- Training Process: Optimized with the Adam optimizer and categorical cross-entropy loss to improve classification accuracy.
- Inference: Running real-time predictions on new sign language gestures using webcam input.
- Run `datacollection.py` to collect sign gesture images using a webcam.
- Press 'S' to capture an image.
- Press 'Q' to quit the script.
- Images are automatically stored in labeled folders for each gesture (a minimal capture loop is sketched after this list).
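A minimal sketch of what `datacollection.py` is assumed to do: an OpenCV capture loop that writes frames into a per-gesture folder. The `data/` directory and the gesture label below are hypothetical placeholders.

```python
import os
import cv2  # OpenCV for webcam capture and display

LABEL = "A"                              # hypothetical gesture label
SAVE_DIR = os.path.join("data", LABEL)   # assumed folder-per-gesture layout
os.makedirs(SAVE_DIR, exist_ok=True)

cap = cv2.VideoCapture(0)                # open the default webcam
count = len(os.listdir(SAVE_DIR))        # continue numbering existing images

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Data Collection", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("s"):                  # 'S' saves the current frame
        cv2.imwrite(os.path.join(SAVE_DIR, f"{LABEL}_{count}.jpg"), frame)
        count += 1
    elif key == ord("q"):                # 'Q' quits the script
        break

cap.release()
cv2.destroyAllWindows()
```

In practice such a script would be run once per gesture, changing the label between runs.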
- Run `dataAug.py` to apply transformations to the dataset (a minimal augmentation sketch follows this list).
- Press 'S' to start augmentation.
- Press 'Q' to quit the script.
- Augmented images enhance model generalization and prevent overfitting.
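For illustration, here is a minimal sketch of the listed transformations (rotation, flipping, brightness changes, noise) using OpenCV and NumPy; the parameter ranges are assumptions, and the real `dataAug.py` may apply them differently.

```python
import cv2
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Return a few augmented variants of a single BGR image."""
    h, w = image.shape[:2]
    variants = []

    # Rotation by a small random angle around the image centre
    angle = float(np.random.uniform(-15, 15))
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    variants.append(cv2.warpAffine(image, M, (w, h)))

    # Horizontal flip
    variants.append(cv2.flip(image, 1))

    # Brightness shift
    beta = float(np.random.uniform(-40, 40))
    variants.append(cv2.convertScaleAbs(image, alpha=1.0, beta=beta))

    # Additive Gaussian noise
    noise = np.random.normal(0, 10, image.shape).astype(np.int16)
    noisy = np.clip(image.astype(np.int16) + noise, 0, 255).astype(np.uint8)
    variants.append(noisy)

    return variants
```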
- Ensure that the dataset is properly labeled and structured for training.
- Convert images into NumPy arrays for efficient processing.
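One common way to turn a folder-per-label image set into NumPy arrays is sketched below; the `data/` layout, the 224-pixel input size, and the `.npy` output files are illustrative assumptions rather than details taken from the project.

```python
import os
import cv2
import numpy as np

DATA_DIR = "data"        # assumed layout: data/<label>/<image>.jpg
IMG_SIZE = 224           # MobileNetV3's default input resolution

labels = sorted(os.listdir(DATA_DIR))
images, targets = [], []

for idx, label in enumerate(labels):
    folder = os.path.join(DATA_DIR, label)
    for name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, name))
        if img is None:
            continue                              # skip unreadable files
        images.append(cv2.resize(img, (IMG_SIZE, IMG_SIZE)))
        targets.append(idx)

# Keras' MobileNetV3 rescales [0, 255] inputs internally, so no division by 255 here
X = np.array(images, dtype=np.float32)
y = np.array(targets)
np.save("X.npy", X)
np.save("y.npy", y)
```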
- Run `model3.py` to train the MobileNetV3-based classifier (a minimal training sketch is shown after the parameter list below).
- Uses the Adam optimizer with a learning rate of 0.001 and the categorical cross-entropy loss function.
Training Parameters:
- Batch size: 32
- Epochs: 50
- Validation Split: 20%
- Model checkpoints are saved for resuming training if interrupted.
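Below is a minimal sketch of this kind of fine-tuning, using the optimizer, loss, and training parameters listed above. The choice of the Small variant, the custom head layers, the number of classes, and the `.npy` inputs are assumptions, not details read from `model3.py`.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV3Small

NUM_CLASSES = 26          # placeholder: one class per gesture
IMG_SIZE = 224

# Pre-trained MobileNetV3 backbone (includes its own Rescaling layer, so inputs
# stay in the [0, 255] range) with a custom classification head on top
base = MobileNetV3Small(input_shape=(IMG_SIZE, IMG_SIZE, 3),
                        include_top=False, weights="imagenet")
base.trainable = False    # freeze the backbone for the initial fine-tuning pass

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

X = np.load("X.npy")
y = tf.keras.utils.to_categorical(np.load("y.npy"), NUM_CLASSES)

# Checkpoint so the best model can be recovered if training is interrupted
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "mobilenetv3_sign_language_model.keras", save_best_only=True)

model.fit(X, y, batch_size=32, epochs=50, validation_split=0.2,
          callbacks=[checkpoint])
```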
- Ensure `mobilenetv3_sign_language_model.keras` is available in the working directory.
- Run `test.py` to classify sign gestures in real time using webcam input.
- The model processes video frames, detects hands, and predicts the corresponding sign language gesture (a minimal inference loop is sketched after this list).
- Press 'S' to capture an image.
- Press 'Q' to quit the script.
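A minimal sketch of such a loop is shown below, assuming MediaPipe hand detection and a simple bounding box around the detected landmarks; the class names, crop margin, and the omission of the 'S' capture key are simplifications that may not match `test.py`.

```python
import cv2
import numpy as np
import tensorflow as tf
import mediapipe as mp

IMG_SIZE = 224
LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]   # placeholder class names

model = tf.keras.models.load_model("mobilenetv3_sign_language_model.keras")
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # MediaPipe expects RGB frames
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # Bounding box (with a small margin) around the detected hand landmarks
        h, w = frame.shape[:2]
        lm = results.multi_hand_landmarks[0].landmark
        xs = [int(p.x * w) for p in lm]
        ys = [int(p.y * h) for p in lm]
        x1, y1 = max(min(xs) - 20, 0), max(min(ys) - 20, 0)
        x2, y2 = min(max(xs) + 20, w), min(max(ys) + 20, h)

        if x2 > x1 and y2 > y1:
            crop = cv2.resize(frame[y1:y2, x1:x2], (IMG_SIZE, IMG_SIZE))
            probs = model.predict(crop[np.newaxis].astype(np.float32), verbose=0)
            label = LABELS[int(np.argmax(probs))]
            cv2.putText(frame, label, (x1, max(y1 - 10, 20)),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow("Sign Prediction", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # 'Q' quits the script
        break

cap.release()
cv2.destroyAllWindows()
```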
| Metric | Value |
|---|---|
| Training Accuracy | 43.53% |
| Training Loss | 1.4674 |
| Test Accuracy | 53.52% |
| Test Loss | 1.2869 |
| Total Training Time | 2825.66 sec |
Logs are stored at: logs/fit/20250303-121854
- Python 3.10
- TensorFlow 2.10
- OpenCV (for video capture and preprocessing)
- NumPy, Matplotlib (for data handling and visualization)
- MediaPipe (for hand detection and tracking)
Since training and inference are computationally intensive, GPU acceleration can significantly reduce processing time.
- Install the NVIDIA CUDA Toolkit
  - Download CUDA 11.8 (compatible with TensorFlow 2.10) from the NVIDIA CUDA Toolkit page.
- Install cuDNN
  - Download cuDNN 8.6.0 (compatible with CUDA 11.8) from the NVIDIA cuDNN page.
  - Extract the files and move them to the CUDA installation directory.
- Verify the GPU installation
  - Run `python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"`.
  - If a GPU is detected, the output will list the available GPU devices.
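If the command above prints an empty list, TensorFlow is running on the CPU only. The same check can also be done from a script, and, as an optional tweak (an assumption, not something the project requires), GPU memory can be allocated on demand:

```python
import tensorflow as tf

# List the GPUs TensorFlow can see (same check as the command above)
gpus = tf.config.list_physical_devices("GPU")
print("GPUs detected:", gpus)

# Optional: allocate GPU memory on demand rather than reserving it all up front
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```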
- Dataset Collection: `python datacollection.py`
  - Press 'S' to capture images.
  - Press 'Q' to quit the script.
- Data Augmentation: `python dataAug.py`
  - Press 'S' to start augmentation.
  - Press 'Q' to quit the script.
- Train the Model: `python model3.py`
- Run Inference (Real-time Sign Prediction): `python test.py`
  - Press 'S' to capture an image.
  - Press 'Q' to quit the script.
This project is licensed under the MIT License.