A computer vision project that uses YOLO (You Only Look Once) models to detect and recognize mahjong tiles in real-world photographs. The project targets five model sizes (nano, small, medium, large, extra-large) for different use cases; the nano, small, and medium variants are trained so far.
Dataset: https://www.kaggle.com/datasets/shinz114514/mahjong-hand-photos-taken-with-mobile-camera/data
This project implements mahjong tile recognition with YOLOv11. It is capable of:
- Detecting mahjong tiles in real-world photographs
- Recognizing different tile types and suits
- Processing images with various lighting conditions and backgrounds
- Providing both PyTorch (.pt) and ONNX model formats for deployment
├── models/ # Trained models organized by size
│   ├── nano/ # YOLOv11n models (fastest, lowest accuracy)
│   ├── small/ # YOLOv11s models (balanced speed/accuracy)
│   ├── medium/ # YOLOv11m models (good accuracy)
│   ├── large/ # YOLOv11l models (high accuracy)
│   ├── extra_large/ # YOLOv11x models (highest accuracy)
│   └── *.onnx # ONNX format models for deployment
├── scripts/ # Utility scripts
│   ├── convert_yolo_to_onnx.py # Convert PyTorch models to ONNX
│   └── convert_yolo_to_coreml.py # Convert PyTorch models to CoreML
├── notebooks/ # Jupyter notebooks for training and analysis
│   ├── data_labeling/ # Data annotation and labeling notebooks
│   ├── data_processing/ # Data preprocessing notebooks
│   ├── yolo.ipynb # YOLO training notebook
│   └── yolo_predict.ipynb # Prediction and evaluation notebook
├── results/ # Training and evaluation results
│   ├── training/ # Training logs, metrics, and model checkpoints
│   ├── validation/ # Validation results
│   └── predictions/ # Prediction outputs and visualizations
├── data/ # Dataset organization
│   ├── raw/ # Original images
│   ├── processed/ # Preprocessed images
│   └── annotations/ # Label files
├── docs/ # Documentation
├── examples/ # Usage examples
└── README.md # This file
| Model Size | Trained Model | Status | mAP50 | mAP50-95 | Precision | Recall | Size (MB) | Use Case |
|---|---|---|---|---|---|---|---|---|
| Nano | trained_models_v2/yolo11n_best.pt | Complete | 0.880 | 0.676 | 0.943 | 0.748 | 5.2 | Mobile/Edge devices |
| Small | trained_models_v2/yolo11s_best.pt | Complete | 0.881 | 0.695 | 0.929 | 0.765 | 18.3 | Real-time applications |
| Medium | trained_models_v2/yolo11m_best.pt | Complete | 0.865 | 0.652 | 0.822 | 0.772 | 38.7 | Balanced performance |
| Large | trained_models_v2/yolo11l_best.pt | In Progress | - | - | - | - | - | High accuracy needs |
| Extra Large | - | Planned | - | - | - | - | - | Maximum accuracy |
Performance Summary:
- Best Overall: YOLOv11s (mAP50: 0.881, mAP50-95: 0.695)
- Highest Precision: YOLOv11n (0.943)
- Best Recall: YOLOv11m (0.772)
- Smallest Model: YOLOv11n (5.2MB)
- Nano (YOLOv11n): Fastest inference, optimized for mobile deployment
- Small (YOLOv11s): Good balance of speed and accuracy for real-time applications
- Medium (YOLOv11m): Highest recall in the current results (0.772), though its mAP50 (0.865) trails the nano and small models
- Large (YOLOv11l): High accuracy for production applications
- Extra Large (YOLOv11x): Maximum accuracy when speed is not critical
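
As a rough illustration of how the table above can drive model choice, the sketch below picks a variant by metric, optionally under a file-size budget. The `MODELS` dictionary and the `pick_model` helper are hypothetical names introduced for this example; the numbers are copied from the table, and only the three trained variants are included.

```python
# Metrics copied from the model table; only trained variants are listed.
MODELS = {
    "nano":   {"map50": 0.880, "map50_95": 0.676, "precision": 0.943, "recall": 0.748, "size_mb": 5.2},
    "small":  {"map50": 0.881, "map50_95": 0.695, "precision": 0.929, "recall": 0.765, "size_mb": 18.3},
    "medium": {"map50": 0.865, "map50_95": 0.652, "precision": 0.822, "recall": 0.772, "size_mb": 38.7},
}


def pick_model(metric, max_size_mb=None):
    """Return the variant with the highest `metric`, optionally under a size budget."""
    candidates = {
        name: stats
        for name, stats in MODELS.items()
        if max_size_mb is None or stats["size_mb"] <= max_size_mb
    }
    if not candidates:
        raise ValueError(f"no model fits within {max_size_mb} MB")
    return max(candidates, key=lambda name: candidates[name][metric])


print(pick_model("map50"))                   # best mAP50 overall
print(pick_model("recall", max_size_mb=20))  # best recall among models up to 20 MB
```

A lookup like this keeps the recommendation tied to measured numbers, so it can be regenerated once the large and extra-large variants have results.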
pip install ultralytics opencv-python matplotlib torch torchvision
pip install jupyter notebook albumentations numpy

from ultralytics import YOLO
# Load a trained model
model = YOLO('trained_models_v2/yolo11m_best.pt')
# Run inference on an image
results = model.predict('path/to/mahjong/image.jpg')
# Display results
results[0].show()

Visual demonstration comparing ground truth labels vs model predictions:
Performance Metrics:
- mAP50 (Mean Average Precision at IoU=0.5): Measures detection accuracy
- mAP50-95 (Mean Average Precision at IoU=0.5:0.95): Stricter accuracy measure
- Precision: Percentage of correct positive predictions
- Recall: Percentage of actual positives correctly identified
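
To make the precision and recall definitions concrete, here is a minimal sketch that computes them (plus F1) from raw detection counts. The function name `detection_metrics` and the example counts are hypothetical and chosen only for illustration; they are not results from this project.

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts.

    tp: predictions that match a ground-truth tile
    fp: predictions with no matching ground-truth tile
    fn: ground-truth tiles the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical counts for a single image.
precision, recall, f1 = detection_metrics(tp=27, fp=3, fn=2)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

mAP50 and mAP50-95 build on the same counts: they average precision over recall levels, with a prediction counted as correct only when its IoU with a ground-truth box reaches the stated threshold.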
Model Recommendations:
- For Mobile Apps: YOLOv11n (5.2MB, 0.943 precision)
- For Real-time Systems: YOLOv11s (best overall mAP50: 0.881)
- For High Recall Needs: YOLOv11m (0.772 recall, good for finding all tiles; used in the demo)
What the Validation Demo Shows:
- Ground Truth vs Predictions: Side-by-side comparison showing actual labels (left, green) vs model predictions (right, red)
- Real Performance Assessment: Shows both correct detections and model limitations
- Detection Accuracy: 27/29 ground truth objects correctly detected (93.1% recall)
- Precision Analysis: Some false positives visible, showing where model over-detects
- Complex Scene Handling: Demonstrates performance on challenging multi-tile layouts
- YOLOv11m Model: Uses model with best recall (77.2%) for comprehensive tile detection
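
A comparison like the one described above depends on matching each prediction to a ground-truth box. The sketch below shows one common way to do that, greedy matching at an IoU threshold of 0.5; it is an assumption about how such a count could be produced, not necessarily what `inference_validation.py` does. The names `iou` and `match_detections` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(ground_truth, predictions, iou_thres=0.5):
    """Greedily match predictions to ground truth and return (tp, fp, fn).

    Both arguments are lists of (class_name, box) pairs. Predictions should be
    sorted by descending confidence so stronger detections are matched first.
    """
    unmatched = list(range(len(ground_truth)))
    tp = 0
    for pred_cls, pred_box in predictions:
        best_index, best_iou = None, iou_thres
        for gt_index in unmatched:
            gt_cls, gt_box = ground_truth[gt_index]
            if gt_cls != pred_cls:
                continue  # a wrong class never counts as a match
            overlap = iou(pred_box, gt_box)
            if overlap >= best_iou:
                best_index, best_iou = gt_index, overlap
        if best_index is not None:
            unmatched.remove(best_index)  # each ground-truth box matches once
            tp += 1
    fp = len(predictions) - tp
    fn = len(unmatched)
    return tp, fp, fn
```

Recall for an image is then `tp / (tp + fn)`, and the false positives mentioned in the list are the `fp` count.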
Additional Results:
- Complete Validation Report - Detailed performance metrics
- 8 Validation Images - Ground truth vs prediction comparisons
- Model Performance Metrics - mAP, precision, recall for all models
- Class Configuration Verification - Confirms correct training setup
Generated using: python3 inference_validation.py - comprehensive validation on multiple test images
import onnxruntime as ort
import cv2
import numpy as np
# Load ONNX model
session = ort.InferenceSession('models/mahjong-yolom-best.onnx')
# Preprocess image: OpenCV loads BGR, while the model expects RGB input
img = cv2.imread('path/to/image.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_resized = cv2.resize(img_rgb, (640, 640))
img_normalized = img_resized.astype(np.float32) / 255.0
img_transposed = np.transpose(img_normalized, (2, 0, 1))  # HWC -> CHW
img_batch = np.expand_dims(img_transposed, axis=0)  # add batch dimension
# Run inference
outputs = session.run(None, {'images': img_batch})

Convert PyTorch models to ONNX format:
python scripts/convert_yolo_to_onnx.py models/medium/mahjong-yolom-best.pt

Convert PyTorch models to CoreML format:
python scripts/convert_yolo_to_coreml.py models/medium/mahjong-yolom-best.pt

Batch conversion (all models):
python scripts/convert_yolo_to_onnx.py models/ --batch
python scripts/convert_yolo_to_coreml.py models/ --batch
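
The ONNX Runtime example earlier returns a raw tensor rather than finished detections. The sketch below shows one way to decode it, assuming the usual Ultralytics detection export layout of shape `(1, 4 + nc, N)`, where the first four rows are `cx, cy, w, h` and the remaining rows are per-class scores. That layout, the thresholds, and the helper name `decode_yolo_output` are assumptions; check the actual output shape of your exported model before relying on it. For brevity the non-maximum suppression here is class-agnostic.

```python
import numpy as np


def decode_yolo_output(output, conf_thres=0.25, iou_thres=0.45):
    """Turn a (1, 4 + nc, N) prediction tensor into (boxes_xyxy, scores, class_ids)."""
    preds = np.squeeze(output, axis=0).T  # (N, 4 + nc)
    class_scores = preds[:, 4:]
    class_ids = class_scores.argmax(axis=1)
    scores = class_scores.max(axis=1)

    # Drop low-confidence candidates first.
    keep = scores >= conf_thres
    preds, scores, class_ids = preds[keep], scores[keep], class_ids[keep]

    # Convert (cx, cy, w, h) to corner format (x1, y1, x2, y2).
    cx, cy, w, h = preds[:, 0], preds[:, 1], preds[:, 2], preds[:, 3]
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    areas = w * h

    # Greedy, class-agnostic NMS: keep the best box, drop overlapping ones, repeat.
    order = scores.argsort()[::-1]
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(best)
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        overlap = inter / (areas[best] + areas[rest] - inter + 1e-9)
        order = rest[overlap <= iou_thres]

    kept = np.array(kept, dtype=int)
    return boxes[kept], scores[kept], class_ids[kept]
```

With the earlier example this would be called as `boxes, scores, class_ids = decode_yolo_output(outputs[0])`. The returned boxes are in the 640x640 resized image space, so they still need to be scaled back to the original image dimensions.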
- Organize your dataset in YOLO format:

dataset/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/

- Create a data configuration file (data.yaml):

train: path/to/train/images
val: path/to/val/images
test: path/to/test/images
nc: 34 # number of classes (mahjong tile types)
names: ['1m', '2m', '3m', ..., 'red', 'green', 'white']
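
Before training, it can help to confirm that the images and labels line up. The sketch below lists images in a split that have no matching label file, assuming the `images/<split>` and `labels/<split>` layout described above and one `.txt` label file per image with the same stem. The helper name `find_unlabeled_images` and the accepted extensions are hypothetical choices for this example.

```python
from pathlib import Path

# Image extensions to consider; extend this set if the dataset uses others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(dataset_root, split):
    """Return names of images in images/<split> without labels/<split>/<stem>.txt."""
    root = Path(dataset_root)
    image_dir = root / "images" / split
    label_dir = root / "labels" / split
    missing = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files
        if not (label_dir / f"{image_path.stem}.txt").is_file():
            missing.append(image_path.name)
    return missing
```

For example, `find_unlabeled_images("dataset", "train")` returns the training images that lack labels. Images without a label file are normally treated as background during training, so a non-empty result is worth reviewing rather than automatically an error.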
from ultralytics import YOLO
# Train nano model
model = YOLO('models/nano/yolo11n.pt')
model.train(data='data.yaml', epochs=500, batch=24, name='mahjong-yolon')
# Train small model
model = YOLO('models/small/yolo11s.pt')
model.train(data='data.yaml', epochs=500, batch=16, name='mahjong-yolos')
# Train medium model
model = YOLO('models/medium/yolo11m.pt')
model.train(data='data.yaml', epochs=500, batch=12, name='mahjong-yolom')
# Train large model
model = YOLO('models/large/yolo11l.pt')
model.train(data='data.yaml', epochs=500, batch=10, name='mahjong-yolol')

# Validate trained model
model = YOLO('models/medium/mahjong-yolom-best.pt')
metrics = model.val()
print(f"mAP50: {metrics.box.map50}")
print(f"mAP50-95: {metrics.box.map}")

Training results include:
- Precision/Recall curves
- F1 score curves
- Confusion matrices
- Training loss graphs
- Validation metrics
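
The validation metrics can also be inspected programmatically. The sketch below finds the best epoch in a training run's `results.csv`, assuming the file uses Ultralytics-style column names such as `metrics/mAP50-95(B)` and an `epoch` column. The column names, the file location, and the helper name `best_epoch` are assumptions; adjust them to match the files your run actually produced.

```python
import csv


def best_epoch(results_csv, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest `metric`."""
    with open(results_csv, newline="") as f:
        # Some versions pad headers and values with spaces, so strip both.
        rows = [
            {key.strip(): value.strip() for key, value in row.items()}
            for row in csv.DictReader(f)
        ]
    if not rows:
        raise ValueError(f"{results_csv} contains no epochs")
    best = max(rows, key=lambda row: float(row[metric]))
    return int(float(best["epoch"])), float(best[metric])
```

A call such as `best_epoch("runs/detect/mahjong-yolom/results.csv")` would report which epoch scored highest; the path shown is a guess at the default output directory, not one confirmed by this repository.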
The model recognizes 38 different mahjong tile types:
- 1m through 9m, 0m (red five)
- 1p through 9p, 0p (red five)
- 1s through 9s, 0s (red five)
- 1z (East), 2z (South), 3z (West), 4z (North)
- 5z (White Dragon), 6z (Green Dragon), 7z (Red Dragon)
- UNKNOWN class for unclear or damaged tiles
Total Classes: 38 (including red fives and unknown category)
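
The 38 class names listed above can be generated rather than typed by hand, as in the sketch below. The ordering used here (suits, then honors, then `UNKNOWN`) is an assumption for illustration only; the class index order must match the one used when the model was trained, so take the authoritative list from the training configuration. The function name `build_class_names` is hypothetical.

```python
def build_class_names():
    """Build the 38 tile class names: three suits, seven honors, and UNKNOWN."""
    names = []
    for suit in ("m", "p", "s"):
        names += [f"{rank}{suit}" for rank in range(1, 10)]  # 1 through 9
        names.append(f"0{suit}")                              # red five
    names += [f"{rank}z" for rank in range(1, 8)]             # winds and dragons
    names.append("UNKNOWN")                                   # unclear or damaged tiles
    return names


class_names = build_class_names()
print(len(class_names))
```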
To add new tile classes:
- Update the data configuration file with the new classes
- Retrain the model with the expanded dataset
- Update the class names in prediction scripts
Key training parameters to adjust:
- batch: Batch size (adjust based on GPU memory)
- lr0: Initial learning rate
- epochs: Training epochs
- patience: Early stopping patience
- conf: Confidence threshold for predictions
- iou: IoU threshold for NMS
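
The parameters above apply at different stages: `batch`, `lr0`, `epochs`, and `patience` are training arguments, while `conf` and `iou` are used at prediction time. The sketch below separates the two groups. The `epochs` and `batch` values come from the medium-model training example in this README; the other values, and the names `TRAIN_ARGS`, `PREDICT_ARGS`, `train`, and `predict`, are illustrative assumptions rather than tuned settings.

```python
# Training-time settings (lr0 and patience are illustrative values).
TRAIN_ARGS = {
    "data": "data.yaml",
    "epochs": 500,
    "batch": 12,
    "lr0": 0.01,
    "patience": 50,
}

# Prediction-time settings (illustrative values).
PREDICT_ARGS = {
    "conf": 0.25,  # discard detections below this confidence
    "iou": 0.45,   # IoU threshold used by non-maximum suppression
}


def train(weights):
    """Train from the given starting weights using TRAIN_ARGS."""
    from ultralytics import YOLO  # imported lazily so the settings can be inspected without it
    model = YOLO(weights)
    return model.train(**TRAIN_ARGS)


def predict(weights, image_path):
    """Run prediction on one image using PREDICT_ARGS."""
    from ultralytics import YOLO
    model = YOLO(weights)
    return model.predict(image_path, **PREDICT_ARGS)
```

Lowering `conf` tends to raise recall at the cost of more false positives, which matters when the goal is to find every tile in a hand.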
For faster inference:
- Use nano or small models
- Convert to ONNX format
- Use TensorRT for NVIDIA GPUs
- Optimize input image size

For higher accuracy:
- Use medium, large, or extra-large models
- Increase training epochs
- Use data augmentation
- Ensemble multiple models

For deployment:
- Use ONNX models for cross-platform compatibility
- Use CoreML models for iOS/macOS deployment
- Implement batch processing for multiple images
- Use GPU acceleration when available
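
Batch processing for multiple images can be sketched as below: the image paths are split into fixed-size chunks and each chunk is passed to the model in a single call. The helper names `chunked` and `predict_in_batches` and the default batch size are hypothetical; the sketch assumes `model.predict` accepts a list of image paths and returns one result per image, as the Ultralytics `YOLO` class does.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_batches(model, image_paths, batch_size=16):
    """Run `model.predict` on chunks of paths and collect the results in order."""
    results = []
    for batch in chunked(list(image_paths), batch_size):
        results.extend(model.predict(batch))
    return results
```

Chunking keeps memory use bounded on large folders; the right batch size depends on the model variant and the available GPU memory.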

To contribute:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests and documentation
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
Authors:
- Zhen Zhang - zhenz@vt.edu
- Yiyun Huang - yiyunh@vt.edu

Acknowledgments:
- Ultralytics for the YOLO implementation
- The computer vision community for datasets and techniques
- Contributors to the mahjong recognition research
For questions and support:
- Open an issue on GitHub
- Check the documentation in the docs/ folder
- Review the example notebooks in notebooks/
Built with ❤️ for the mahjong and computer vision communities