Sudoku Scanner and Solver

Low-latency on-device OCR and constraint-solving pipeline for mobile devices

A high-performance computer vision and ML inference on edge hardware to detect, recognize, and solve sudoku puzzles in real-time. Built with a focus on speed and efficiency, combining advanced C++ algorithms with a modern Flutter frontend.

sudoku_solver_demo.mov

Key Features:

Rapid Processing: Complete vision-to-solution pipeline completes in ~110ms on mobile hardware (Google Tensor G1)
Deep Learning Integration: Optimized ResNet18 model for digit recognition deployed on edge devices
Efficient Solving: Constraint-propagation algorithm with advanced optimization techniques

Architecture & Implementation

Computer Vision Pipeline

The solution processes captured images through a sophisticated 4-stage pipeline:

Sudoku Detection
- Identifies the largest square region in the frame and extracts it with perspective correction
- Handles real-world challenges like skewed camera angles and varied lighting
Cell Recognition
- Localizes individual cells within the detected grid
- Isolates digit regions from empty cells using automated segmentation
Digit Classification
- Employs a deep learning model (ResNet18 trained on SVHN dataset) for robust digit recognition
- Deployed with multiple optimizations for mobile inference
Sudoku Solving
- Executes constraint-satisfaction algorithm to compute the solution
- Optimized for minimal latency

Model Optimization Strategy

The digit recognition model is optimized across multiple dimensions:

Optimization	Benefit
CNN Architecture	Lightweight design suited for edge device inference
ONNX Format	Framework-agnostic, optimized runtime execution
8-bit Quantization	4x reduced model size with minimal accuracy loss
Batch Processing	Vectorized operations for efficient cell processing

Solving Algorithm

The sudoku solver employs several key optimizations for competitive performance:

Bitmask Constraints: 16-bit integers track occupied cells per row, column, and 3×3 box
MRV Heuristic: Prioritizes cells with minimum remaining values to reduce search space
Lookup Tables: Pre-computed indices eliminate costly division/modulo operations in hot loops
Cache Optimization: Flat std::array structure maximizes CPU cache locality

Design tradeoffs

CNN for digit recognition: Classical template-based OCR approaches were insufficient under perspective distortion and font variability, while full OCR systems were unnecessarily heavy. A lightweight CNN trained for single-digit classification provided the best accuracy–latency tradeoff.
Classical computer vision for grid detection: Board and cell detection are implemented using geometric and morphological operations rather than neural networks, as they are faster, deterministic, and sufficiently robust for this task.
SVHN instead of MNIST: MNIST-trained models showed poor generalization to real-world printed digits, while SVHN provided significantly better robustness to font style, scale, and noise.
ResNet18: Chosen as the smallest architecture with stable convergence on SVHN while remaining quantization-friendly.
ONNX Runtime: Selected over alternatives for easier C++ integration and cross-platform consistency.

Strength and limitations

These design decisions result in a system with the following strengths and known limitations.

Strengths

The system was validated against a variety of real-world conditions:

Robust to strong perspective distortion and oblique viewing angles (excluding curved surfaces)
Supports multiple sudoku boards within a single frame
Works with both printed boards and digital displays (moiré artifacts mitigated)
Handles varied fonts, digit sizes, and board color schemes

Limitations

Performance degrades under very low-light conditions (mitigated via optional camera flash)
Performance degrades on scribbled or dirty boards
Model is trained exclusively on printed digits; handwritten digits are out of scope
Board detection assumes the presence of a clear outer square border

Tech Stack

Component	Technology	Purpose
Mobile App	Flutter, Dart	Cross-platform UI and camera integration
Vision Core	C++23, OpenCV, ONNX Runtime	Performance-critical image processing and inference
ML Model	PyTorch, Hugging Face	Conversion of digit recognition model
Build System	CMake, Gradle	Multi-platform compilation and packaging

Installation & Build

Prerequisites

Flutter SDK
C++ compiler (clang/gcc)
CMake 4.2+
Python 3.12+ with UV package manager

Setup Instructions

1. ML Pipeline (generates ResNet18_SVHN_INT8 model)

cd ml-pipeline
uv sync
python digit_recognition_model.py

2. Vision Core (downloads and builds C++ dependencies)

cd core-vision
python fetch_deps.py

For debugging on macos:

sh ./build.sh macos
# example
./build_macos/main ../images/cover.jpg

3. Mobile App (Android release build)

cd app-mobile
# Ensure model is in place
cp ../models/resnet18_svhn_int8.onnx ./assets/
# Build optimized release APK
flutter build apk --release --target-platform android-arm64

For debugging:

flutter run

Development & Debugging

Potential improvements and roadmap items:

Vision Core

Continuous stream detection: Real-time grid detection without requiring explicit photo capture <!-- idea: keep square detection and add area range to catch it >
Handwritten digit recognition with discriminator capability: Multi-model approach to handle both printed and handwritten numbers, with the capability to distinbuish the two <!-- idea: fiewer resnet layers, build dataset with handwritten mnist/emnist + digital synthetic, add head to discriminate >

Mobile App

Solution visualization toggle: Show/hide computed solution
Detection history: Browse previously captured and solved puzzles
Async processing: Execute C++ solver on background thread to maintain responsive UI

Contributing

This project demonstrates practical application of computer vision, deep learning optimization, and performance engineering. Contributions and suggestions for improvements are welcome.

License

See LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
app-mobile		app-mobile
core-vision		core-vision
ml-pipeline		ml-pipeline
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sudoku Scanner and Solver

Low-latency on-device OCR and constraint-solving pipeline for mobile devices

Table of Contents

Architecture & Implementation

Computer Vision Pipeline

Model Optimization Strategy

Solving Algorithm

Design tradeoffs

Strength and limitations

Strengths

Limitations

Tech Stack

Installation & Build

Prerequisites

Setup Instructions

Development & Debugging

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Sudoku Scanner and Solver

Low-latency on-device OCR and constraint-solving pipeline for mobile devices

Table of Contents

Architecture & Implementation

Computer Vision Pipeline

Model Optimization Strategy

Solving Algorithm

Design tradeoffs

Strength and limitations

Strengths

Limitations

Tech Stack

Installation & Build

Prerequisites

Setup Instructions

Development & Debugging

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages