A high-performance, cross-platform Sudoku solver featuring multiple algorithms, OCR image recognition, comprehensive benchmarking tools, and multi-platform CI/CD support.
I originally built a Sudoku solver as a small father-son coding challenge. It later evolved into a useful benchmark tool to explore algorithmic efficiency and cross-platform CPU performance.
I wrote two articles documenting this journey:
- Sudoku Solver β From a Father-Son Coding Challenge to a CPU Benchmark Tool
- Sudoku Solver β Cross-Platform Performance Deep Dive (Part II)
-
Multiple Algorithms
- Dancing Links (DLX) - Donald Knuth's Algorithm X implementation
- Backtracking with Constraint Propagation
- Automatic algorithm comparison
-
Find All Solutions
--solve-alloption to find multiple solutions- Configurable limit with
--max-solutions - Save all solutions to JSON
-
OCR Support (Windows/Linux/macOS)
- Extract puzzles from images (PNG, JPG, BMP, TIFF)
- Supports 9x9 and 16x16 grids
- Automatic grid detection and perspective correction
-
Built-in Test Puzzles
- 9x9 classic puzzle for quick tests
- 16x16 hard puzzle (77 clues) for standard benchmarks
- 25x25 mega puzzle for stress testing
-
Comprehensive Benchmarking
- Single-threaded and multi-threaded benchmarks
- Cross-platform system information detection
- Performance metrics: throughput, speedup, efficiency
- Professional benchmark reports
-
Cross-Platform Support
- Windows (x64)
- Linux (x64)
- macOS (Universal Binary - Intel & Apple Silicon)
- Android (ARM64) - Solver & Benchmark only
- CMake 3.20+
- C++20 compatible compiler (MSVC 2022, GCC 11+, Clang 14+)
- vcpkg package manager
# Clone repository
git clone https://github.com/user/sudoku-solver.git
cd sudoku-solver
# Windows
cmake --preset windows-x64-Release
cmake --build --preset windows-x64-Release
# Linux
cmake --preset linux-x64-Release
cmake --build --preset linux-x64-Release
# macOS (Universal Binary)
cmake --preset macos-universal-Release
cmake --build --preset macos-universal-Release# Solve a puzzle from JSON
SudokuSolver -i puzzle.json
# Solve from image (OCR) - Windows/Linux/macOS only
SudokuSolver -i sudoku.png
# Use built-in test puzzles
SudokuSolver -t 9 # 9x9 puzzle
SudokuSolver -t 16 # 16x16 puzzle
SudokuSolver -t 25 # 25x25 puzzle (heavy)
# Compare algorithms
SudokuSolver -i puzzle.json -a compare
# Find all solutions (with default limit of 100)
SudokuSolver -i puzzle.json --solve-all
# Find up to 10 solutions
SudokuSolver -i puzzle.json --solve-all --max-solutions 10
# Find all solutions and save to JSON
SudokuSolver -i puzzle.json --solve-all -o solutions.json
# Run benchmark (1000 iterations, auto-detect threads)
SudokuSolver -t 9 -b 1000 -w 0
# Single-threaded benchmark
SudokuSolver -t 25 -b 100
# Multi-threaded benchmark with built-in puzzle
SudokuSolver -t 16 -b 1000 -w 0 -a compare
# Show system info
SudokuSolver -s| Option | Description |
|---|---|
-i, --input <file> |
Input file (JSON or image) |
-t, --test <size> |
Use built-in test puzzle (9, 16, or 25) |
-a, --algorithm <algo> |
Algorithm: dlx, backtrack, compare |
-b, --benchmark <N> |
Run benchmark with N iterations |
-w, --workers <N> |
Number of parallel workers (0 = auto) |
-o, --output <file> |
Output solution to JSON file |
--solve-all |
Find all solutions (default limit: 100) |
--max-solutions <N> |
Maximum solutions to find (0 = unlimited) |
-s, --sysinfo |
Show system information |
-q, --quiet |
Minimal output |
Throughput (T)
Efficiency (E) = βββββββββββββββββββββ
Single-Thread x Workers
Where:
- Throughput = Total solves per second (multi-threaded)
- Speedup = Multi-threaded throughput / Single-threaded throughput
- Efficiency = Speedup / Number of workers x 100%
Efficiency Interpretation:
| Efficiency | Rating | Description |
|---|---|---|
| 90-100% | βββββ | Near-perfect scaling |
| 70-90% | ββββ | Excellent |
| 50-70% | βββ | Good (typical for CPU-bound tasks) |
| 30-50% | ββ | Moderate (synchronization overhead) |
| <30% | β | Poor (bottlenecked) |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β GitHub Actions Runner Performance β
ββββββββββββββ¬βββββββββββββββββββ¬ββββββββββ¬ββββββββββββ¬ββββββββββββ¬ββββββββββββ€
β Platform β CPU β Cores β BT/s β DLX/s β Winner β
ββββββββββββββΌβββββββββββββββββββΌββββββββββΌββββββββββββΌββββββββββββΌββββββββββββ€
β Windows β AMD EPYC 7763 β 2C/4T β 44.86/s β 58.57/s β DLX +31% β
β Linux β AMD EPYC 7763 β 2C/4T β 38.94/s β 67.49/s β DLX +73% β
β macOS β Apple M1 (VM) β 3C/3T β 35.56/s β 94.92/s β DLX+167% β β
ββββββββββββββ΄βββββββββββββββββββ΄ββββββββββ΄ββββββββββββ΄ββββββββββββ΄ββββββββββββ
BT = Backtracking, DLX = Dancing Links
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Single-Threaded Performance Comparison β
ββββββββββββββ¬βββββββββββββββββββ¬βββββββββββ¬βββββββββββββ¬ββββββββββββββββββββββ€
β Platform β CPU β Avg(ms) β Throughput β Notes β
ββββββββββββββΌβββββββββββββββββββΌβββββββββββΌβββββββββββββΌββββββββββββββββββββββ€
β WSL2 β AMD 9950X3D β 17.41 β 57.43/s β Fastest β β
β Windows β AMD 9950X3D β 19.27 β 51.90/s β MSVC overhead β
β Android β Snapdragon 8G3 β 25.46 β 39.27/s β Mobile SoC! β
β macOS VM β Apple M1 (VM) β 31.86 β 31.39/s β VM overhead ~30% β
β Linux CI β AMD EPYC 7763 β 35.98 β 27.79/s β CI resource limit β
β Win CI β AMD EPYC 7763 β 41.51 β 24.09/s β CI resource limit β
ββββββββββββββ΄βββββββββββββββββββ΄βββββββββββ΄βββββββββββββ΄ββββββββββββββββββββββ
Key Observations:
- WSL2 (GCC) outperforms native Windows (MSVC) by ~10% on DLX algorithm
- Snapdragon 8 Gen 3 achieves 68% of desktop performance at 1/34 power consumption
- macOS Runner runs in VM, real M1 estimated ~45/s (based on min latency 22.24ms)
Per-Core Efficiency Comparison (DLX, 25x25 Puzzle)
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Platform β Cores β Total/s β Per-Core/s β Notes
βββββββββββββββββββββββΌββββββββΌβββββββββββΌβββββββββββββΌβββββββββββββββββββββββ
Ryzen 9950X3D (WSL2) β 32T β 903.59 β 28.2 β High core count
Ryzen 9950X3D (Win) β 32T β 885.59 β 27.7 β MSVC overhead
Apple M1 (VM) β 3T β 94.92 β 31.6 β Highest IPC!
EPYC 7763 (Linux) β 4T β 67.49 β 16.9 β CI resource limit
EPYC 7763 (Windows) β 4T β 58.57 β 14.6 β CI resource limit
Snapdragon 8 Gen 3 β 8T β 105.31 β 13.2 β big.LITTLE penalty
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Key Insight: Apple M1 has the HIGHEST per-core throughput (31.6/s)
Ryzen 9950X3D wins total throughput purely via core count (32T)
Throughput Comparison (GitHub Runners)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
DLX Algorithm:
macOS (M1) ββββββββββββββββββββββββββββββββββββββββββββββββββββββ 94.92/s β
Linux ββββββββββββββββββββββββββββββββββββ β 67.49/s
Windows βββββββββββββββββββββββββββββββ β 58.57/s
Backtracking Algorithm:
Windows ββββββββββββββββββββββββ β 44.86/s
Linux βββββββββββββββββββββ β 38.94/s
macOS (M1) βββββββββββββββββββ β 35.56/s
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0 20 40 60 80 100
Test Configuration: ./SudokuSolver -b 100 -w 0 -a compare -t 25
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β AMD Ryzen 9 9950X3D (16C/32T) - 192GB DDR5 β
βββββββββββββββ¬ββββββββββββββ¬ββββββββββββββββ¬βββββββββββ¬βββββββββββββββββββββ€
β Platform β Compiler β Throughput β Speedup β Efficiency β
βββββββββββββββΌββββββββββββββΌββββββββββββββββΌβββββββββββΌβββββββββββββββββββββ€
β Windows β MSVC 1944 β β β β
β ββ BT β β 805.40/s β 15.02x β 46.93% β
β ββ DLX β β 885.59/s β 17.68x β 55.25% β
βββββββββββββββΌββββββββββββββΌββββββββββββββββΌβββββββββββΌβββββββββββββββββββββ€
β WSL2 β GCC 11.4 β β β β
β ββ BT β β 561.83/s β 15.87x β 49.60% β
β ββ DLX β β 903.59/s β 15.97x β 49.92% β β
βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββββ΄βββββββββββ΄βββββββββββββββββββββ
BT = Backtracking, DLX = Dancing Links
Key Observation: WSL2 DLX (903.59/s) outperforms native Windows (885.59/s) by ~2%!
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β WSL2 vs Windows Analysis β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Task Manager Color Legend: β
β ββββββββββ β
β β Green β = Kernel Mode (WSL2 workload) β
β β Red β = User Mode (Windows native) β
β ββββββββββ β
β β
β Why WSL2 shows as Kernel Mode? β
β ββ WSL2 runs a real Linux kernel inside Hyper-V β
β ββ From Windows perspective, VM operations = kernel mode β
β ββ The Linux userspace code runs inside the VM β
β β
β Performance Implications: β
β βββββββββββββββββββ¬ββββββββββββββββββ β
β β Algorithm β WSL2 vs Windows β β
β βββββββββββββββββββΌββββββββββββββββββ€ β
β β Backtracking β -30% (slower) β MSVC recursion wins β
β β DLX β +2% (faster) β GCC pointers win β
β βββββββββββββββββββ΄ββββββββββββββββββ β
β β
β Conclusion: β
β - WSL2 virtualization overhead is negligible for CPU-bound β
β - Compiler optimization matters more than OS overhead β
β - DLX benefits from GCC's superior pointer optimization β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
./SudokuSolver -b 100 -w 0 -a compare -t 25
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Snapdragon 8 Gen 3 (Cortex-X4 + A720 + A520) - 12GB LPDDR5 β
βββββββββββββββ¬ββββββββββββββ¬ββββββββββββββββ¬βββββββββββ¬βββββββββββββββββββββ€
β Platform β Compiler β Throughput β Speedup β Efficiency β
βββββββββββββββΌββββββββββββββΌββββββββββββββββΌβββββββββββΌβββββββββββββββββββββ€
β Android β Clang 21.0 β β β β
β ββ BT β β 95.83/s β 3.74x β 46.77% β
β ββ DLX β β 105.31/s β 2.74x β 34.23% β
βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββββ΄βββββββββββ΄βββββββββββββββββββββ
Backtracking vs DLX by Compiler
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
MSVC (Windows)
ββ Backtracking ββββββββββββββββββββββββββββββββββββββββββββ 805/s
ββ DLX ββββββββββββββββββββββββββββββββββββββββββββββββ 886/s
Ξ = +10% (DLX wins)
GCC (Linux/WSL2)
ββ Backtracking ββββββββββββββββββββββββββββββββ 562/s
ββ DLX ββββββββββββββββββββββββββββββββββββββββββββββββββββ 904/s
Ξ = +61% (DLX dominates)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Key Findings:
| Compiler | Backtracking | DLX | Winner | Analysis |
|---|---|---|---|---|
| MSVC | 805/s | 886/s | DLX +10% | MSVC optimizes recursion well |
| GCC | 562/s | 904/s | DLX +61% | GCC excels at pointer/linked-list ops |
| Clang | 96/s | 105/s | DLX +10% | Similar to MSVC pattern |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Algorithm Characteristics β
βββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββ€
β Backtracking β DLX β
βββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββ€
β - Deep recursion β - Linked-list traversal β
β - Stack-heavy operations β - Pointer manipulation β
β - Predictable branches β - Cache-unfriendly access β
β - Cache-friendly (array) β - Complex data structures β
βββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββ€
β MSVC: Excellent tail-call β GCC: Superior pointer aliasing β
β optimization & stack mgmt β analysis & link-time optimization β
βββββββββββββββββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Apple Silicon Architecture Advantages β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β β οΈ Note: macOS GitHub Runner runs M1 in VM (~30% overhead) β
β Real M1 estimated: ~45/s single-thread, ~114/s multi-thread β
β β
β 1. Unified Memory Architecture (UMA) β
β βββββββββββ βββββββββββ βββββββββββ β
β β P-Core β β E-Core β β GPU β β
β ββββββ¬βββββ ββββββ¬βββββ ββββββ¬βββββ β
β β β β β
β ββββββββββββββββΌβββββββββββββββ β
β β β
β ββββββββ΄βββββββ β
β β Unified RAM β << Zero-copy, no NUMA penalty β
β βββββββββββββββ β
β β
β 2. Low Core Count = Simple Scheduling β
β - 3 cores = minimal context switch overhead β
β - No cross-CCD latency (unlike AMD Ryzen) β
β - macOS scheduler highly optimized for Apple Silicon β
β β
β 3. High IPC (Instructions Per Clock) β
β - M1 IPC rivals desktop CPUs despite lower clock β
β - Wide execution units (8-wide decode) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββ
β Unified Memory Architecture β
ββββββββββββββββββ¬βββββββββββββββββ€
β Advantages β Disadvantages β
ββββββββββββββββββΌβββββββββββββββββ€
β + No NUMA β - Shared β
β penalties β bandwidth β
β β β
β + Zero-copy β - CPU & GPU β
β GPU <-> CPU β compete β
β β β
β + Simplified β - Limited by β
β programming β single pool β
β β β
β + Lower power β - Can't add β
β consumption β more RAM β
ββββββββββββββββββ΄βββββββββββββββββ
Test Configuration: AMD Ryzen 9 9950X3D (32T)
Puzzle Complexity Comparison
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Size β Cells β Empty β Search Space β Recommended Runs β
βββββββββΌββββββββΌββββββββΌβββββββββββββββΌβββββββββββββββββββ€
9x9 β 81 β ~51 β ~10^20 β 5000-10000 β
16x16 β 256 β ~220 β ~10^80 β 1000-5000 β
25x25 β 625 β ~500 β ~10^200+ β 50-200 β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
| Algorithm | System | Throughput | Speedup | Efficiency |
|---|---|---|---|---|
| DLX | 9950X3D (32T) | 23,075 /s | 19.87x | 62.10% |
| DLX | 5800X (16T) | 7,558 /s | 9.16x | 57.27% |
| Backtracking | 9950X3D (32T) | 9,393 /s | 15.83x | 49.48% |
| Backtracking | 5800X (16T) | 2,731 /s | 7.03x | 43.96% |
25x25 Mega Puzzle - Extreme Benchmark
ββββββββββββββββββββββββββββββββββββββββββββββββββββ
Algorithm β Throughput β Speedup β Efficiency
βββββββββββββββββΌβββββββββββββΌββββββββββΌβββββββββββ
Backtracking β 794/s β 14.95x β 46.71%
DLX β 877/s β 17.26x β 53.95% β
ββββββββββββββββββββββββββββββββββββββββββββββββββββ
Note: DLX is 10% faster on 25x25 puzzles
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β When to Use Each Algorithm β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Use DLX when: β
β ββ Puzzle size β₯ 16x16 β
β ββ Multi-threaded execution β
β ββ Finding ALL solutions β
β ββ Running on Linux/GCC β
β ββ Consistency across platforms is important β
β β
β Use Backtracking when: β
β ββ Puzzle size = 9x9 β
β ββ Single-threaded execution β
β ββ Only need ONE solution β
β ββ Running on Windows/MSVC β
β ββ Memory efficiency is critical β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
| Platform | Solver | Benchmark | OCR | System Detection |
|---|---|---|---|---|
| Windows x64 | β | β | β | β Full |
| Linux x64 | β | β | β | β Full |
| macOS (Intel) | β | β | β | β Full |
| macOS (Apple Silicon) | β | β | β | β Full (ref clock) |
| Android ARM64 | β | β | β | β Full (SoC detection) |
- Windows: Best Backtracking performance due to MSVC optimizations
- Linux: Best DLX performance due to GCC pointer optimizations
- macOS: Excellent efficiency due to UMA and optimized scheduler
- Android: Full SoC detection (Snapdragon, Dimensity, Exynos, Tensor)
{
"puzzle": [
[5, 3, 0, 0, 7, 0, 0, 0, 0],
[6, 0, 0, 1, 9, 5, 0, 0, 0],
...
]
}{
"puzzle": "530070000600195000098000060800060003400803001700020006060000280000419005000080079"
}{
"name": "Easy 9x9 Puzzle",
"difficulty": "easy",
"size": 9,
"box_rows": 3,
"box_cols": 3,
"grid": [
[5, 3, 0, 0, 7, 0, 0, 0, 0],
...
]
}When metadata is present, it will be displayed:
Input Puzzle:
...
Name: Easy 9x9 Puzzle
Difficulty: easy
Size: 9x9
Empty cells: 51
sudoku-solver/
βββ include/
β βββ board.hpp # Board representation
β βββ solver.hpp # Solver interface
β βββ solver_dlx.hpp # DLX implementation
β βββ solver_backtrack.hpp # Backtracking implementation
β βββ ocr_processor.hpp # OCR processing (conditional)
β βββ benchmark.hpp # Benchmarking utilities
β βββ system_info.hpp # Cross-platform system detection
β βββ json_handler.hpp # JSON I/O
β βββ types.hpp # Common types
βββ src/
β βββ main.cpp # Entry point + built-in puzzles
β βββ board.cpp
β βββ solver_dlx.cpp
β βββ solver_backtrack.cpp
β βββ ocr_processor.cpp # Conditionally compiled
β βββ benchmark.cpp
β βββ system_info.cpp # Platform-specific detection
β βββ json_handler.cpp
βββ .github/
β βββ workflows/ # CI/CD pipelines
βββ CMakeLists.txt
βββ CMakePresets.json # Cross-platform build presets
βββ CMakeSettings.json # Visual Studio settings
βββ vcpkg.json # Dependencies
βββ README.md
Managed via vcpkg:
| Package | Purpose | Platforms |
|---|---|---|
| OpenCV | Image processing | All |
| Tesseract | OCR engine | Windows/Linux/macOS |
| nlohmann/json | JSON parsing | All |
| CLI11 | Command line parsing | All |
# List available presets
cmake --list-presets
# Configure and build
cmake --preset <preset-name>
cmake --build --preset <preset-name>cmake -B build -S . \
-DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake \
-DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release| Option | Default | Description |
|---|---|---|
ENABLE_OCR |
ON | Enable Tesseract OCR support |
ENABLE_OPENMP |
OFF | Enable OpenMP parallelization |
MIT License
- AllenK
- Kwyshell
- Donald Knuth for the Dancing Links algorithm
- The Tesseract OCR project
- OpenCV community
- vcpkg maintainers