OAR (ONNXRuntime And Rust) OCR


A comprehensive OCR (Optical Character Recognition) and document understanding library, built in Rust with ONNX Runtime for efficient inference.

Features

  • End-to-end OCR pipeline (text detection → text recognition)
  • Optional preprocessing: document orientation, text-line orientation, UVDoc rectification
  • Document structure analysis (PP-StructureV3-style): layout, regions, tables, formulas, seals
  • Typed configs for each task/model (serde-friendly)
  • ONNX Runtime execution providers (CPU by default; CUDA/TensorRT/DirectML/CoreML/OpenVINO/WebGPU via features)
  • Optional visualization helpers (via the visualization feature)

Quick Start

Installation

Add OAROCR to your project's Cargo.toml:

cargo add oar-ocr

Enable ONNX Runtime execution providers via crate features:

  • cuda, tensorrt, directml, coreml, openvino, webgpu

For example, for CUDA support:

cargo add oar-ocr --features cuda

For visualization utilities (used by examples):

cargo add oar-ocr --features visualization

Or manually add it to your Cargo.toml:

[dependencies]
oar-ocr = "0.3"

# Example: CUDA + visualization (use instead of the plain entry above)
# oar-ocr = { version = "0.3", features = ["cuda", "visualization"] }

# Other execution providers:
# oar-ocr = { version = "0.3", features = ["tensorrt"] }
# oar-ocr = { version = "0.3", features = ["directml"] }
# oar-ocr = { version = "0.3", features = ["coreml"] }
# oar-ocr = { version = "0.3", features = ["openvino"] }
# oar-ocr = { version = "0.3", features = ["webgpu"] }

Basic Usage

Here's a simple example of how to use OAROCR to extract text from an image:

use oar_ocr::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build OCR pipeline with required models
    let ocr = OAROCRBuilder::new(
        "detection_model.onnx",
        "recognition_model.onnx",
        "char_dict.txt",
    )
    .build()?;

    // Process a single image
    let image = load_image("document.jpg")?;
    let results = ocr.predict(vec![image])?;
    let result = &results[0];

    // Print extracted text with confidence scores using the modern TextRegion API
    for text_region in &result.text_regions {
        if let Some((text, confidence)) = text_region.text_with_confidence() {
            println!("Text: {} (confidence: {:.2})", text, confidence);
        }
    }

    // Process multiple images at once
    let images = load_images(&[
        "document1.jpg",
        "document2.jpg",
        "document3.jpg",
    ])?;
    let results = ocr.predict(images)?;

    for result in results {
        println!("Image {}: {} text regions found", result.index, result.text_regions.len());
        for text_region in &result.text_regions {
            if let Some((text, confidence)) = text_region.text_with_confidence() {
                println!("  Text: {} (confidence: {:.2})", text, confidence);
            }
        }
    }

    Ok(())
}

This example creates an OCR pipeline using pre-trained models for text detection and recognition. The pipeline processes the input image and returns structured TextRegion objects containing the recognized text, confidence scores, and bounding boxes for each detected text region.

High-Level Builder APIs

OAROCR provides two high-level builder APIs for easy pipeline construction:

OAROCRBuilder - Text Recognition Pipeline

The OAROCRBuilder provides a fluent API for building OCR pipelines with optional components:

use oar_ocr::oarocr::OAROCRBuilder;

// Basic OCR pipeline
let ocr = OAROCRBuilder::new(
    "models/det.onnx",
    "models/rec.onnx",
    "models/dict.txt"
)
.build()?;

// OCR with optional components
let ocr = OAROCRBuilder::new(
    "models/det.onnx",
    "models/rec.onnx",
    "models/dict.txt"
)
.with_document_image_orientation_classification("models/doc_orient.onnx")
.with_text_line_orientation_classification("models/line_orient.onnx")
.with_document_image_rectification("models/rectify.onnx")
.image_batch_size(4)
.region_batch_size(64)
.build()?;

Useful options:

  • .text_type("seal") - optimized pipeline defaults for curved seal/stamp text
  • .return_word_box(true) - enable word-level boxes from recognition output
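
A minimal sketch combining these two options on the same fluent builder shown above (the model paths are placeholders):

use oar_ocr::oarocr::OAROCRBuilder;

// Sketch: seal-text pipeline defaults plus word-level boxes.
// Only builder methods already shown above are used; paths are placeholders.
let seal_ocr = OAROCRBuilder::new(
    "models/seal_det.onnx",
    "models/rec.onnx",
    "models/dict.txt",
)
.text_type("seal")         // pipeline defaults tuned for curved seal/stamp text
.return_word_box(true)     // request word-level boxes from recognition output
.build()?;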

OARStructureBuilder - Document Structure Analysis

The OARStructureBuilder enables document structure analysis with layout detection, table recognition, and formula extraction:

use oar_ocr::oarocr::OARStructureBuilder;

// Basic layout detection
let structure = OARStructureBuilder::new("models/layout.onnx")
    .build()?;

// Full document structure analysis with table and formula recognition
let structure = OARStructureBuilder::new("models/layout.onnx")
    .with_table_classification("models/table_cls.onnx")
    .with_table_cell_detection("models/table_cell.onnx", "wired")
    .with_table_structure_recognition("models/table_struct.onnx", "wired")
    .table_structure_dict_path("models/table_structure_dict_ch.txt")
    .with_formula_recognition("models/formula.onnx", "models/tokenizer.json", "pp_formulanet")
    .build()?;

// Structure analysis with integrated OCR
let structure = OARStructureBuilder::new("models/layout.onnx")
    .with_table_classification("models/table_cls.onnx")
    .with_ocr("models/det.onnx", "models/rec.onnx", "models/dict.txt")
    .build()?;

Both builders support:

  • Configuration: Set task configs via typed structs (serde-friendly)
  • Batch/Concurrency: Tune session pools via image_batch_size / region_batch_size
  • ONNX Runtime Settings: Apply a shared OrtSessionConfig via .ort_session(...)
  • Validation: Automatic validation with detailed errors
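
A combined sketch of these settings on the OCR builder, using only methods already shown above (paths are placeholders; per the list above, the structure builder accepts the same batch and ORT settings):

use oar_ocr::oarocr::OAROCRBuilder;
use oar_ocr::core::config::{OrtSessionConfig, OrtExecutionProvider};

// Sketch: one shared ORT config plus batch tuning.
let ort_config = OrtSessionConfig::new()
    .with_execution_providers(vec![OrtExecutionProvider::CPU]);

let ocr = OAROCRBuilder::new("models/det.onnx", "models/rec.onnx", "models/dict.txt")
    .image_batch_size(4)      // batch size for whole images
    .region_batch_size(64)    // batch size for detected text regions
    .ort_session(ort_config)  // shared OrtSessionConfig applied to all components
    .build()?;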

Examples

This repository includes runnable CLI examples under examples/ (they require model files). Use --help to see all options:

cargo run --example ocr -- --help
cargo run --example structure -- --help

Using CUDA for GPU Acceleration

For better performance, you can enable CUDA support to run inference on GPU:

use oar_ocr::prelude::*;
use oar_ocr::core::config::{OrtSessionConfig, OrtExecutionProvider};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure CUDA execution provider for GPU acceleration
    let ort_config = OrtSessionConfig::new()
        .with_execution_providers(vec![
            OrtExecutionProvider::CUDA {
                device_id: Some(0),  // Use GPU 0
                gpu_mem_limit: None,
                arena_extend_strategy: None,
                cudnn_conv_algo_search: None,
                do_copy_in_default_stream: None,
                cudnn_conv_use_max_workspace: None,
            },
            OrtExecutionProvider::CPU,  // Fallback to CPU if CUDA fails
        ]);

    // Build OCR pipeline with CUDA support
    let ocr = OAROCRBuilder::new(
        "detection_model.onnx",
        "recognition_model.onnx",
        "char_dict.txt",
    )
    .ort_session(ort_config)  // Apply ORT config to all components
    .build()?;

    // Process images (same as CPU example)
    let image = load_image("document.jpg")?;
    let results = ocr.predict(vec![image])?;
    let result = &results[0];

    // Extract text from results
    for text_region in &result.text_regions {
        if let Some((text, confidence)) = text_region.text_with_confidence() {
            println!("Text: {} (confidence: {:.2})", text, confidence);
        }
    }

    Ok(())
}

Note: To use CUDA acceleration, you need to:

  1. Install oar-ocr with the cuda feature: cargo add oar-ocr --features cuda
  2. Have the CUDA toolkit and cuDNN installed on your system
  3. Ensure your ONNX models are compatible with CUDA execution
  4. (Optional) Use other execution providers via the tensorrt, directml, coreml, openvino, or webgpu features

Pre-trained Models

OAROCR provides several pre-trained models for different OCR tasks. Download them from the GitHub Releases page.

Text Detection Models

Choose between mobile and server variants based on your needs:

  • Mobile: Smaller, faster models suitable for real-time applications
  • Server: Larger, more accurate models for high-precision requirements

Model Type | Version | Category | Model File | Size | Description
Text Detection | PP-OCRv4 | Mobile | ppocrv4_mobile_det.onnx | 4.8MB | Mobile variant for real-time applications
Text Detection | PP-OCRv4 | Server | ppocrv4_server_det.onnx | 113.2MB | Server variant for high-precision requirements
Text Detection | PP-OCRv5 | Mobile | ppocrv5_mobile_det.onnx | 4.8MB | Mobile variant for real-time applications
Text Detection | PP-OCRv5 | Server | ppocrv5_server_det.onnx | 87.7MB | Server variant for high-precision requirements

Text Recognition Models

Recognition models are available in multiple versions and languages:

Chinese/General Models

Model Type | Version | Language/Category | Model File | Size | Description
Text Recognition | PP-OCRv4 | Chinese/General | ppocrv4_mobile_rec.onnx | 10.8MB | Mobile variant
Text Recognition | PP-OCRv4 | Chinese/General | ppocrv4_server_rec.onnx | 90.4MB | Server variant
Text Recognition | PP-OCRv4 | Chinese/General | ppocrv4_server_rec_doc.onnx | 94.7MB | Server variant for document text
Text Recognition | PP-OCRv5 | Chinese/General | ppocrv5_mobile_rec.onnx | 16.5MB | Mobile variant
Text Recognition | PP-OCRv5 | Chinese/General | ppocrv5_server_rec.onnx | 84.1MB | Server variant

Language-Specific Models

Model Type | Version | Language | Model File | Size | Description
Text Recognition | PP-OCRv4 | English | en_ppocrv4_mobile_rec.onnx | 7.7MB | Language-specific model
Text Recognition | PP-OCRv5 | Eastern Slavic | eslav_ppocrv5_mobile_rec.onnx | 7.9MB | Language-specific model
Text Recognition | PP-OCRv5 | Korean | korean_ppocrv5_mobile_rec.onnx | 13.4MB | Language-specific model
Text Recognition | PP-OCRv5 | Latin | latin_ppocrv5_mobile_rec.onnx | 7.9MB | Language-specific model

Character Dictionaries

Character dictionaries are required for text recognition models. Choose the appropriate dictionary for your models:

General Dictionaries

File Type | Version | Category | File | Description
Character Dictionary | PP-OCRv4 | Document | ppocrv4_doc_dict.txt | For PP-OCRv4 document models
Character Dictionary | PP-OCRv5 | General | ppocrv5_dict.txt | For PP-OCRv5 models
Character Dictionary | PP-OCR Keys v1 | General | ppocr_keys_v1.txt | For older PP-OCR models

Language-Specific Dictionaries

File Type | Version | Language | File | Description
Character Dictionary | PP-OCRv4 | English | en_dict.txt | For PP-OCRv4 English recognition models
Character Dictionary | PP-OCRv5 | Eastern Slavic | ppocrv5_eslav_dict.txt | For PP-OCRv5 Eastern Slavic models
Character Dictionary | PP-OCRv5 | Korean | ppocrv5_korean_dict.txt | For PP-OCRv5 Korean models
Character Dictionary | PP-OCRv5 | Latin | ppocrv5_latin_dict.txt | For PP-OCRv5 Latin script models
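
A language-specific recognition model must be paired with its matching dictionary when building the pipeline. Here is a minimal sketch using the English files listed above; the models/ paths are placeholders for wherever you downloaded the files:

use oar_ocr::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Sketch: PP-OCRv4 English recognition paired with its dictionary.
    // File names come from the tables above; local paths are placeholders.
    let ocr = OAROCRBuilder::new(
        "models/ppocrv4_mobile_det.onnx",
        "models/en_ppocrv4_mobile_rec.onnx",
        "models/en_dict.txt",
    )
    .build()?;

    let image = load_image("document.jpg")?;
    let _results = ocr.predict(vec![image])?;
    Ok(())
}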

Optional Models

These models provide additional functionality for specialized use cases:

Model Type | Version | Category | Model File | Size | Description
Document Orientation | PPLCNet | - | pplcnet_x1_0_doc_ori.onnx | 6.7MB | Detect document rotation
Text Line Orientation | PPLCNet | Light | pplcnet_x0_25_textline_ori.onnx | 988KB | Detect text line orientation
Text Line Orientation | PPLCNet | Standard | pplcnet_x1_0_textline_ori.onnx | 6.7MB | Detect text line orientation
Document Rectification | UVDoc | - | uvdoc.onnx | 31.6MB | Fix perspective distortion
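
These optional models plug into the builder methods shown earlier. A sketch using the file names above (placeholder local paths; enable only the components your documents need):

use oar_ocr::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Sketch: wiring the optional preprocessing models from the table above.
    let ocr = OAROCRBuilder::new(
        "models/ppocrv5_mobile_det.onnx",
        "models/ppocrv5_mobile_rec.onnx",
        "models/ppocrv5_dict.txt",
    )
    .with_document_image_orientation_classification("models/pplcnet_x1_0_doc_ori.onnx")
    .with_text_line_orientation_classification("models/pplcnet_x0_25_textline_ori.onnx")
    .with_document_image_rectification("models/uvdoc.onnx")
    .build()?;
    Ok(())
}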

Document Structure Models

These models are typically used with OARStructureBuilder (layout, tables, formulas, seals). File names below match the presets used by the builders and examples; download them from the Releases page as needed.

Component | Suggested Model File(s) | Notes
Layout Detection | pp-doclayout_plus-l.onnx | PP-DocLayout_plus-L (default preset)
Region Detection | pp-docblocklayout.onnx | PP-DocBlockLayout, for hierarchical ordering
Table Classification | pp-lcnet_x1_0_table_cls.onnx | Wired vs wireless table type
Table Cell Detection | rt-detr-l_wired_table_cell_det.onnx | Wired tables (RT-DETR)
Table Cell Detection | rt-detr-l_wireless_table_cell_det.onnx | Wireless tables (RT-DETR)
Table Structure Recognition | slanext_wired.onnx, slanet_plus.onnx | Wired / wireless structure recognition
Table Structure Dictionary | table_structure_dict_ch.txt | Required when enabling table structure recognition
Formula Recognition | pp-formulanet_plus-l.onnx, unimernet.onnx | with_formula_recognition(..., tokenizer.json, model_type)
Formula Tokenizer | unimernet_tokenizer.json | Must match the selected formula model
Seal Text Detection | pp-ocrv4_server_seal_det.onnx | Seal/stamp text detection
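
As a sketch, these file names slot directly into the OARStructureBuilder calls shown earlier; the models/ paths are placeholders, and which components you enable depends on your documents:

use oar_ocr::oarocr::OARStructureBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Sketch: wired-table structure analysis with integrated OCR,
    // using the preset file names from the table above (paths are placeholders).
    let structure = OARStructureBuilder::new("models/pp-doclayout_plus-l.onnx")
        .with_table_classification("models/pp-lcnet_x1_0_table_cls.onnx")
        .with_table_cell_detection("models/rt-detr-l_wired_table_cell_det.onnx", "wired")
        .with_table_structure_recognition("models/slanext_wired.onnx", "wired")
        .table_structure_dict_path("models/table_structure_dict_ch.txt")
        .with_ocr("models/ppocrv5_mobile_det.onnx", "models/ppocrv5_mobile_rec.onnx", "models/ppocrv5_dict.txt")
        .build()?;
    Ok(())
}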

Acknowledgments

This project builds upon the excellent work of several open-source projects:

  • ort: Rust bindings for ONNX Runtime by pykeio. This crate provides the Rust interface to ONNX Runtime that powers the efficient inference engine in this OCR library.

  • PaddleOCR: Baidu's multilingual OCR toolkit built on PaddlePaddle. This project uses PaddleOCR's pre-trained models, which provide excellent accuracy and performance for text detection and recognition across multiple languages.
