A comprehensive OCR (Optical Character Recognition) and document understanding library, built in Rust with ONNX Runtime for efficient inference.
- End-to-end OCR pipeline (text detection → text recognition)
- Optional preprocessing: document orientation, text-line orientation, UVDoc rectification
- Document structure analysis (PP-StructureV3-style): layout, regions, tables, formulas, seals
- Typed configs for each task/model (serde-friendly)
- ONNX Runtime execution providers (CPU by default; CUDA/TensorRT/DirectML/CoreML/OpenVINO/WebGPU via features)
- Optional visualization helpers (`visualization` feature)
Add OAROCR to your project's Cargo.toml:

```bash
cargo add oar-ocr
```

Enable ONNX Runtime execution providers via crate features: `cuda`, `tensorrt`, `directml`, `coreml`, `openvino`, `webgpu`. For example, for CUDA support:

```bash
cargo add oar-ocr --features cuda
```

For visualization utilities (used by the examples):

```bash
cargo add oar-ocr --features visualization
```

Or add it manually to your Cargo.toml (keep only one `oar-ocr` entry under `[dependencies]`):

```toml
[dependencies]
oar-ocr = "0.3"

# Alternatively, enable features, e.g. CUDA + visualization:
# oar-ocr = { version = "0.3", features = ["cuda", "visualization"] }

# Other execution providers:
# oar-ocr = { version = "0.3", features = ["tensorrt"] }
# oar-ocr = { version = "0.3", features = ["directml"] }
# oar-ocr = { version = "0.3", features = ["coreml"] }
# oar-ocr = { version = "0.3", features = ["openvino"] }
# oar-ocr = { version = "0.3", features = ["webgpu"] }
```

Here's a simple example of how to use OAROCR to extract text from an image:
```rust
use oar_ocr::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build OCR pipeline with required models
    let ocr = OAROCRBuilder::new(
        "detection_model.onnx",
        "recognition_model.onnx",
        "char_dict.txt",
    )
    .build()?;

    // Process a single image
    let image = load_image("document.jpg")?;
    let results = ocr.predict(vec![image])?;
    let result = &results[0];

    // Print extracted text with confidence scores using the modern TextRegion API
    for text_region in &result.text_regions {
        if let Some((text, confidence)) = text_region.text_with_confidence() {
            println!("Text: {} (confidence: {:.2})", text, confidence);
        }
    }

    // Process multiple images at once
    let images = load_images(&[
        "document1.jpg",
        "document2.jpg",
        "document3.jpg",
    ])?;
    let results = ocr.predict(images)?;

    for result in results {
        println!("Image {}: {} text regions found", result.index, result.text_regions.len());
        for text_region in &result.text_regions {
            if let Some((text, confidence)) = text_region.text_with_confidence() {
                println!("  Text: {} (confidence: {:.2})", text, confidence);
            }
        }
    }

    Ok(())
}
```

This example creates an OCR pipeline using pre-trained models for text detection and recognition. The pipeline processes the input image and returns structured TextRegion objects containing the recognized text, confidence scores, and bounding boxes for each detected text region.
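
In practice you often want to discard low-confidence recognitions before using the text. The sketch below shows that filtering on plain `(text, confidence)` pairs, like the ones `text_with_confidence()` yields in the loop above. The `keep_confident` helper is hypothetical (not part of oar-ocr), and the threshold you pass is an application-specific choice.

```rust
/// Hypothetical helper (not an oar-ocr API): keep only the recognized strings
/// whose confidence is at or above `min_confidence`, preserving their order.
fn keep_confident<'a>(pairs: &[(&'a str, f32)], min_confidence: f32) -> Vec<&'a str> {
    pairs
        .iter()
        // Drop pairs below the threshold.
        .filter(|(_, confidence)| *confidence >= min_confidence)
        // Keep only the text part of each surviving pair.
        .map(|(text, _)| *text)
        .collect()
}
```

Collecting the pairs first and filtering afterwards keeps the threshold decision separate from the OCR call, so you can tune it without re-running inference.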
OAROCR provides two high-level builder APIs for easy pipeline construction:
The OAROCRBuilder provides a fluent API for building OCR pipelines with optional components:
```rust
use oar_ocr::oarocr::OAROCRBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Basic OCR pipeline: detection model, recognition model, character dictionary
    let _basic_ocr = OAROCRBuilder::new(
        "models/det.onnx",
        "models/rec.onnx",
        "models/dict.txt",
    )
    .build()?;

    // OCR with optional preprocessing components and tuned batch sizes
    let _ocr = OAROCRBuilder::new(
        "models/det.onnx",
        "models/rec.onnx",
        "models/dict.txt",
    )
    .with_document_image_orientation_classification("models/doc_orient.onnx")
    .with_text_line_orientation_classification("models/line_orient.onnx")
    .with_document_image_rectification("models/rectify.onnx")
    .image_batch_size(4)
    .region_batch_size(64)
    .build()?;

    Ok(())
}
```

Useful options:

- `.text_type("seal")`: optimized pipeline defaults for curved seal/stamp text
- `.return_word_box(true)`: enable word-level boxes from recognition output
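
`image_batch_size` and `region_batch_size` control how many inputs are sent to a model per call. As a rough mental model only (an assumption, not a description of oar-ocr's internal scheduler), batching splits the inputs into consecutive chunks, with one final partial batch:

```rust
/// Illustrative only: split `n_items` inputs into consecutive batches of at
/// most `batch_size` items and return the size of each batch.
fn batch_sizes(n_items: usize, batch_size: usize) -> Vec<usize> {
    // `chunks(0)` would panic, so reject a zero batch size explicitly.
    assert!(batch_size > 0, "batch size must be positive");
    let items: Vec<usize> = (0..n_items).collect();
    items.chunks(batch_size).map(|chunk| chunk.len()).collect()
}
```

Under this model, a page with 130 detected text lines and `.region_batch_size(64)` would need three recognition calls (64, 64 and 2 regions). Larger batches mean fewer calls but more memory per call.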
The OARStructureBuilder enables document structure analysis with layout detection, table recognition, and formula extraction:
```rust
use oar_ocr::oarocr::OARStructureBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Basic layout detection
    let _layout_only = OARStructureBuilder::new("models/layout.onnx")
        .build()?;

    // Full document structure analysis with table and formula recognition
    let _full = OARStructureBuilder::new("models/layout.onnx")
        .with_table_classification("models/table_cls.onnx")
        .with_table_cell_detection("models/table_cell.onnx", "wired")
        .with_table_structure_recognition("models/table_struct.onnx", "wired")
        .table_structure_dict_path("models/table_structure_dict_ch.txt")
        .with_formula_recognition("models/formula.onnx", "models/tokenizer.json", "pp_formulanet")
        .build()?;

    // Structure analysis with integrated OCR
    let _with_ocr = OARStructureBuilder::new("models/layout.onnx")
        .with_table_classification("models/table_cls.onnx")
        .with_ocr("models/det.onnx", "models/rec.onnx", "models/dict.txt")
        .build()?;

    Ok(())
}
```

Both builders support:

- Configuration: Set task configs via typed structs (serde-friendly)
- Batch/Concurrency: Tune session pools via `image_batch_size`/`region_batch_size`
- ONNX Runtime Settings: Apply a shared `OrtSessionConfig` via `.ort_session(...)`
- Validation: Automatic validation with detailed errors
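
The builders validate their inputs when you call `.build()`. If you would rather report every missing model file up front, before constructing anything, a small check like the one below works. `missing_files` is a hypothetical helper, not an oar-ocr API, and it only checks that each path exists, not that the file is a valid ONNX model or dictionary.

```rust
use std::path::Path;

/// Hypothetical pre-flight check: return every path in `paths` that does not
/// exist on disk, so all missing models can be reported at once.
fn missing_files<'a>(paths: &[&'a str]) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| !Path::new(p).exists())
        .collect()
}
```

For example, passing `["models/det.onnx", "models/rec.onnx", "models/dict.txt"]` returns whichever of those three files are absent.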
This repository includes runnable CLI examples under `examples/` (they require model files). Use `--help` to see all options:

```bash
cargo run --example ocr -- --help
cargo run --example structure -- --help
```

For better performance, you can enable CUDA support to run inference on GPU:
```rust
use oar_ocr::prelude::*;
use oar_ocr::core::config::{OrtSessionConfig, OrtExecutionProvider};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure CUDA execution provider for GPU acceleration
    let ort_config = OrtSessionConfig::new()
        .with_execution_providers(vec![
            OrtExecutionProvider::CUDA {
                device_id: Some(0), // Use GPU 0
                gpu_mem_limit: None,
                arena_extend_strategy: None,
                cudnn_conv_algo_search: None,
                do_copy_in_default_stream: None,
                cudnn_conv_use_max_workspace: None,
            },
            OrtExecutionProvider::CPU, // Fallback to CPU if CUDA fails
        ]);

    // Build OCR pipeline with CUDA support
    let ocr = OAROCRBuilder::new(
        "detection_model.onnx",
        "recognition_model.onnx",
        "char_dict.txt",
    )
    .ort_session(ort_config) // Apply ORT config to all components
    .build()?;

    // Process images (same as CPU example)
    let image = load_image("document.jpg")?;
    let results = ocr.predict(vec![image])?;
    let result = &results[0];

    // Extract text from results
    for text_region in &result.text_regions {
        if let Some((text, confidence)) = text_region.text_with_confidence() {
            println!("Text: {} (confidence: {:.2})", text, confidence);
        }
    }

    Ok(())
}
```

Note: To use CUDA support, you need to:
- Install oar-ocr with the CUDA feature: `cargo add oar-ocr --features cuda`
- Have the CUDA toolkit and cuDNN installed on your system
- Ensure your ONNX models are compatible with CUDA execution
- (Optional) Use other execution providers via the `tensorrt`, `directml`, `coreml`, `openvino`, or `webgpu` features
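
The execution provider list is ordered by preference, which is why `CPU` comes last as the fallback. The toy model below captures that ordering in a simplified form: the first preferred provider that is available wins. The `Provider` enum and `pick_provider` function are hypothetical stand-ins, not oar-ocr's `OrtExecutionProvider`; real ONNX Runtime assigns providers per graph node, which this sketch ignores.

```rust
/// Hypothetical stand-in for an execution provider (not oar-ocr's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Cuda,
    TensorRt,
    Cpu,
}

/// Simplified model of preference order: return the first provider in
/// `preferred` that also appears in `available`, or `None` if none does.
fn pick_provider(preferred: &[Provider], available: &[Provider]) -> Option<Provider> {
    preferred.iter().copied().find(|p| available.contains(p))
}
```

On a machine without a usable GPU, `[Cuda, Cpu]` resolves to `Cpu` in this model, matching the intent of the fallback comment in the example above.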
OAROCR provides several pre-trained models for different OCR tasks. Download them from the GitHub Releases page.
Choose between mobile and server variants based on your needs:
- Mobile: Smaller, faster models suitable for real-time applications
- Server: Larger, more accurate models for high-precision requirements
| Model Type | Version | Category | Model File | Size | Description |
|---|---|---|---|---|---|
| Text Detection | PP-OCRv4 | Mobile | ppocrv4_mobile_det.onnx | 4.8MB | Mobile variant for real-time applications |
| Text Detection | PP-OCRv4 | Server | ppocrv4_server_det.onnx | 113.2MB | Server variant for high-precision requirements |
| Text Detection | PP-OCRv5 | Mobile | ppocrv5_mobile_det.onnx | 4.8MB | Mobile variant for real-time applications |
| Text Detection | PP-OCRv5 | Server | ppocrv5_server_det.onnx | 87.7MB | Server variant for high-precision requirements |

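
If your application switches between variants (for example, mobile on edge devices and server in batch jobs), a small lookup keeps the choice in one place. This is a hypothetical helper built only from the file names in the table above; oar-ocr does not ship it.

```rust
/// Model size/accuracy trade-off described above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Variant {
    Mobile,
    Server,
}

/// Hypothetical lookup from (PP-OCR version, variant) to the detection model
/// file names listed in the table; unknown versions return `None`.
fn detection_model_file(version: u8, variant: Variant) -> Option<&'static str> {
    match (version, variant) {
        (4, Variant::Mobile) => Some("ppocrv4_mobile_det.onnx"),
        (4, Variant::Server) => Some("ppocrv4_server_det.onnx"),
        (5, Variant::Mobile) => Some("ppocrv5_mobile_det.onnx"),
        (5, Variant::Server) => Some("ppocrv5_server_det.onnx"),
        _ => None,
    }
}
```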
Recognition models are available in multiple versions and languages:
| Model Type | Version | Language/Category | Model File | Size | Description |
|---|---|---|---|---|---|
| Text Recognition | PP-OCRv4 | Chinese/General | ppocrv4_mobile_rec.onnx | 10.8MB | Mobile variant |
| Text Recognition | PP-OCRv4 | Chinese/General | ppocrv4_server_rec.onnx | 90.4MB | Server variant |
| Text Recognition | PP-OCRv4 | Chinese/General | ppocrv4_server_rec_doc.onnx | 94.7MB | Server variant for document text |
| Text Recognition | PP-OCRv5 | Chinese/General | ppocrv5_mobile_rec.onnx | 16.5MB | Mobile variant |
| Text Recognition | PP-OCRv5 | Chinese/General | ppocrv5_server_rec.onnx | 84.1MB | Server variant |

| Model Type | Version | Language | Model File | Size | Description |
|---|---|---|---|---|---|
| Text Recognition | PP-OCRv4 | English | en_ppocrv4_mobile_rec.onnx | 7.7MB | Language-specific model |
| Text Recognition | PP-OCRv5 | Eastern Slavic | eslav_ppocrv5_mobile_rec.onnx | 7.9MB | Language-specific model |
| Text Recognition | PP-OCRv5 | Korean | korean_ppocrv5_mobile_rec.onnx | 13.4MB | Language-specific model |
| Text Recognition | PP-OCRv5 | Latin | latin_ppocrv5_mobile_rec.onnx | 7.9MB | Language-specific model |

Character dictionaries are required for text recognition models. Choose the appropriate dictionary for your models:
| File Type | Version | Category | File | Size | Description |
|---|---|---|---|---|---|
| Character Dictionary | PP-OCRv4 | Document | ppocrv4_doc_dict.txt | - | For PP-OCRv4 document models |
| Character Dictionary | PP-OCRv5 | General | ppocrv5_dict.txt | - | For PP-OCRv5 models |
| Character Dictionary | PP-OCR Keys v1 | General | ppocr_keys_v1.txt | - | For older PP-OCR models |

| File Type | Version | Language | File | Size | Description |
|---|---|---|---|---|---|
| Character Dictionary | PP-OCRv4 | English | en_dict.txt | - | For PP-OCRv4 English recognition models |
| Character Dictionary | PP-OCRv5 | Eastern Slavic | ppocrv5_eslav_dict.txt | - | For PP-OCRv5 Eastern Slavic models |
| Character Dictionary | PP-OCRv5 | Korean | ppocrv5_korean_dict.txt | - | For PP-OCRv5 Korean models |
| Character Dictionary | PP-OCRv5 | Latin | ppocrv5_latin_dict.txt | - | For PP-OCRv5 Latin script models |

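
A recognition model only decodes correctly with its matching dictionary, so it can help to encode the pairing once. The lookup below is a hypothetical helper whose entries are inferred from the Description columns of the tables above. The general PP-OCRv4 recognizers are deliberately left unmapped because the tables do not state their dictionary.

```rust
/// Hypothetical lookup pairing a recognition model file with the character
/// dictionary the tables list for it; unlisted models return `None`.
fn dictionary_for(rec_model: &str) -> Option<&'static str> {
    match rec_model {
        "ppocrv4_server_rec_doc.onnx" => Some("ppocrv4_doc_dict.txt"),
        "ppocrv5_mobile_rec.onnx" | "ppocrv5_server_rec.onnx" => Some("ppocrv5_dict.txt"),
        "en_ppocrv4_mobile_rec.onnx" => Some("en_dict.txt"),
        "eslav_ppocrv5_mobile_rec.onnx" => Some("ppocrv5_eslav_dict.txt"),
        "korean_ppocrv5_mobile_rec.onnx" => Some("ppocrv5_korean_dict.txt"),
        "latin_ppocrv5_mobile_rec.onnx" => Some("ppocrv5_latin_dict.txt"),
        _ => None,
    }
}
```

The resulting pair can then be passed as the second and third arguments of `OAROCRBuilder::new`.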
These models provide additional functionality for specialized use cases:
| Model Type | Version | Category | Model File | Size | Description |
|---|---|---|---|---|---|
| Document Orientation | PPLCNet | - | pplcnet_x1_0_doc_ori.onnx | 6.7MB | Detect document rotation |
| Text Line Orientation | PPLCNet | Light | pplcnet_x0_25_textline_ori.onnx | 988KB | Detect text line orientation |
| Text Line Orientation | PPLCNet | Standard | pplcnet_x1_0_textline_ori.onnx | 6.7MB | Detect text line orientation |
| Document Rectification | UVDoc | - | uvdoc.onnx | 31.6MB | Fix perspective distortion |

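
Once an orientation classifier has predicted a rotation, the image has to be turned back before detection runs. The sketch below shows the underlying operation on a row-major grayscale buffer: a 90° clockwise rotation, applied a given number of times. It assumes rotations come in quarter turns and is an illustration, not oar-ocr's preprocessing code; which direction corresponds to a given classifier label is an assumption to check against the model.

```rust
/// Rotate a row-major `width x height` buffer 90 degrees clockwise.
/// Returns the rotated buffer and its new (width, height), which are swapped.
fn rotate_cw(pixels: &[u8], width: usize, height: usize) -> (Vec<u8>, usize, usize) {
    assert_eq!(pixels.len(), width * height, "buffer size must match dimensions");
    let (new_w, new_h) = (height, width);
    let mut out = vec![0u8; pixels.len()];
    for y in 0..height {
        for x in 0..width {
            // Pixel (x, y) moves to column (height - 1 - y) of row x.
            let nx = height - 1 - y;
            let ny = x;
            out[ny * new_w + nx] = pixels[y * width + x];
        }
    }
    (out, new_w, new_h)
}

/// Apply `quarter_turns` clockwise rotations (e.g. 2 for 180 degrees).
fn rotate_quarter_turns(
    pixels: &[u8],
    width: usize,
    height: usize,
    quarter_turns: u32,
) -> (Vec<u8>, usize, usize) {
    let mut state = (pixels.to_vec(), width, height);
    // Four quarter turns are the identity, so reduce modulo 4.
    for _ in 0..(quarter_turns % 4) {
        state = rotate_cw(&state.0, state.1, state.2);
    }
    state
}
```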
These models are typically used with OARStructureBuilder (layout, tables, formulas, seals). File names below match the presets used by the builders and examples; download them from the Releases page as needed.
| Component | Suggested Model File(s) | Notes |
|---|---|---|
| Layout Detection | pp-doclayout_plus-l.onnx | PP-DocLayout_plus-L (default preset) |
| Region Detection | pp-docblocklayout.onnx | PP-DocBlockLayout, for hierarchical ordering |
| Table Classification | pp-lcnet_x1_0_table_cls.onnx | Wired vs wireless table type |
| Table Cell Detection | rt-detr-l_wired_table_cell_det.onnx | Wired tables (RT-DETR) |
| Table Cell Detection | rt-detr-l_wireless_table_cell_det.onnx | Wireless tables (RT-DETR) |
| Table Structure Recognition | slanext_wired.onnx, slanet_plus.onnx | Wired / wireless structure recognition |
| Table Structure Dictionary | table_structure_dict_ch.txt | Required when enabling table structure recognition |
| Formula Recognition | pp-formulanet_plus-l.onnx, unimernet.onnx | `with_formula_recognition(..., tokenizer.json, model_type)` |
| Formula Tokenizer | unimernet_tokenizer.json | Must match the selected formula model |
| Seal Text Detection | pp-ocrv4_server_seal_det.onnx | Seal/stamp text detection |

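
Because table classification distinguishes wired from wireless tables, the cell detection and structure recognition models are chosen per type. The hypothetical mapping below restates that pairing from the table above (reading the "Wired / wireless" note as `slanext_wired.onnx` for wired and `slanet_plus.onnx` for wireless); it is not an oar-ocr API.

```rust
/// Table type as reported by a wired/wireless table classifier (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TableType {
    Wired,
    Wireless,
}

/// Hypothetical mapping from table type to the suggested
/// (cell detection, structure recognition) model files.
fn table_models(kind: TableType) -> (&'static str, &'static str) {
    match kind {
        TableType::Wired => ("rt-detr-l_wired_table_cell_det.onnx", "slanext_wired.onnx"),
        TableType::Wireless => ("rt-detr-l_wireless_table_cell_det.onnx", "slanet_plus.onnx"),
    }
}
```

This mirrors the `"wired"` argument passed to `with_table_cell_detection` and `with_table_structure_recognition` in the structure example.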
This project builds upon the excellent work of several open-source projects:
- ort: Rust bindings for ONNX Runtime by pykeio. This crate provides the Rust interface to ONNX Runtime that powers the efficient inference engine in this OCR library.
- PaddleOCR: Baidu's awesome multilingual OCR toolkits based on PaddlePaddle. This project utilizes PaddleOCR's pre-trained models, which provide excellent accuracy and performance for text detection and recognition across multiple languages.