Tags · GreatV/oar-ocr

v0.6.3

bump(package): update version to 0.6.3 (#110)

Apr 15, 2026
b04ab0d

v0.6.2

fix(structure): fix table cell matching, batch formula inference, and improve markdown output (#100)

* fix(structure): fix table cell matching, batch formula inference, and improve markdown output

- Fix IoA space mismatch in wired table stitching by always using structure cell bboxes; add cross-row OCR deduplication for large cells
- Add predict_images() for cross-page formula batching into a single ONNX inference call, reducing overhead for multi-page documents
- Improve markdown: downgrade ABSTRACT/REFERENCES to h2, require text on both sides for inline formulas, add bullet list formatting, fix paragraph continuation across figures/tables
- Speed up formula preprocessing with bilinear resize (~4x faster)
- Remove premature dedup_by in cluster_positions to match PaddleX
- Use <br/> instead of space for multi-line OCR content in table cells
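The IoA (intersection over area) matching mentioned in the first item can be illustrated with a minimal sketch. `BBox`, `ioa`, `best_cell`, and the threshold value are hypothetical names chosen for this example, not the crate's actual API; the only assumption taken from the notes is that OCR boxes are scored against structure cell boxes in the same coordinate space.

```rust
/// Axis-aligned box (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct BBox {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
}

impl BBox {
    /// Area of the box; degenerate boxes have area 0.
    pub fn area(&self) -> f32 {
        (self.x2 - self.x1).max(0.0) * (self.y2 - self.y1).max(0.0)
    }
}

/// Intersection area divided by the area of `inner` (IoA).
pub fn ioa(inner: &BBox, outer: &BBox) -> f32 {
    let w = (inner.x2.min(outer.x2) - inner.x1.max(outer.x1)).max(0.0);
    let h = (inner.y2.min(outer.y2) - inner.y1.max(outer.y1)).max(0.0);
    let area = inner.area();
    if area <= 0.0 { 0.0 } else { w * h / area }
}

/// Index of the cell with the highest IoA for `ocr`, if it reaches `min_ioa`.
pub fn best_cell(ocr: &BBox, cells: &[BBox], min_ioa: f32) -> Option<usize> {
    cells
        .iter()
        .enumerate()
        .map(|(i, cell)| (i, ioa(ocr, cell)))
        .filter(|(_, score)| *score >= min_ioa)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Because IoA normalizes by the OCR box rather than the union, a small text box that sits fully inside a large cell still scores 1.0, which is why both boxes need to be expressed in the same coordinate space before comparison.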

* fix(structure): per-page error handling, batch chunking, and markdown fixes

Mar 8, 2026
66ee9c0

v0.6.1

feat(vl): add PaddleOCR-VL-1.5 support with text spotting and seal recognition (#88)

* feat(vl): add PaddleOCR-VL-1.5 support with text spotting and seal recognition

- Add support for PaddleOCR-VL-1.5 model with new tasks: Spotting and Seal
- Update documentation to mention PaddleOCR-VL-1.5 support
- Change huggingface-cli to hf in download commands
- Fix clippy warnings: collapsible if statements, type complexity,
  abs_diff, repeat_n, and needless range loops
- Improve layout detection adapter with PaddleX merge modes

* refactor(layout): replace tuple type alias with named struct and fix row sort order

* fix: address code review feedback for layout and VL modules

* fix: use config score_threshold as fallback for missing class thresholds
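The fallback described in the last item can be pictured as a per-class lookup that defaults to the global threshold. This is only a sketch: `LayoutThresholds`, `class_thresholds`, `threshold_for`, and `keep` are hypothetical names, and only `score_threshold` comes from the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical config holding a global threshold and optional per-class overrides.
pub struct LayoutThresholds {
    pub score_threshold: f32,
    pub class_thresholds: HashMap<String, f32>,
}

impl LayoutThresholds {
    /// Per-class threshold if present, otherwise the global `score_threshold`.
    pub fn threshold_for(&self, class: &str) -> f32 {
        self.class_thresholds
            .get(class)
            .copied()
            .unwrap_or(self.score_threshold)
    }

    /// Whether a detection of `class` with `score` should be kept.
    pub fn keep(&self, class: &str, score: f32) -> bool {
        score >= self.threshold_for(class)
    }
}
```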

Feb 3, 2026
1834e11

v0.6.0

feat: implement LightOnOCR model (#81)

* Implement LightOnOCR image processing, text model, and vision model

- Added `processing.rs` for image preprocessing logic, including resizing and normalization.
- Introduced `text.rs` for the LightOnOCR text model, implementing rotary embeddings, attention layers, and MLPs.
- Created `vision.rs` for the Pixtral Vision Model, including patch convolution and transformer layers with rotary embeddings.
- Enhanced utility functions in `utils.rs` to handle character-based substring matching for improved performance with non-ASCII characters.

* feat: Enhance LightOnOCR configuration and processing with error handling and optimizations

* fix: Improve numerical stability in dtype handling for LightOnOCR models

* fix: Improve numerical stability in dtype handling for F16 and BF16

Jan 25, 2026
f5fc326

v0.5.2

refactor: Simplify path extraction in unclip function

Jan 8, 2026
47c7d82

v0.5.1

chore: Add workspace configuration to Cargo.toml (#72)

* chore: Add workspace configuration to Cargo.toml

* refactor: Simplify conditional parsing for builder config attribute

* refactor: Update Cargo.toml to use workspace attributes for package metadata

Jan 5, 2026
7c2da1e

v0.5.0

Add unirec-0.1b model (#60)

* Add utility functions for VL models and document parsing

- Implement error conversion functions for Candle to OCRError.
- Add tensor manipulation functions including rotation and concatenation.
- Create Markdown conversion functions for layout elements.
- Implement OTSL to HTML table conversion with support for various tags.
- Add image processing helpers for cropping margins from images.
- Introduce functions for calculating bounding box areas and overlap ratios.
- Implement detection and filtering of overlapping boxes in layout detection results.
- Add tests for utility functions to ensure correctness.
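To make the OTSL to HTML item more concrete, here is a simplified sketch of such a conversion. It assumes the usual OTSL token meanings (a content or empty cell, a left-looking cell for column spans, an up-looking cell for row spans, a cross cell, and a row terminator); the `Otsl` enum and `otsl_to_html` are hypothetical and do not mirror the crate's implementation, which per the notes supports additional tags.

```rust
/// Simplified OTSL tokens (hypothetical enum for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Otsl {
    Cell(String), // content cell, or an empty cell when the text is empty
    Left,         // extends the cell to its left (colspan)
    Up,           // extends the cell above (rowspan)
    Cross,        // interior of a cell spanning both rows and columns
    NewLine,      // end of a table row
}

/// Minimal HTML escaping for cell text.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Converts a token sequence into an HTML table with colspan/rowspan attributes.
pub fn otsl_to_html(tokens: &[Otsl]) -> String {
    // Split the flat sequence into rows at each NewLine token.
    let rows: Vec<&[Otsl]> = tokens
        .split(|t| *t == Otsl::NewLine)
        .filter(|row| !row.is_empty())
        .collect();

    let mut html = String::from("<table>");
    for (r, row) in rows.iter().enumerate() {
        html.push_str("<tr>");
        for (c, tok) in row.iter().enumerate() {
            // Only real cells emit <td>; Left/Up/Cross are covered by spans.
            if let Otsl::Cell(text) = tok {
                let colspan = 1 + row[c + 1..]
                    .iter()
                    .take_while(|t| matches!(t, Otsl::Left))
                    .count();
                let rowspan = 1 + rows[r + 1..]
                    .iter()
                    .take_while(|below| matches!(below.get(c), Some(Otsl::Up)))
                    .count();
                let mut attrs = String::new();
                if colspan > 1 {
                    attrs.push_str(&format!(" colspan=\"{colspan}\""));
                }
                if rowspan > 1 {
                    attrs.push_str(&format!(" rowspan=\"{rowspan}\""));
                }
                html.push_str(&format!("<td{attrs}>{}</td>", escape(text)));
            }
        }
        html.push_str("</tr>");
    }
    html.push_str("</table>");
    html
}
```

The sketch derives spans by scanning right and down from each real cell, so it never needs to backtrack when it later encounters the `Left`, `Up`, or `Cross` placeholders.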

* refactor: Update usage documentation and examples for improved clarity and consistency

* fix: Remove unnecessary reference for ignore_labels in markdown conversion

* refactor: Optimize sinusoidal embedding generation for GPU efficiency

Jan 1, 2026
b09c1c4

v0.4.0

Add PaddleOCR-VL document parsing and image processing modules (#52)

* Add PaddleOCR-VL document parsing and image processing modules

- Implemented `PaddleOcrVlDocParser` for layout-first document parsing using PP-DocLayoutV2 and PaddleOCR-VL.
- Introduced configuration struct `PaddleOcrVlDocParserConfig` to customize parsing behavior.
- Added functions for bounding box manipulation, layout element sorting, and order assignment.
- Created `PaddleOcrVlImageInputs` and `preprocess_images` function for image preprocessing, including smart resizing and normalization.
- Implemented table output post-processing with HTML conversion and token parsing.
- Added unit tests for smart resizing and image preprocessing outputs.
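The notes do not spell out what "smart resizing" does. A common approach in vision-language preprocessing, assumed here purely for illustration, is to snap both sides to a multiple of the patch factor while keeping the total pixel count within a budget. The function name, parameters, and rounding behavior below are assumptions, not the crate's code.

```rust
/// Picks a target (height, width) where both sides are multiples of `factor`
/// and the pixel count is pushed toward the `[min_pixels, max_pixels]` range,
/// roughly preserving the aspect ratio. Hypothetical sketch, not the crate's routine.
pub fn smart_resize(
    height: u32,
    width: u32,
    factor: u32,
    min_pixels: u64,
    max_pixels: u64,
) -> (u32, u32) {
    let f = factor as f64;
    let (h, w) = (height as f64, width as f64);

    // Start from the nearest multiple of `factor`, never below one patch.
    let mut h_bar = ((h / f).round() * f).max(f);
    let mut w_bar = ((w / f).round() * f).max(f);

    if h_bar * w_bar > max_pixels as f64 {
        // Too large: shrink both sides by a common ratio, rounding down.
        let beta = (h * w / max_pixels as f64).sqrt();
        h_bar = ((h / beta / f).floor() * f).max(f);
        w_bar = ((w / beta / f).floor() * f).max(f);
    } else if h_bar * w_bar < min_pixels as f64 {
        // Too small: grow both sides by a common ratio, rounding up.
        let beta = (min_pixels as f64 / (h * w)).sqrt();
        h_bar = (h * beta / f).ceil() * f;
        w_bar = (w * beta / f).ceil() * f;
    }
    (h_bar as u32, w_bar as u32)
}
```

Rounding down when shrinking and up when growing keeps the result on the correct side of each pixel bound, at the cost of a small aspect-ratio drift of at most one patch per side.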

* bump version to 0.3.2

* Enhance layout detection and processing for PaddleOCR-VL

- Update output handling to support reading order for PP-DocLayoutV2.
- Introduce `is_reading_order_sorted` flag in layout detection output.
- Refactor document parsing to utilize reading order when available.
- Add new configuration parameters for image processing.
- Improve position embedding interpolation in vision model.

* Refactor layout post-processing types and update Clippy linter configuration

* Refactor error handling and improve layout post-processing in PaddleOCR-VL

* Bump version to 0.4.0 and update HTML entity comment in image processing

Dec 26, 2025
31bbf53

v0.3.1

chore: Update version to 0.3.1 and refine tokenizers dependency configuration (#51)

Dec 25, 2025
1d4d197

v0.3.0

Add documentation for pre-trained models and usage guide (#48)

* Add documentation for pre-trained models and usage guide

- Created models.md to document available pre-trained models for OCR and document understanding, including details on text detection, recognition, character dictionaries, preprocessing, and document structure models.
- Added usage.md to provide a comprehensive guide on using OAROCR, covering basic OCR pipeline, batch processing, builder APIs for text recognition and document structure analysis, GPU acceleration, and configuration options.

* refactor: Update model file paths in usage documentation for consistency

Dec 24, 2025
7a0ea9c
