Tags: GreatV/oar-ocr

v0.6.3

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
bump(package): update version to 0.6.3 (#110)

v0.6.2

Verified

fix(structure): fix table cell matching, batch formula inference, and improve markdown output (#100)

* fix(structure): fix table cell matching, batch formula inference, and improve markdown output

- Fix IoA space mismatch in wired table stitching by always using structure cell bboxes; add cross-row OCR deduplication for large cells
- Add predict_images() for cross-page formula batching into a single ONNX inference call, reducing overhead for multi-page documents
- Improve markdown: downgrade ABSTRACT/REFERENCES to h2, require text on both sides for inline formulas, add bullet list formatting, fix paragraph continuation across figures/tables
- Speed up formula preprocessing with bilinear resize (~4x faster)
- Remove premature dedup_by in cluster_positions to match PaddleX
- Use <br/> instead of space for multi-line OCR content in table cells

* fix(structure): per-page error handling, batch chunking, and markdown fixes
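
The first commit above mentions IoA-based cell matching and joining multi-line cell content with `<br/>`. The following is a minimal sketch of that idea, not the crate's actual code: `BBox`, `ioa`, `fill_cells`, and the `min_ioa` parameter are hypothetical names introduced for illustration. It assumes IoA means the intersection area divided by the area of the OCR box, and that every OCR line is always measured against the structure cell boxes.

```rust
/// Axis-aligned box. Hypothetical type for illustration, not the crate's own.
#[derive(Clone, Copy, Debug)]
pub struct BBox {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
}

impl BBox {
    pub fn area(&self) -> f32 {
        (self.x2 - self.x1).max(0.0) * (self.y2 - self.y1).max(0.0)
    }
}

/// Intersection over the area of `inner`: the fraction of `inner` covered by `outer`.
pub fn ioa(inner: &BBox, outer: &BBox) -> f32 {
    let w = (inner.x2.min(outer.x2) - inner.x1.max(outer.x1)).max(0.0);
    let h = (inner.y2.min(outer.y2) - inner.y1.max(outer.y1)).max(0.0);
    let a = inner.area();
    if a <= 0.0 { 0.0 } else { w * h / a }
}

/// Assign each OCR line to the structure cell that covers most of it,
/// then join the lines collected in each cell with `<br/>`.
pub fn fill_cells(cells: &[BBox], lines: &[(BBox, &str)], min_ioa: f32) -> Vec<String> {
    let mut parts: Vec<Vec<&str>> = vec![Vec::new(); cells.len()];
    for &(bbox, text) in lines {
        let best = cells
            .iter()
            .enumerate()
            .map(|(i, cell)| (i, ioa(&bbox, cell)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        if let Some((i, score)) = best {
            // Lines that no cell covers well enough are dropped in this sketch.
            if score >= min_ioa {
                parts[i].push(text);
            }
        }
    }
    parts.into_iter().map(|p| p.join("<br/>")).collect()
}
```

Because the score is normalized by the OCR box rather than by the union, a small text line fully inside a large cell still scores 1.0, which is why IoA tends to suit cell assignment better than IoU.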

v0.6.1

Verified

feat(vl): add PaddleOCR-VL-1.5 support with text spotting and seal recognition (#88)

* feat(vl): add PaddleOCR-VL-1.5 support with text spotting and seal recognition

- Add support for PaddleOCR-VL-1.5 model with new tasks: Spotting and Seal
- Update documentation to mention PaddleOCR-VL-1.5 support
- Change huggingface-cli to hf in download commands
- Fix clippy warnings: collapsible if statements, type complexity,
  abs_diff, repeat_n, and needless range loops
- Improve layout detection adapter with PaddleX merge modes

* refactor(layout): replace tuple type alias with named struct and fix row sort order

* fix: address code review feedback for layout and VL modules

* fix: use config score_threshold as fallback for missing class thresholds
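
The last fix above, falling back to the config's `score_threshold` when a class has no threshold of its own, can be sketched as follows. This is an illustration under assumptions: the struct `LayoutConfig`, the field `class_thresholds`, and the method `threshold_for` are hypothetical names; only `score_threshold` appears in the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical config shape; only `score_threshold` is named in the commit.
pub struct LayoutConfig {
    /// Global threshold used when a class has no entry of its own.
    pub score_threshold: f32,
    /// Optional per-class overrides, keyed by class label.
    pub class_thresholds: HashMap<String, f32>,
}

impl LayoutConfig {
    /// Per-class threshold if present, otherwise the global fallback.
    pub fn threshold_for(&self, class: &str) -> f32 {
        self.class_thresholds
            .get(class)
            .copied()
            .unwrap_or(self.score_threshold)
    }
}
```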

v0.6.0

Verified

feat: implement LightOnOCR model (#81)

* Implement LightOnOCR image processing, text model, and vision model

- Added `processing.rs` for image preprocessing logic, including resizing and normalization.
- Introduced `text.rs` for the LightOnOCR text model, implementing rotary embeddings, attention layers, and MLPs.
- Created `vision.rs` for the Pixtral Vision Model, including patch convolution and transformer layers with rotary embeddings.
- Enhanced utility functions in `utils.rs` to handle character-based substring matching for improved performance with non-ASCII characters.

* feat: Enhance LightOnOCR configuration and processing with error handling and optimizations

* fix: Improve numerical stability in dtype handling for LightOnOCR models

* fix: Improve numerical stability in dtype handling for F16 and BF16
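
The `utils.rs` change above concerns character-based substring handling for non-ASCII text. As a hedged illustration of why that matters in Rust: `&str` is indexed by bytes, so slicing at a character offset can panic on multi-byte text. The helper below, `char_substring`, is a hypothetical name and not necessarily what the crate does; it converts character offsets to byte offsets before slicing.

```rust
/// Return the substring covering `len` characters starting at character
/// index `start`. Offsets count Unicode scalar values, not bytes, and
/// out-of-range offsets are clamped to the end of the string.
pub fn char_substring(s: &str, start: usize, len: usize) -> &str {
    // Byte offset of the n-th character, or the string length if past the end.
    let byte_at = |n: usize| s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);
    let begin = byte_at(start);
    let end = byte_at(start.saturating_add(len));
    &s[begin..end]
}
```

Each call walks the string from the start, so a caller slicing many times from the same text may prefer to collect the `char_indices` offsets once and reuse them.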

v0.5.2

refactor: Simplify path extraction in unclip function

v0.5.1

Verified

chore: Add workspace configuration to Cargo.toml (#72)

* chore: Add workspace configuration to Cargo.toml

* refactor: Simplify conditional parsing for builder config attribute

* refactor: Update Cargo.toml to use workspace attributes for package metadata

v0.5.0

Verified

Add unirec-0.1b model (#60)

* Add utility functions for VL models and document parsing

- Implement error conversion functions for Candle to OCRError.
- Add tensor manipulation functions including rotation and concatenation.
- Create Markdown conversion functions for layout elements.
- Implement OTSL to HTML table conversion with support for various tags.
- Add image processing helpers for cropping margins from images.
- Introduce functions for calculating bounding box areas and overlap ratios.
- Implement detection and filtering of overlapping boxes in layout detection results.
- Add tests for utility functions to ensure correctness.

* refactor: Update usage documentation and examples for improved clarity and consistency

* fix: Remove unnecessary reference for ignore_labels in markdown conversion

* refactor: Optimize sinusoidal embedding generation for GPU efficiency
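
For readers unfamiliar with the sinusoidal embeddings mentioned in the last commit above, here is a plain-Rust sketch of the standard formulation. It is an assumption that the model uses this exact layout (sine and cosine interleaved, base 10000); the crate's tensor-based version is not shown in the source, and `sinusoidal_embeddings` is a hypothetical name.

```rust
/// Build a `len` x `dim` table of sinusoidal position embeddings.
/// Row `pos` holds sin(pos * f_i), cos(pos * f_i) for each frequency f_i.
pub fn sinusoidal_embeddings(len: usize, dim: usize) -> Vec<Vec<f32>> {
    assert!(dim % 2 == 0, "dim must be even");
    // Inverse frequencies are computed once and shared by every position.
    let inv_freq: Vec<f32> = (0..dim / 2)
        .map(|i| 1.0 / 10000f32.powf(2.0 * i as f32 / dim as f32))
        .collect();
    (0..len)
        .map(|pos| {
            let mut row = Vec::with_capacity(dim);
            for &f in &inv_freq {
                let angle = pos as f32 * f;
                row.push(angle.sin());
                row.push(angle.cos());
            }
            row
        })
        .collect()
}
```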

v0.4.0

Verified

Add PaddleOCR-VL document parsing and image processing modules (#52)

* Add PaddleOCR-VL document parsing and image processing modules

- Implemented `PaddleOcrVlDocParser` for layout-first document parsing using PP-DocLayoutV2 and PaddleOCR-VL.
- Introduced configuration struct `PaddleOcrVlDocParserConfig` to customize parsing behavior.
- Added functions for bounding box manipulation, layout element sorting, and order assignment.
- Created `PaddleOcrVlImageInputs` and `preprocess_images` function for image preprocessing, including smart resizing and normalization.
- Implemented table output post-processing with HTML conversion and token parsing.
- Added unit tests for smart resizing and image preprocessing outputs.

* bump version to 0.3.2

* Enhance layout detection and processing for PaddleOCR-VL

- Update output handling to support reading order for PP-DocLayoutV2.
- Introduce `is_reading_order_sorted` flag in layout detection output.
- Refactor document parsing to utilize reading order when available.
- Add new configuration parameters for image processing.
- Improve position embedding interpolation in vision model.

* Refactor layout post-processing types and update Clippy linter configuration

* Refactor error handling and improve layout post-processing in PaddleOCR-VL

* Bump version to 0.4.0 and update HTML entity comment in image processing
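
The "smart resizing" listed under `preprocess_images` above is not spelled out in the commit message. A common scheme in vision-language preprocessors snaps both sides to a multiple of a patch factor while keeping the pixel count within a budget. The sketch below follows that scheme as an assumption; the function name `smart_resize` and all parameter names are hypothetical and may not match the crate.

```rust
/// Pick a target (height, width) where both sides are multiples of `factor`
/// and the pixel count lies within [min_pixels, max_pixels] where possible,
/// roughly preserving the aspect ratio.
pub fn smart_resize(
    height: u32,
    width: u32,
    factor: u32,
    min_pixels: u64,
    max_pixels: u64,
) -> (u32, u32) {
    assert!(height > 0 && width > 0 && factor > 0, "inputs must be non-zero");
    let (h, w, f) = (height as f64, width as f64, factor as f64);
    // Snap each side to the nearest multiple of `factor`, at least one step.
    let mut h_bar = ((h / f).round() as u32).max(1) * factor;
    let mut w_bar = ((w / f).round() as u32).max(1) * factor;
    let pixels = h_bar as u64 * w_bar as u64;
    if pixels > max_pixels {
        // Too large: shrink both sides by the same ratio and round down.
        let beta = (h * w / max_pixels as f64).sqrt();
        h_bar = ((h / beta / f).floor() as u32).max(1) * factor;
        w_bar = ((w / beta / f).floor() as u32).max(1) * factor;
    } else if pixels < min_pixels {
        // Too small: grow both sides by the same ratio and round up.
        let beta = (min_pixels as f64 / (h * w)).sqrt();
        h_bar = (h * beta / f).ceil() as u32 * factor;
        w_bar = (w * beta / f).ceil() as u32 * factor;
    }
    (h_bar, w_bar)
}
```

For example, with an illustrative factor of 28 and a budget of 3136 to 1003520 pixels, a 100 x 200 image maps to 112 x 196, and a 4000 x 4000 image is scaled down to 980 x 980.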

v0.3.1

Verified

chore: Update version to 0.3.1 and refine tokenizers dependency configuration (#51)

v0.3.0

Verified

Add documentation for pre-trained models and usage guide (#48)

* Add documentation for pre-trained models and usage guide

- Created models.md to document available pre-trained models for OCR and document understanding, including details on text detection, recognition, character dictionaries, preprocessing, and document structure models.
- Added usage.md to provide a comprehensive guide on using OAROCR, covering basic OCR pipeline, batch processing, builder APIs for text recognition and document structure analysis, GPU acceleration, and configuration options.

* refactor: Update model file paths in usage documentation for consistency