Tags: GreatV/oar-ocr
fix(structure): fix table cell matching, batch formula inference, and improve markdown output (#100)

* fix(structure): fix table cell matching, batch formula inference, and improve markdown output
  - Fix IoA space mismatch in wired table stitching by always using structure cell bboxes; add cross-row OCR deduplication for large cells
  - Add predict_images() for cross-page formula batching into a single ONNX inference call, reducing overhead for multi-page documents
  - Improve markdown: downgrade ABSTRACT/REFERENCES to h2, require text on both sides for inline formulas, add bullet list formatting, fix paragraph continuation across figures/tables
  - Speed up formula preprocessing with bilinear resize (~4x faster)
  - Remove premature dedup_by in cluster_positions to match PaddleX
  - Use <br/> instead of space for multi-line OCR content in table cells
* fix(structure): per-page error handling, batch chunking, and markdown fixes
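The IoA (intersection over area) matching mentioned in this entry can be illustrated with a minimal sketch. It assumes axis-aligned boxes that already share one coordinate space, which is the precondition the fix above is about. The `BBox` type, the `best_cell` helper, and the `min_ioa` threshold are hypothetical names chosen for illustration and are not taken from the oar-ocr codebase.

```rust
/// Axis-aligned bounding box (hypothetical type for this sketch).
/// All boxes passed to these functions must be in the same coordinate space.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BBox {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
}

impl BBox {
    /// Area of the box; degenerate boxes count as zero.
    pub fn area(&self) -> f32 {
        (self.x2 - self.x1).max(0.0) * (self.y2 - self.y1).max(0.0)
    }

    /// Area of the overlap between `self` and `other`.
    pub fn intersection_area(&self, other: &BBox) -> f32 {
        let w = self.x2.min(other.x2) - self.x1.max(other.x1);
        let h = self.y2.min(other.y2) - self.y1.max(other.y1);
        w.max(0.0) * h.max(0.0)
    }

    /// Intersection over area: the fraction of `self` covered by `other`.
    pub fn ioa(&self, other: &BBox) -> f32 {
        let area = self.area();
        if area <= 0.0 {
            0.0
        } else {
            self.intersection_area(other) / area
        }
    }
}

/// Returns the index of the cell that covers the largest share of the OCR box,
/// or `None` if no cell reaches `min_ioa` (an assumed, tunable threshold).
pub fn best_cell(ocr: &BBox, cells: &[BBox], min_ioa: f32) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, cell) in cells.iter().enumerate() {
        let score = ocr.ioa(cell);
        if score >= min_ioa && best.map_or(true, |(_, s)| score > s) {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| i)
}
```

If the OCR boxes and the cell boxes come from differently scaled images, the scores are meaningless, so both sets should be mapped into one space before calling `best_cell`.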
feat(vl): add PaddleOCR-VL-1.5 support with text spotting and seal recognition (#88)

* feat(vl): add PaddleOCR-VL-1.5 support with text spotting and seal recognition
  - Add support for PaddleOCR-VL-1.5 model with new tasks: Spotting and Seal
  - Update documentation to mention PaddleOCR-VL-1.5 support
  - Change huggingface-cli to hf in download commands
  - Fix clippy warnings: collapsible if statements, type complexity, abs_diff, repeat_n, and needless range loops
  - Improve layout detection adapter with PaddleX merge modes
* refactor(layout): replace tuple type alias with named struct and fix row sort order
* fix: address code review feedback for layout and VL modules
* fix: use config score_threshold as fallback for missing class thresholds
feat: implement LightOnOCR model (#81)

* Implement LightOnOCR image processing, text model, and vision model
  - Added `processing.rs` for image preprocessing logic, including resizing and normalization.
  - Introduced `text.rs` for the LightOnOCR text model, implementing rotary embeddings, attention layers, and MLPs.
  - Created `vision.rs` for the Pixtral Vision Model, including patch convolution and transformer layers with rotary embeddings.
  - Enhanced utility functions in `utils.rs` to handle character-based substring matching for improved performance with non-ASCII characters.
* feat: Enhance LightOnOCR configuration and processing with error handling and optimizations
* fix: Improve numerical stability in dtype handling for LightOnOCR models
* fix: Improve numerical stability in dtype handling for F16 and BF16
Add unirec-0.1b model (#60)

* Add utility functions for VL models and document parsing
  - Implement error conversion functions for Candle to OCRError.
  - Add tensor manipulation functions including rotation and concatenation.
  - Create Markdown conversion functions for layout elements.
  - Implement OTSL to HTML table conversion with support for various tags.
  - Add image processing helpers for cropping margins from images.
  - Introduce functions for calculating bounding box areas and overlap ratios.
  - Implement detection and filtering of overlapping boxes in layout detection results.
  - Add tests for utility functions to ensure correctness.
* refactor: Update usage documentation and examples for improved clarity and consistency
* fix: Remove unnecessary reference for ignore_labels in markdown conversion
* refactor: Optimize sinusoidal embedding generation for GPU efficiency
Add PaddleOCR-VL document parsing and image processing modules (#52)

* Add PaddleOCR-VL document parsing and image processing modules
  - Implemented `PaddleOcrVlDocParser` for layout-first document parsing using PP-DocLayoutV2 and PaddleOCR-VL.
  - Introduced configuration struct `PaddleOcrVlDocParserConfig` to customize parsing behavior.
  - Added functions for bounding box manipulation, layout element sorting, and order assignment.
  - Created `PaddleOcrVlImageInputs` and `preprocess_images` function for image preprocessing, including smart resizing and normalization.
  - Implemented table output post-processing with HTML conversion and token parsing.
  - Added unit tests for smart resizing and image preprocessing outputs.
* bump version to 0.3.2
* Enhance layout detection and processing for PaddleOCR-VL
  - Update output handling to support reading order for PP-DocLayoutV2.
  - Introduce `is_reading_order_sorted` flag in layout detection output.
  - Refactor document parsing to utilize reading order when available.
  - Add new configuration parameters for image processing.
  - Improve position embedding interpolation in vision model.
* Refactor layout post-processing types and update Clippy linter configuration
* Refactor error handling and improve layout post-processing in PaddleOCR-VL
* Bump version to 0.4.0 and update HTML entity comment in image processing
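"Smart resizing" in vision-language preprocessing usually means snapping both image sides to a multiple of the patch factor while keeping the total pixel count within a budget. The sketch below follows that common convention as an assumption; the function name, parameters, and rounding choices are illustrative and may differ from what `preprocess_images` actually does.

```rust
/// Picks a target (height, width) whose sides are multiples of `factor` and
/// whose pixel count stays within [min_pixels, max_pixels] where possible.
/// Returns `None` for zero-sized inputs or a zero factor.
/// (Assumed convention for illustration, not the crate's exact logic.)
pub fn smart_resize(
    height: u32,
    width: u32,
    factor: u32,
    min_pixels: u64,
    max_pixels: u64,
) -> Option<(u32, u32)> {
    if height == 0 || width == 0 || factor == 0 {
        return None;
    }
    let f = factor as f64;
    let (h, w) = (height as f64, width as f64);

    // Snap each side to the nearest multiple of `factor`, at least one patch.
    let mut h_bar = ((h / f).round() * f).max(f);
    let mut w_bar = ((w / f).round() * f).max(f);

    let area = h_bar * w_bar;
    if area > max_pixels as f64 {
        // Too large: shrink both sides by a common ratio, rounding down.
        let beta = ((h * w) / max_pixels as f64).sqrt();
        h_bar = ((h / beta / f).floor() * f).max(f);
        w_bar = ((w / beta / f).floor() * f).max(f);
    } else if area < min_pixels as f64 {
        // Too small: grow both sides by a common ratio, rounding up.
        let beta = (min_pixels as f64 / (h * w)).sqrt();
        h_bar = (h * beta / f).ceil() * f;
        w_bar = (w * beta / f).ceil() * f;
    }
    Some((h_bar as u32, w_bar as u32))
}
```

Scaling both sides by the same ratio keeps the aspect ratio approximately intact, and rounding down when shrinking (up when growing) keeps the result on the correct side of the pixel budget.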
Add documentation for pre-trained models and usage guide (#48)

* Add documentation for pre-trained models and usage guide
  - Created models.md to document available pre-trained models for OCR and document understanding, including details on text detection, recognition, character dictionaries, preprocessing, and document structure models.
  - Added usage.md to provide a comprehensive guide on using OAROCR, covering basic OCR pipeline, batch processing, builder APIs for text recognition and document structure analysis, GPU acceleration, and configuration options.
* refactor: Update model file paths in usage documentation for consistency