Stars
Official implementation of URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding (AAAI 2026 Oral).
[AAAI 2026 Oral] The official GitHub page of "PosterVerse: A Full-Workflow Framework for Commercial-Grade Poster Generation with HTML-Based Scalable Typography"
[arXiv 25] Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
[IEEE TPAMI 2025] Privacy-Preserving Biometric Verification With Handwritten Random Digit String
[ACL 2025 main] The official GitHub page of "Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration"
The official repo for "Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting", ACL, 2025.
[PR 2025] The official GitHub page of "MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories"
[CVPR NTIRE2025 ImageSRx4] BBox Team's Solution
[IEEE TIFS 2024] Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach
[IJCV 2025] Smaller But Better: Unifying Layout Generation with Smaller Large Language Models