Stars
Stable Diffusion web UI
[TIM 2025] Towards Accurate Readings of Water Meters by Eliminating Transition Error: New Dataset and Effective Solution
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Update the latest text-related papers from top conferences
A general list of resources (papers and implementations) for scene text localization and recognition
Text recognition (optical character recognition) with deep learning methods, ICCV 2019
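Java learning and interview guide (also applicable to Go and Python backend interviews, with a summary of computer science fundamentals for interviews). For backend technical interview preparation, JavaGuide is the first choice!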
🍰 Desktop utility to download images/videos/music/text from various websites, and more.
The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multi…
Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
A collection of papers on transformers for vision. Awesome Transformer with Computer Vision (CV)
An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Training and using a neural network to read out the value of an analog display - example including small node server for demonstration
Easy to use device for connecting "old" measuring units (water, power, gas, ...) to the digital world
A Clash GUI based on tauri. Supports Windows, macOS and Linux.
Netty project - an event-driven asynchronous network application framework
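《代码随想录》 LeetCode problem-solving guide: a recommended order for 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of difficult points, and more than 50 mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages. No more confusion when learning algorithms! 🔥🔥 Take a look, and you will wish you had found it sooner! 🚀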