Stars
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
[EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
Embedding model geared toward multimodal RAG; top-1 on both the Overall and VisDoc rankings of the MMEB benchmark
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++
Generate text line images for training deep learning OCR models
Detects and recognizes text in images using onnxruntime and the models provided by PaddleOCR.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Implementation of layer diffuse inference using refiners
Unofficial implementation of Layer Diffuse in diffusers
[WIP] Layer Diffusion for WebUI (via Forge)
Play ChatGPT and other LLMs with Xiaomi AI Speaker
PALLAIDIUM — a generative AI movie studio, seamlessly integrated into the Blender Video Editor (VSE), enabling end-to-end production from script to screen and back.
A Diffusers-based GUI for Stable Diffusion, modeled on the AUTOMATIC1111 web UI (image generation only)
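💯 Study resource library for the 2025 Information Systems Project Manager exam (advanced level of China's Computer Technology and Software Professional Qualification exam, "Ruankao").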
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
tf-keras code of Face Ear Landmark Detection System (with Multi-Task Learning).
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
A Next-Generation Training Engine Built for Ultra-Large MoE Models
An Open-source Toolkit for LLM Development