Stars
Embedding model geared toward multimodal RAG; ranked top-1 both overall and on VisDoc in the MMEB benchmark
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++
Generate text line images for training deep learning OCR models
Detects and recognizes text in images using onnxruntime and the models provided by PaddleOCR.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Implementation of layer diffuse inference using refiners
Unofficial implementation of Layer Diffuse in diffusers
[WIP] Layer Diffusion for WebUI (via Forge)
An OpenAI API-compatible API for chatting with image input and asking questions about the images, a.k.a. multimodal.
Play with ChatGPT and other LLMs on the Xiaomi AI Speaker
PALLAIDIUM — a generative AI movie studio, seamlessly integrated into the Blender Video Editor (VSE), enabling end-to-end production from script to screen and back.
A Diffusers-based GUI for Stable Diffusion, modeled on the AUTOMATIC1111 web UI (image generation only)
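💯 Exam-preparation resource library for the 2025 Information Systems Project Manager certification (advanced level of China's 软考 software qualification exam).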
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
tf-keras code of Face Ear Landmark Detection System (with Multi-Task Learning).
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
A Next-Generation Training Engine Built for Ultra-Large MoE Models
An Open-source Toolkit for LLM Development
FaceChain is a deep-learning toolchain for generating your Digital-Twin.