Stars
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
ChatGLM3 series: Open Bilingual Chat LLMs
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Klavis AI (YC X25): MCP integration platforms that let AI agents use tools reliably at any scale
The open source platform for AI-native application development.
Nexent is a zero-code platform for auto-generating agents — no orchestration, no complex drag-and-drop required. Nexent also offers powerful capabilities for agent running control, data processing …
[ICLR 2025] Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
The next-generation deep reinforcement learning toolkit
Easiest and laziest way to build multi-agent LLM applications.
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'
Application self-hosting and DevOps platform for running open-source software: a web-based Linux panel and lightweight PaaS
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
[ICML 2025 Spotlight] Official Repo for Paper "HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation"
Distributed GPU-Accelerated Framework for Evolutionary Computation. Comprehensive Library of Evolutionary Algorithms & Benchmark Problems.
[NeurIPS 2024] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Res-SAM Framework for GPR Underground Hazard Detection
Video generation from text and image, 1st-gen
Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
Unified Multimodal Model for image generation/editing/understanding
Official implementation of OpenWBT.
[ICCV 2025] Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥
A powerful baseline for image classification, face recognition and image retrieval with PyTorch
Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes