Stars
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Robust Speech Recognition via Large-Scale Weak Supervision
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Unified web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally.
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
The world's simplest facial recognition API for Python and the command line
LlamaIndex is the leading document agent and OCR platform
Making large AI models cheaper, faster and more accessible
ChatGLM-6B: An Open Bilingual Dialogue Language Model
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
A generative speech model for daily dialogue.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Official Code for DragGAN (SIGGRAPH 2023)
🐈 nanobot: The Ultra-Lightweight OpenClaw
OpenMMLab Detection Toolbox and Benchmark
A modular graph-based Retrieval-Augmented Generation (RAG) system
Code and documentation to train Stanford's Alpaca models, and generate the data.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
SGLang is a high-performance serving framework for large language models and multimodal models.
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
Universal LLM Deployment Engine with ML Compilation
Best Practices on Recommendation Systems
verl: Volcano Engine Reinforcement Learning for LLMs