Stars
A Unified Cache Acceleration Framework for 🤗 Diffusers: Qwen-Image-Lightning, Qwen-Image, HunyuanImage, FLUX, Wan, etc.
SGLang is a fast serving framework for large language models and vision language models.
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels.
Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".
🚀 Easier & Faster YOLO Deployment Toolkit for NVIDIA 🛠️
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
Trae Agent is an LLM-based agent for general purpose software engineering tasks.
Open-source coding LLM for software engineering tasks.
The power of Claude Code / GeminiCLI / CodexCLI + [Gemini / OpenAI / OpenRouter / Azure / Grok / Ollama / Custom Model / All Of The Above] working as one.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transf…
Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.
Production-ready platform for agentic workflow development.
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Langflow is a powerful tool for building and deploying AI-powered agents and workflows.
Eliminate all the tedious hassle of making state-of-the-art C++14 to C++23 libraries!
Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin9…
Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
A tool for bandwidth measurements on NVIDIA GPUs.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
PaddleOCR inference in PyTorch. Converted from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR).