Lists (17)
🔥 Agent & MCP
✊ AI & LLMOps - about all AI and LLM projects
🚀 CloudNative - Kubernetes, Istio, Microservices, AnyOps
🧑🏻💻 Develop
🧑🏻💻 Develop.rust
📖 easy-website - about docs
🧑🏻💻 golang pkg
🌟 Help This Grow
🎱 n8n.workflow
Networks
🍚 New for Golang
🔥 Python
🤓 Rust
SD Infra & Tools
Share AI apps - show all coding with AI
🎉 Sharing Awesome - about fun projects in my life
🌟 Star-dao follow ME - open source projects contributed by DaoCloud
All languages
- ActionScript
- Astro
- Batchfile
- C
- C#
- C++
- CMake
- CSS
- CoffeeScript
- Cuda
- Dart
- Dockerfile
- Elixir
- Go
- Groovy
- HTML
- Java
- JavaScript
- Jinja
- Jupyter Notebook
- Kotlin
- Lua
- MATLAB
- MDX
- MLIR
- Makefile
- Markdown
- Mojo
- Mustache
- Nu
- Nushell
- Objective-C
- PHP
- Python
- QMake
- Roff
- Ruby
- Rust
- SCSS
- Sass
- Scala
- Scheme
- Shell
- Smarty
- Svelte
- Swift
- TeX
- TypeScript
- Vim Script
- Vue
- reStructuredText
Starred repositories
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Claude Code superpowers: core skills library
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows
A high-performance, lightweight router for large-scale vLLM deployments
Free Codex and Claude Code if you have GitHub Copilot!
Swagger Online is a lightweight React tool that aggregates multiple Swagger/OpenAPI specs into one unified, searchable, and comparable interface.
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Containerd snapshot quota NRI plugin: users can set an ephemeral-storage limit for every container, and the pod is not restarted when its ephemeral storage is full.
A framework for efficient model inference with omni-modality models
Discover ingress-nginx usage and auto-generate Gateway API migration plans before ingress-nginx reaches end-of-life (March 2026).
Persist and reuse KV Cache to speed up your LLM.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
An early research stage expert-parallel load balancer for MoE models based on linear programming.
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk
Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
A high-performance inference system for large language models, designed for production environments.
KV cache store for distributed LLM inference
FlashInfer: Kernel Library for LLM Serving
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Repository for out-of-tree scheduler plugins based on scheduler framework.
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
The Intelligent Inference Scheduler for Large-scale Inference Services.
⚒️ AlphaTrion is an open-source framework to help build GenAI applications, including experiment tracking, adaptive model routing, prompt optimization and performance evaluation.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderboard.