Stars
TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
A from-scratch Prefill/Decode disaggregation inference engine for LLMs
🚀🚀 [LLM] Train a 64M-parameter GPT completely from scratch in just 2 hours! 🌏
Bash is all you need: a nano Claude Code–like "agent harness," built from 0 to 1
[NeurIPS 2025] "AI-Researcher: Autonomous Scientific Innovation" -- A production-ready version: https://novix.science/chat
AI agents that automatically run research on single-GPU nanochat training
Code repo for efficient quantized MoE inference with mixture of low-rank compensators
Official implementation of Half-Quadratic Quantization (HQQ)
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
A machine model for line-rate programmable switches
This repository contains the source code for P4TG, a 1 Tb/s traffic generator for Ethernet/IP networks
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
"Your Fully-Automated Personal AI Assistant"
Your personal AI trading assistant. Any market. Any model. Pay with USDC, not API keys.
An open-source AI trading platform for real markets
Hands-on tutorial to learn the building blocks of the Next-Gen SDN architecture
CRS: a self-hosted Claude Code mirror and one-stop open-source relay service that unifies access to Claude, OpenAI, Gemini, and Droid subscriptions, supports carpool-style sharing to split costs more efficiently, and works seamlessly with native tools.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Stratum is an open source silicon-independent switch operating system for software defined networks.