Stars
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
A compiler, optimizer and executor for financial expressions and factors
Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini C…
Simple samples for TensorRT programming
TokenSpeed is a speed-of-light LLM inference engine.
A project to improve skills of large language models
A high-performance linear attention kernel library built on TileLang
A unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative decoding, etc. It compresses deep learning models for downs…
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
An agentic skills framework & software development methodology that works.
Bash is all you need - A nano claude code–like 「agent harness」, built from 0 to 1
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation
AI agents running research on single-GPU nanochat training automatically
Automated High-Performance GPU Kernel Generation
FlashMLA: Efficient Multi-head Latent Attention Kernels
Learn OpenClaw from 0 to 1: sections to build a claw-AI agent from scratch
💖🧸 Self-hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minec…