Starred repositories
A kernel library written in tilelang
A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.
Analyze computation-communication overlap in V3/R1.
sphish / perftest
Forked from linux-rdma/perftest
Infiniband Verbs Performance Tests
tukuaiai / vibe-coding-cn
Forked from EnzeD/vibe-coding
Vibe Coding guide: an AI programming workstation covering prompts, a skill library, and workflows
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
A developer tool for disassembling, analyzing, debugging, and visualizing BPF object files.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Supercharge Your LLM with the Fastest KV Cache Layer
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Perplexity open source garden for inference technology
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Tile primitives for speedy kernels
Official inference repo for FLUX.1 models
An early-research-stage expert-parallel load balancer for MoE models based on linear programming.
A guidance language for controlling large language models.
Disaggregated serving system for Large Language Models (LLMs).
High performance Transformer implementation in C++.