Stars
PyTorch template for Deep Learning projects with support for scalable multi-GPU and multi-node training.
A high-throughput and memory-efficient inference and serving engine for LLMs
Optimized primitives for collective multi-GPU communication
DeepEP: an efficient expert-parallel communication library
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
FlashInfer: Kernel Library for LLM Serving
A framework for GPU-friendly implementations of tree algorithms -- such as kNN and FoF -- in jax (with a CUDA backend)
Open-source simulator for autonomous driving research.
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
A rclcpp-compatible true zero-copy IPC middleware that supports all ROS message types, including message structs already generated by rosidl.
SGLang is a high-performance serving framework for large language models and multimodal models.
Official Kubernetes Operator for ClickHouse®
A profiling tool for agentic tools like Claude Code and Codex
Homework assignments for the "GPU Computing" course at CS Space
Automatic solution-checking system for YSDA: a server that stores grades and manages deadlines
AntonGorokhov / nanochat
Forked from karpathy/nanochat
The best ChatGPT that $100 can buy.
AntonGorokhov / minGPT
Forked from karpathy/minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
DuckDB extension for reading data stored in the Apache GraphAr format.
A Go implementation of EIP-4361 Sign In With Ethereum verification
Error-bounded Lossy Data Compressor (for floating-point/integer datasets)