🎯
Focusing

Highlights

  • Pro

Organizations

@apache @dmlc @uwsampl @octoml


Our first fully AI-generated deep learning system

Python 531 37 Updated Feb 2, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,491 435 Updated Feb 11, 2026

Fast and memory-efficient exact attention

Python 22,261 2,383 Updated Feb 16, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,925 116 Updated Feb 13, 2026

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 880 109 Updated Feb 15, 2026

Perplexity open source garden for inference technology

Rust 364 28 Updated Dec 25, 2025

Building the Virtuous Cycle for AI-driven LLM Systems

Python 176 26 Updated Feb 13, 2026

JAX support for the tvm-ffi ABI

C++ 23 3 Updated Dec 10, 2025

Open ABI and FFI for Machine Learning Systems

C++ 346 60 Updated Feb 16, 2026

Ship correct and fast LLM kernels to PyTorch

Python 142 17 Updated Jan 14, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 749 103 Updated Feb 16, 2026

A size profiler for CUDA binaries

Python 72 Updated Jan 15, 2026

An extremely fast Python package and project manager, written in Rust.

Rust 79,286 2,566 Updated Feb 16, 2026

VS Code extension for syntax highlighting C++/CUDA/HIP code in PyTorch load_inline() strings

Python 9 Updated Jul 25, 2025

RFC document, tooling and other content related to the array API standard

Python 264 54 Updated Feb 5, 2026

AGENTS.md — a simple, open format for guiding coding agents

TypeScript 17,449 1,236 Updated Dec 19, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,886 2,101 Updated Feb 16, 2026

🎡 Build Python wheels for all the platforms with minimal configuration.

Python 2,186 299 Updated Feb 14, 2026

A next generation Python CMake adaptor and Python API for plugins

Python 442 81 Updated Feb 16, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 444 15 Updated Feb 4, 2026

Minimum example for deploying Apache TVM's Relax IR using C++ API

C++ 5 Updated Nov 29, 2025

Python 113 10 Updated Sep 13, 2025

JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training

Python 64 1 Updated Feb 13, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,358 127 Updated Feb 13, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,099 856 Updated Feb 16, 2026

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,919 312 Updated Jan 14, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,178 818 Updated Feb 3, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,489 985 Updated Feb 6, 2026

verl: Volcano Engine Reinforcement Learning for LLMs

Python 19,238 3,246 Updated Feb 16, 2026

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,965 288 Updated May 15, 2025