tqchen

🎯 Focusing

Highlights: Pro

Organizations: @apache @dmlc @uwsampl @octoml

Modern GPU programming

HTML 15 1 Updated Jun 23, 2026

ML kernels and benchmarking infrastructure written in TIRx

Python 28 2 Updated Jun 19, 2026

Fast and memory-efficient classical machine learning operators

Python 537 42 Updated Jun 20, 2026

Compact and Agent-Native MoE Training System

Python 202 17 Updated Jun 23, 2026

A kernel library written in tilelang

Python 1,597 142 Updated Apr 23, 2026

The open-source agent-serving project

Python 477 32 Updated Jun 8, 2026
Rust 7 Updated Mar 9, 2026

Our first fully AI generated deep learning system

Python 630 48 Updated Feb 2, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 4,443 708 Updated May 17, 2026

Fast and memory-efficient exact attention

Python 24,216 2,851 Updated Jun 22, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 2,083 140 Updated Jun 17, 2026

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 1,604 273 Updated Jun 23, 2026

Perplexity open source garden for inference technology

Rust 585 56 Updated May 27, 2026

Building the Virtuous Cycle for AI-driven LLM Systems

Python 250 41 Updated May 1, 2026

JAX support for tvm-ffi abi

Python 26 5 Updated May 14, 2026

Open ABI and FFI for Machine Learning Systems

C++ 418 80 Updated Jun 21, 2026

Ship correct and fast LLM kernels to PyTorch

Python 151 17 Updated Jan 14, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 890 155 Updated Jun 23, 2026

A size profiler for CUDA binaries

Python 69 Updated Jan 15, 2026

An extremely fast Python package and project manager, written in Rust.

Rust 86,671 3,232 Updated Jun 23, 2026

VS Code extension for syntax highlighting C++/CUDA/HIP code in PyTorch load_inline() strings

Python 9 Updated Jul 25, 2025

RFC document, tooling and other content related to the array API standard

Python 268 54 Updated Apr 23, 2026

AGENTS.md — a simple, open format for guiding coding agents

TypeScript 22,425 1,652 Updated Mar 12, 2026

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 13,940 2,489 Updated Jun 23, 2026

🎡 Build Python wheels for all the platforms with minimal configuration.

Python 2,243 320 Updated Jun 22, 2026

A next generation Python CMake adaptor and Python API for plugins

Python 476 89 Updated Jun 22, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 488 26 Updated Jun 11, 2026

Minimum example for deploying Apache TVM's Relax IR using C++ API

C++ 6 1 Updated Nov 29, 2025
Python 119 10 Updated Sep 13, 2025

JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training

Python 78 2 Updated Jun 18, 2026