Andrew Gu awgu

😴

182 followers · 17 following

New York, NY

Stars

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python · 5,434 stars · 491 forks · Updated Mar 27, 2026

thinking-machines-lab / manifolds

Supporting code for the blog post on modular manifolds.

Python · 121 stars · 13 forks · Updated Sep 26, 2025

thinking-machines-lab / batch_invariant_ops

Python · 982 stars · 73 forks · Updated Nov 4, 2025

Dao-AILab / quack

A Quirky Assortment of CuTe Kernels

Python · 865 stars · 100 forks · Updated Mar 27, 2026

fla-org / flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models

Python · 4,735 stars · 466 forks · Updated Mar 27, 2026

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ · 1,548 stars · 730 forks · Updated Mar 28, 2026

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python · 1,398 stars · 135 forks · Updated Mar 11, 2026

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python · 23,017 stars · 2,559 forks · Updated Mar 28, 2026

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python · 15,827 stars · 3,764 forks · Updated Mar 28, 2026

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python · 3,246 stars · 676 forks · Updated Mar 25, 2026

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Python · 25,021 stars · 5,035 forks · Updated Mar 28, 2026

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,972 stars · 288 forks · Updated May 15, 2025

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

1,150 stars · 145 forks · Updated Mar 21, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python · 2,936 stars · 318 forks · Updated Jan 14, 2026

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 6,289 stars · 842 forks · Updated Mar 22, 2026

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 9,074 stars · 1,130 forks · Updated Feb 9, 2026

yifuwang / symm-mem-recipes

Python · 164 stars · 16 forks · Updated Dec 27, 2024

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda · 3,279 stars · 267 forks · Updated Mar 28, 2026

higham / what-is

Important concepts in numerical linear algebra and related areas

812 stars · 68 forks · Updated Jan 13, 2024

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,499 stars · 1,754 forks · Updated Mar 24, 2026

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python · 5,190 stars · 763 forks · Updated Mar 28, 2026

python / cpython

The Python programming language

Python · 72,099 stars · 34,318 forks · Updated Mar 28, 2026

triton-lang / triton

Development repository for the Triton language and compiler

MLIR · 18,780 stars · 2,708 forks · Updated Mar 28, 2026

pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 98,603 stars · 27,330 forks · Updated Mar 28, 2026

jax-ml / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 35,237 stars · 3,492 forks · Updated Mar 28, 2026
