alanwang67: 18 stars written in Python

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 97,187 26,763 Updated Feb 6, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 69,600 13,222 Updated Feb 6, 2026

An extremely fast Python type checker and language server, written in Rust.

Python 17,034 208 Updated Feb 5, 2026

Machine Learning Engineering Open Book

Python 16,580 1,034 Updated Jan 23, 2026

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 6,184 569 Updated Aug 22, 2025

Efficient Triton Kernels for LLM Training

Python 6,119 484 Updated Feb 6, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 5,051 437 Updated Feb 5, 2026

A PyTorch native platform for training generative AI models

Python 5,039 699 Updated Feb 6, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,322 414 Updated Jan 19, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,902 110 Updated Feb 3, 2026

The Art of Debugging Open Book

Python 1,277 66 Updated Jan 17, 2026

A Quirky Assortment of CuTe Kernels

Python 782 79 Updated Feb 5, 2026

Python 551 47 Updated Feb 5, 2024

Write eBPF programs in Pure Python

Python 212 4 Updated Jan 29, 2026

A storage solution for PyTorch tensors with distributed tensor support.

Python 61 8 Updated Feb 6, 2026

Implementation of the Triangle Multiplicative module, used in AlphaFold2 as an efficient way to mix rows or columns of a 2D feature map, as a standalone package for PyTorch

Python 39 2 Updated Aug 3, 2021

My submission for the GPUMODE/AMD fp8 mm challenge

Python 29 Updated Jun 4, 2025

Python 27 5 Updated Sep 22, 2025