XuezheMax

💭 I may be slow to respond.

Highlights

  • Pro

Organizations

@asyml

Repositories

torchcomms: a modern PyTorch communications API

C++ · 310 stars · 47 forks · Updated Dec 21, 2025

Memory optimized Mixture of Experts

Python · 72 stars · 6 forks · Updated Jul 25, 2025

PyTorch bindings for CUTLASS grouped GEMM.

CUDA · 135 stars · 79 forks · Updated May 29, 2025
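"Grouped GEMM" here means running a batch of independent matrix multiplications with potentially different shapes (e.g., one per MoE expert) in a single fused kernel launch. As a reference for the semantics only, not the bindings' actual API, it reduces to a loop of matmuls:

```python
import torch

def grouped_gemm_reference(As, Bs):
    # Semantics of a grouped GEMM: one matmul per (A, B) problem, where
    # each problem may have a different shape (e.g., per-expert token
    # counts in an MoE layer). A real CUTLASS kernel fuses these into a
    # single launch; this naive loop only shows what is computed.
    return [A @ B for A, B in zip(As, Bs)]

# Toy usage: three "experts" receiving different numbers of routed tokens.
As = [torch.randn(n, 64) for n in (5, 17, 2)]
Bs = [torch.randn(64, 128) for _ in range(3)]
outs = grouped_gemm_reference(As, Bs)
print([o.shape for o in outs])
```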

Simple & Scalable Pretraining for Neural Architecture Research

Python · 305 stars · 32 forks · Updated Dec 6, 2025

The official Meta Llama 3 GitHub site

Python · 29,144 stars · 3,501 forks · Updated Jan 26, 2025

Python · 565 stars · 56 forks · Updated Sep 23, 2025

Fast trainer for educational purposes

Python · 22 stars · 12 forks · Updated Nov 26, 2025

Efficient Triton Kernels for LLM Training

Python · 5,962 stars · 452 forks · Updated Dec 20, 2025

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV-cache transfer, RL weight transfer), and expert parallelism (EP, e.g., GPU-driven communication)

C++ · 1,132 stars · 105 forks · Updated Dec 21, 2025

Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model

Python · 254 stars · 17 forks · Updated May 27, 2025

DeepEP: an efficient expert-parallel communication library

CUDA · 8,820 stars · 1,035 forks · Updated Dec 5, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python · 2,017 stars · 128 forks · Updated Apr 3, 2025
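MoBA's core idea can be summarized in a few lines: pool each block of keys into a single representative, gate the query against those representatives, and attend only within the top-k selected blocks. Below is a minimal, non-causal sketch of that idea; the function name and shapes are illustrative assumptions, not the repo's API:

```python
import torch
import torch.nn.functional as F

def moba_attend(q, K, V, block=4, topk=2):
    # Sketch of block-sparse attention in the spirit of MoBA: score each
    # KV block by its mean-pooled key, then attend only within the top-k
    # blocks (causal masking from the paper is omitted for brevity).
    n, d = K.shape
    Kb = K.view(n // block, block, d)
    Vb = V.view(n // block, block, d)
    gate = Kb.mean(dim=1) @ q                 # (num_blocks,) block relevance
    sel = gate.topk(topk).indices             # indices of the chosen blocks
    Ks, Vs = Kb[sel].reshape(-1, d), Vb[sel].reshape(-1, d)
    attn = F.softmax(Ks @ q / d ** 0.5, dim=0)
    return attn @ Vs

q = torch.randn(64)
K, V = torch.randn(16, 64), torch.randn(16, 64)
print(moba_attend(q, K, V).shape)             # torch.Size([64])
```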

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python · 943 stars · 48 forks · Updated Mar 19, 2025

HIPIFY: Convert CUDA to Portable C++ Code

C++ · 637 stars · 101 forks · Updated Dec 21, 2025

Long context evaluation for large language models

Python · 224 stars · 23 forks · Updated Mar 3, 2025

Fast and memory-efficient exact attention

Python · 21,207 stars · 2,235 forks · Updated Dec 20, 2025

Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python · 1,293 stars · 85 forks · Updated Jul 14, 2024
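The paper behind the entry above treats the RNN hidden state as a small model whose weights are updated by a self-supervised gradient step on each token. Here is a minimal sketch of one such step, assuming a linear inner model and a reconstruction loss; names like `ttt_linear_step` and the projections `theta_k/v/q` are illustrative, not the repo's API:

```python
import torch

def ttt_linear_step(W, x, theta_k, theta_v, theta_q, lr=0.1):
    # The hidden state W is itself a weight matrix, updated at test time
    # by one gradient step on a self-supervised loss; the update IS the
    # recurrence, and the output is read out with the updated state.
    k, v, q = x @ theta_k, x @ theta_v, x @ theta_q
    W = W.detach().requires_grad_(True)
    loss = ((k @ W - v) ** 2).sum()           # reconstruct the "value" view
    (grad,) = torch.autograd.grad(loss, W)
    W = (W - lr * grad).detach()              # learn from the current token
    return q @ W, W

d = 16
W = torch.zeros(d, d)
theta_k, theta_v, theta_q = (torch.randn(d, d) / d ** 0.5 for _ in range(3))
for x in torch.randn(8, 1, d):                # a toy 8-token sequence
    y, W = ttt_linear_step(W, x, theta_k, theta_v, theta_q)
```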

PyTorch implementation of the Google paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"

Python · 58 stars · 5 forks · Updated Dec 21, 2025

Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Python · 375 stars · 34 forks · Updated Apr 23, 2024
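Both Infini-attention entries above implement the same mechanism from the paper: a compressive memory that is read and written linear-attention style alongside ordinary local attention. A minimal sketch of the memory read/write, assuming the paper's ELU+1 feature map; the function name is illustrative, not either repo's API:

```python
import torch
import torch.nn.functional as F

def infini_memory_step(M, z, Q, K, V):
    # Compressive memory per the Infini-attention paper. M: (d_k, d_v)
    # associative memory, z: (d_k,) normalizer; Q, K: (n, d_k), V: (n, d_v).
    sQ, sK = F.elu(Q) + 1, F.elu(K) + 1                # positive feature map
    A_mem = (sQ @ M) / (sQ @ z).unsqueeze(-1).clamp_min(1e-6)  # retrieval
    M = M + sK.transpose(-2, -1) @ V                   # write new bindings
    z = z + sK.sum(dim=-2)                             # update normalizer
    return A_mem, M, z

d_k, d_v, n = 8, 8, 4
M, z = torch.zeros(d_k, d_v), torch.zeros(d_k)
for _ in range(3):                                     # successive segments
    Q, K, V = torch.randn(n, d_k), torch.randn(n, d_k), torch.randn(n, d_v)
    A_mem, M, z = infini_memory_step(M, z, Q, K, V)
```

In the paper, the retrieved `A_mem` is then mixed with local dot-product attention through a learned sigmoid gate.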

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python · 1,975 stars · 134 forks · Updated Nov 7, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python · 4,094 stars · 334 forks · Updated Dec 20, 2025

Building blocks for foundation models.

584 stars · 28 forks · Updated Jan 3, 2024

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,850 stars · 78 forks · Updated Dec 6, 2025

This project extends the Kolmogorov-Arnold Network (KAN) architecture to convolutional layers, changing the classic linear transformation of the convolution to learna…

Jupyter Notebook · 910 stars · 97 forks · Updated Apr 8, 2025

Tile primitives for speedy kernels

CUDA · 3,008 stars · 217 forks · Updated Dec 9, 2025

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Python · 4,549 stars · 408 forks · Updated Aug 1, 2024

Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines.

Jupyter Notebook · 399 stars · 42 forks · Updated May 13, 2024
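The entry above swaps KAN's B-spline bases for Chebyshev polynomials, which are cheap to evaluate via the recurrence T_{n+1}(x) = 2x·T_n(x) − T_{n−1}(x) after squashing inputs into [-1, 1]. A minimal sketch of such a layer; class and parameter names are illustrative assumptions, not the notebook's code:

```python
import torch
import torch.nn as nn

class ChebyKANLayer(nn.Module):
    # Sketch of a KAN layer with Chebyshev bases: every input-output edge
    # carries a learnable univariate function expressed as a degree-K
    # Chebyshev expansion (degree >= 1 assumed).
    def __init__(self, in_dim, out_dim, degree=4):
        super().__init__()
        self.degree = degree
        self.coef = nn.Parameter(torch.randn(in_dim, out_dim, degree + 1)
                                 / (in_dim * (degree + 1)) ** 0.5)

    def forward(self, x):
        x = torch.tanh(x)                     # map inputs into [-1, 1]
        T = [torch.ones_like(x), x]           # T0, T1
        for _ in range(2, self.degree + 1):
            T.append(2 * x * T[-1] - T[-2])   # Chebyshev recurrence
        T = torch.stack(T, dim=-1)            # (batch, in_dim, degree+1)
        # y_j = sum_i sum_k coef[i, j, k] * T_k(x_i)
        return torch.einsum('bik,iok->bo', T, self.coef)

layer = ChebyKANLayer(3, 2)
print(layer(torch.randn(5, 3)).shape)         # torch.Size([5, 2])
```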

Kolmogorov-Arnold Networks

Jupyter Notebook · 16,056 stars · 1,539 forks · Updated Jan 19, 2025

Python · 749 stars · 62 forks · Updated May 24, 2024

Reference implementation of the Megalodon 7B model

CUDA · 527 stars · 54 forks · Updated May 17, 2025