Carnegie Mellon University
Pittsburgh, PA
https://xuezhemax.github.io/
Stars
torchcomms: a modern PyTorch communications API
Simple & Scalable Pretraining for Neural Architecture Research
A fast trainer for educational purposes
Efficient Triton Kernels for LLM Training
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and expert parallelism (e.g., GPU-driven)
Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model
DeepEP: an efficient expert-parallel communication library
MoBA: Mixture of Block Attention for Long-Context LLMs
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Long context evaluation for large language models
Fast and memory-efficient exact attention
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
PyTorch implementation of the Google paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"
Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
🚀 Efficient implementations of state-of-the-art linear attention models
Building blocks for foundation models.
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
This project extends the idea of the innovative architecture of Kolmogorov-Arnold Networks (KAN) to the Convolutional Layers, changing the classic linear transformation of the convolution to learna…
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines.
Reference implementation of Megalodon 7B model
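One starred entry above swaps the B-spline basis of Kolmogorov-Arnold Networks for Chebyshev polynomials of the first kind, which can be generated cheaply by the recurrence T0(x) = 1, T1(x) = x, T(n+1)(x) = 2x·Tn(x) − T(n−1)(x). A minimal sketch of that basis and a KAN-style edge function (plain Python; the function names here are illustrative, not taken from any of the listed repositories):

```python
import math

def chebyshev_basis(x: float, degree: int) -> list[float]:
    """Evaluate T_0..T_degree at x (x assumed in [-1, 1]) via the recurrence."""
    basis = [1.0, x][: degree + 1]
    for n in range(2, degree + 1):
        basis.append(2.0 * x * basis[n - 1] - basis[n - 2])
    return basis

def edge_activation(x: float, coeffs: list[float]) -> float:
    """A KAN-style learnable edge function: a linear combination of the basis."""
    return sum(c * t for c, t in zip(coeffs, chebyshev_basis(x, len(coeffs) - 1)))
```

Because the basis is a fixed polynomial recurrence rather than a piecewise spline, only the coefficients of each edge function are learned, which is the design trade-off the Chebyshev variant makes.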