My learning notes for ML SYS.

Python 4,795 309 Updated Dec 24, 2025

An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.

C++ 71 7 Updated Dec 17, 2025

An early-research-stage expert-parallel load balancer for MoE models based on linear programming.

Python 475 27 Updated Nov 19, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,050 889 Updated Dec 24, 2025

Nano vLLM

Python 10,109 1,267 Updated Nov 3, 2025

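This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications in production).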

HTML 22,482 2,631 Updated Dec 24, 2025

FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang/triton.

C++ 148 31 Updated Dec 25, 2025

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…

Python 65,008 6,566 Updated Nov 11, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,579 246 Updated Dec 18, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,029 587 Updated Dec 22, 2025

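AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper-layer software stack, that supports training and inference of large AI models.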

Jupyter Notebook 5,517 764 Updated Dec 22, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 6,436 815 Updated Dec 24, 2025

[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"

Python 202 22 Updated Nov 25, 2025

AI and Memory Wall

225 26 Updated Mar 23, 2024

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,950 288 Updated May 15, 2025

Perplexity GPU Kernels

C++ 543 74 Updated Nov 7, 2025

CUDA Python: Performance meets Productivity

Cython 3,100 234 Updated Dec 24, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,300 359 Updated Dec 25, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,936 922 Updated Dec 15, 2025

Tensor Computations Tutorials

134 12 Updated Feb 12, 2024

A framework for few-shot evaluation of language models.

Python 11,020 2,922 Updated Dec 23, 2025

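A collection of AIGC-interview/CV-interview/LLMs-interview questions and answers, which also includes new ideas, new problems, new resources, and new projects encountered in work and research.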

2,716 242 Updated Oct 30, 2025

cuML - RAPIDS Machine Learning Library

C++ 5,068 612 Updated Dec 23, 2025

[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Python 227 16 Updated Jan 11, 2025
Python 159 17 Updated Jun 22, 2025

VPTQ, A Flexible and Extreme low-bit quantization algorithm

Python 670 49 Updated Apr 25, 2025

[ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models

Python 28 2 Updated Aug 5, 2025

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,575 189 Updated Jul 12, 2024