Stars
My learning notes for MLSys.
An NCCL extension library designed to efficiently offload GPU memory allocated by the NCCL communication library.
An early-research-stage expert-parallel load balancer for MoE models, based on linear programming.
📚LeetCUDA: Modern CUDA Learning Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and deploying LLM applications in production).
FlagTree is a unified compiler for custom deep learning operations that supports multiple AI chip backends; it is forked from triton-lang/triton.
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
AIInfra (AI infrastructure) covers the AI system stack from underlying hardware such as chips up through the software stack that supports training and inference of large AI models.
Supercharge Your LLM with the Fastest KV Cache Layer
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
CUDA Python: Performance meets Productivity
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels.
FlashMLA: Efficient Multi-head Latent Attention Kernels
A framework for few-shot evaluation of language models.
A collection of AIGC, CV, and LLM interview questions and answers, along with new ideas, questions, resources, and projects encountered in work and research.
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
VPTQ: a flexible, extreme low-bit quantization algorithm
[ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models