xxyux

Xiangrui Yu xxyux

R&D in Training Infra, Paddle. I graduated from HKUST(GZ) with an MPhil degree. My interests are in AI infrastructure systems. Before that, I graduated from CUP.

23 followers · 34 following

PaddlePaddle, Baidu
Beijing
Lists (1)

CUDA-Sample

3 repositories

Stars

deepseek-ai / Engram

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Python 4,452 340 Updated Jan 14, 2026

FMInference / DejaVu

Python 359 45 Updated Apr 2, 2024

ChenMnZ / PrefixQuant

An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization

Python 175 17 Updated Nov 26, 2025

LeanModels / DFloat11

DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference

Python 638 37 Updated Nov 24, 2025

HPMLL / NVIDIA-Hopper-Benchmark

C++ 108 19 Updated May 31, 2025

pytorch / ao

PyTorch native quantization and sparsity for training and inference

Python 2,856 527 Updated Jun 12, 2026

DensoITLab / sas_

Python 3 Updated Jun 12, 2025

open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 7,082 788 Updated Jun 12, 2026

LLMServe / DistServe

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 820 94 Updated Apr 6, 2025

LLMServe / SwiftTransformer

High performance Transformer implementation in C++.

C++ 152 18 Updated Jan 18, 2025

HPMLL / SpInfer_EuroSys25

Cuda 34 1 Updated Apr 2, 2025

OpenBitSys / BitDecoding

[HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.

C++ 93 11 Updated May 14, 2026

FasterDecoding / TEAL

Python 166 16 Updated Feb 15, 2025

IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,320 201 Updated Mar 27, 2024

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 17,274 1,313 Updated Jun 7, 2026

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

8,002 287 Updated May 15, 2025

IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 1,085 88 Updated Sep 4, 2024

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 19,434 2,937 Updated Jun 13, 2026

microsoft / MInference

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to 10x for pre-filli…

Python 1,221 78 Updated Apr 8, 2026

ruikangliu / IntactKV

[ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact"

Python 45 1 Updated May 24, 2024

Hsu1023 / DuQuant

[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.

Python 182 18 Updated Apr 24, 2026

spcl / QuaRot

Code for NeurIPS'24 paper: QuaRot, an end-to-end 4-bit inference of large language models.

Python 514 73 Updated Nov 26, 2024

galeselee / 6000D-Project

This is the repo for the 6000D (Graph Processing and Analytics) final project at HKUST-GZ

Cuda 3 Updated Dec 14, 2023

Zefan-Cai / KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

Python 1,341 172 Updated Jan 4, 2025

Cornell-RelaxML / quip-sharp

Python 595 51 Updated Oct 29, 2024

Dao-AILab / fast-hadamard-transform

Fast Hadamard transform in CUDA, with a PyTorch interface

C 327 63 Updated Mar 10, 2026

DD-DuDa / Cute-Learning

Examples of CUDA implementations by Cutlass CuTe

Makefile 278 34 Updated Jul 1, 2025

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 17,369 3,434 Updated Jun 13, 2026

HeyDavid633 / CCF-THPC-MP

The source code and script of CCF-THPC-

Python 2 Updated Feb 23, 2026

KnowingNothing / MatmulTutorial

An Easy-to-understand TensorOp Matmul Tutorial

C++ 441 55 Updated Mar 5, 2026
