xxyux

Xiangrui Yu xxyux

R&D engineer in Training Infrastructure at Paddle. I graduated from HKUST(GZ) with an MPhil degree; before that, I graduated from CUP. My interests are in AI infrastructure systems.

23 followers · 34 following

PaddlePaddle, Baidu
Beijing

PaddleFleet Public
Forked from PaddlePaddle/PaddleFleet

Core Functional Library for Distributed Training

Python Apache License 2.0 Updated Jun 4, 2026
PaddleFormers Public
Forked from PaddlePaddle/PaddleFormers

PaddleFormers is an easy-to-use library and model zoo of pre-trained large language models, built on PaddlePaddle.

Python Apache License 2.0 Updated Jun 2, 2026
Paddle Public
Forked from PaddlePaddle/Paddle

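PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle (飞桨): high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)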

C++ Apache License 2.0 Updated May 19, 2026
flash-attention Public
Forked from PaddlePaddle/flash-attention

Fast and memory-efficient exact attention

C++ BSD 3-Clause "New" or "Revised" License Updated May 9, 2026
ZipServ Public

Cuda 6 1 Apache License 2.0 Updated Mar 9, 2026
test_flashmask Public
Forked from umiswing/test_flashmask

Python Updated Dec 15, 2025
SpInfer Public

SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs

Cuda 65 16 Apache License 2.0 Updated Mar 25, 2025
Fine-tuning-LLM-with-2-4-sparse Public

Fine-tuning Llama-2-7B for text classification. Dataset: IMDb; framework: DeepSpeed.

Python 2 Updated Mar 4, 2024
Distributed-SpMV Public

Distributed SpMV implemented in C with MPI and OpenMP. This work was accepted at IEEE/ACM CCGrid'23.

mpi distributed-computing spmv openmpi-cpu-clusters

C 3 1 Updated Mar 4, 2024
cuAlias Public

GPU-based graph sampling for GNNs, in particular building and using alias tables for random search.

cuda-programming gnn graph-sample alias-table

C Updated Mar 4, 2024
xxyux Public

Updated Mar 4, 2024
Attention Public

This is my final project for the GPU course MICS600J. It is mainly my attempt at a hand-written implementation of the attention process.

attention-mechanism cuda-programming

Cuda 2 Updated Mar 4, 2024
transformers Public
Forked from huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Python Apache License 2.0 Updated Dec 13, 2023
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ Apache License 2.0 Updated Dec 11, 2023
GEMM Public

MICS600J - GPU Architectures and Programming, Homework 1

Cuda Updated Oct 29, 2023
ACM Public

ACM programming contests, Codeforces contests, and various training contests

C++ Updated May 9, 2023
template Public

My collected ACM templates, updated continuously.

C++ Updated Aug 17, 2022
kob Public

A human-vs-AI battle platform built on the SpringBoot framework

Java Updated Aug 11, 2022
acwing Public

Some problems from the AcWing weekly contests

Makefile Updated Aug 7, 2022
Skywalker Public
Forked from wpybtw/Skywalker

Cuda Updated Jun 29, 2022
Jerry_Yu Public

Things that need to be noted

Updated Jan 14, 2022
AmpereSparseMatmul Public
Forked from lenLRX/AmpereSparseMatmul

Study of Ampere's sparse Tensor Core matmul

Cuda MIT License Updated Jan 10, 2021
