Kyrie-Zhao
15 stars written in Cuda

FlashInfer: Kernel Library for LLM Serving

Cuda · 4,318 stars · 610 forks · Updated Dec 22, 2025

How to optimize some algorithms in CUDA.

Cuda · 2,700 stars · 244 forks · Updated Dec 21, 2025

Sample codes for my CUDA programming book

Cuda · 1,954 stars · 378 forks · Updated Dec 14, 2025

Fast CUDA matrix multiplication from scratch

Cuda · 983 stars · 148 forks · Updated Sep 2, 2025

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Cuda · 331 stars · 30 forks · Updated Jul 2, 2024

The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)

Cuda · 227 stars · 35 forks · Updated Dec 10, 2025

REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.

Cuda · 102 stars · 11 forks · Updated Dec 24, 2022

DietCode Code Release

Cuda · 64 stars · 9 forks · Updated Jul 21, 2022

Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.

Cuda · 41 stars · 5 forks · Updated Mar 17, 2024

This is a repo for my CUDA training code.

Cuda · 9 stars · Updated Oct 10, 2020

This is a repo for my CUDA learning.

Cuda · 6 stars · 2 forks · Updated Sep 15, 2022

Aaron: Compile-time Kernel Adaptation for Multi-DNN Inference Acceleration on Edge GPU [SenSys'22 Best Poster]

Cuda · 2 stars · Updated Mar 2, 2023

SW technique using persistent threads and SM partitioning to enhance GPU resource utilization.

Cuda · 1 star · 1 fork · Updated Feb 6, 2020