Stars
Multi-V-VM / Codify
Forked from thebaselab/codeapp. Building a full-fledged code editor for iPad.
Disaggregated serving system for Large Language Models (LLMs).
From unknown beginner to Large Language Model (LLM) hero. Stay tuned for future updates!
A low-latency & high-throughput serving engine for LLMs
Implements several LLM KV cache sparsity methods.
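To make "KV cache sparsity" concrete, here is a minimal NumPy sketch of one heavy-hitter-style policy: keep only the cached tokens with the largest accumulated attention mass and evict the rest. The function names and the scoring rule are my own illustrative assumptions, not the repository's actual method.

```python
# Toy heavy-hitter KV-cache pruning: evict low-attention tokens, keep a fixed budget.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prune_kv_cache(keys, values, attn_history, budget):
    """Keep only the `budget` cached tokens with the largest accumulated
    attention mass (heavy hitters); evict everything else."""
    if keys.shape[0] <= budget:
        return keys, values, attn_history
    keep = np.argsort(attn_history)[-budget:]   # indices of heavy hitters
    keep.sort()                                 # preserve original token order
    return keys[keep], values[keep], attn_history[keep]

# Toy decode step: one query attends over the pruned cache.
d, budget = 64, 128
keys   = np.random.randn(512, d).astype(np.float32)
values = np.random.randn(512, d).astype(np.float32)
attn_history = np.random.rand(512).astype(np.float32)   # running attention scores

keys, values, attn_history = prune_kv_cache(keys, values, attn_history, budget)
q = np.random.randn(d).astype(np.float32)
w = softmax(q @ keys.T / np.sqrt(d))
out = w @ values
attn_history += w                               # update running scores for future evictions
print(keys.shape, out.shape)                    # (128, 64) (64,)
```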
A machine learning accelerator core designed for energy-efficient AI at the edge.
[DATE'2025, TCAD'2025] Terafly: A Multi-Node FPGA-Based Accelerator Design for Efficient Cooperative Inference in LLMs
[DATE'25, ICCAD'25] An embedded FPGA-based LLM accelerator capable of supporting Llama2-7B
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
A modern GUI client based on Tauri, designed to run on Windows, macOS, and Linux for a tailored proxy experience
Awesome Pruning. ✅ Curated Resources for Neural Network Pruning.
Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄); read online at https://datawhalechina.github.io/easy-rl/
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
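To illustrate the quantized-attention idea in the entry above, below is a toy NumPy sketch that quantizes Q and K to INT8 with per-tensor scales, computes the score matrix in integer arithmetic, and rescales before the softmax. This is my own simplified illustration under those assumptions, not SageAttention's per-block kernels.

```python
# Toy INT8-quantized attention: integer QK^T scores, float softmax and PV.
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def quantized_attention(Q, K, V):
    d = Q.shape[-1]
    q8, sq = quantize_int8(Q)
    k8, sk = quantize_int8(K)
    # Integer matmul, then undo the two scales and apply 1/sqrt(d).
    scores = (q8.astype(np.int32) @ k8.astype(np.int32).T) * (sq * sk / np.sqrt(d))
    return softmax(scores) @ V                  # softmax and PV kept in floating point here

Q = np.random.randn(16, 64).astype(np.float32)
K = np.random.randn(16, 64).astype(np.float32)
V = np.random.randn(16, 64).astype(np.float32)
ref = softmax(Q @ K.T / np.sqrt(64)) @ V
out = quantized_attention(Q, K, V)
print(np.abs(out - ref).max())                  # small error despite INT8 score computation
```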
LLM quantization (compression) toolkit with hardware-acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Course Project for High Level Chip Design (高层次芯片设计)
🚀🚀 [LLM] Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
Hand-written interview questions (not LeetCode problems) from LLM roles (the main focus) and search/ads/recommendation AI algorithm roles, e.g. Self-Attention and AUC; they generally test overall ability more than LeetCode does and sit closer to real business needs and fundamentals.
Bringing Language Models to the Most Resource Constrained Devices
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Fast inference from large language models via speculative decoding
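Since the last entry centers on speculative decoding, here is a minimal NumPy sketch of its accept/reject rule over toy next-token distributions. `draft_dist` and `target_dist` are stand-ins of my own for a small draft model and the large target model, and real systems verify all drafted tokens in a single target forward pass rather than one at a time as here.

```python
# Toy speculative decoding: draft a token, accept with prob min(1, p/q),
# resample from the adjusted target distribution on the first rejection.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8

def draft_dist(context):            # cheap proposal model (toy: random distribution)
    p = rng.random(VOCAB) + 0.5
    return p / p.sum()

def target_dist(context):           # expensive target model (toy: random distribution)
    p = rng.random(VOCAB) + 0.1
    return p / p.sum()

def speculative_step(context, k=4):
    """Draft up to k tokens; each is accepted with prob min(1, p_target/p_draft),
    and the first rejection triggers a resample from max(p - q, 0) normalized."""
    out = list(context)
    for _ in range(k):
        q = draft_dist(out)
        x = rng.choice(VOCAB, p=q)  # token proposed by the draft model
        p = target_dist(out)
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(int(x))      # accepted draft token
        else:
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            out.append(int(rng.choice(VOCAB, p=residual)))
            break                   # stop the draft run after a rejection
    return out

print(speculative_step([3, 1], k=4))
```

Because accepted tokens follow the target model's distribution exactly, the scheme trades extra cheap draft-model work for fewer sequential calls to the large model.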