
ZanePoe's starred repositories

8 starred repositories, written in CUDA

FlashInfer: Kernel Library for LLM Serving

CUDA · 4,019 stars · 558 forks · Updated Nov 6, 2025

[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

CUDA · 2,622 stars · 258 forks · Updated Oct 28, 2025

Flash Attention in ~100 lines of CUDA (forward pass only); a minimal sketch of the core online-softmax idea appears after this list.

CUDA · 961 stars · 97 forks · Updated Dec 30, 2024

[ICML 2025] SpargeAttention: training-free sparse attention that accelerates inference for any model.

CUDA · 758 stars · 65 forks · Updated Oct 31, 2025

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

CUDA · 270 stars · 22 forks · Updated Jul 16, 2025

CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…

CUDA · 201 stars · 20 forks · Updated Oct 10, 2025

Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

CUDA · 75 stars · 4 forks · Updated Oct 16, 2025

SageAttention for Turing GPUs.

CUDA · 25 stars · 3 forks · Updated Sep 5, 2025
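As referenced in the "Flash Attention in ~100 lines" entry above, the trick that keeps a flash-attention forward kernel so small is the online-softmax update: scores are streamed key by key while a running max, running denominator, and unnormalized output accumulator are rescaled on the fly, so the full N x N score matrix is never materialized. The sketch below illustrates only that idea; it is not code from any repository listed here, it omits the tiling and shared-memory staging a real kernel uses, and the names and limits in it (naive_flash_fwd, the d <= 128 cap) are illustrative assumptions.

```cuda
// Minimal sketch of the online-softmax core of a flash-attention forward
// pass: one thread owns one query row and streams over all keys. Not a
// tuned kernel -- no tiling, no shared memory, fp32 only.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void naive_flash_fwd(const float* Q, const float* K, const float* V,
                                float* O, int N, int d) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;  // one query row per thread
    if (row >= N) return;

    const float scale = rsqrtf((float)d);
    float m = -INFINITY;   // running max of scores seen so far
    float l = 0.0f;        // running softmax denominator
    float acc[128];        // running unnormalized output (assumes d <= 128)
    for (int k = 0; k < d; ++k) acc[k] = 0.0f;

    for (int j = 0; j < N; ++j) {
        // s = (q . k_j) * scale
        float s = 0.0f;
        for (int k = 0; k < d; ++k) s += Q[row * d + k] * K[j * d + k];
        s *= scale;

        // Online-softmax update: rescale prior state by exp(m_old - m_new)
        // so earlier contributions stay consistent with the new max.
        float m_new = fmaxf(m, s);
        float correction = __expf(m - m_new);
        float p = __expf(s - m_new);
        l = l * correction + p;
        for (int k = 0; k < d; ++k)
            acc[k] = acc[k] * correction + p * V[j * d + k];
        m = m_new;
    }
    for (int k = 0; k < d; ++k) O[row * d + k] = acc[k] / l;  // normalize once at the end
}

int main() {
    const int N = 64, d = 32;
    size_t bytes = (size_t)N * d * sizeof(float);
    float *Q, *K, *V, *O;
    cudaMallocManaged(&Q, bytes); cudaMallocManaged(&K, bytes);
    cudaMallocManaged(&V, bytes); cudaMallocManaged(&O, bytes);
    for (int i = 0; i < N * d; ++i) {  // arbitrary deterministic test data
        Q[i] = 0.01f * (i % 7); K[i] = 0.02f * (i % 5); V[i] = 0.03f * (i % 3);
    }
    naive_flash_fwd<<<(N + 127) / 128, 128>>>(Q, K, V, O, N, d);
    cudaDeviceSynchronize();
    printf("O[0] = %f\n", O[0]);
    return 0;
}
```

Because the accumulator is rescaled by exp(m_old - m_new) before each new key is folded in, the final normalized result matches an exact softmax over all N scores; production kernels add tiling over key/value blocks and shared-memory staging to make this streaming memory-efficient.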