Stars
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
CUDA-accelerated rasterization of Gaussian splatting
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
GPU-accelerated t-SNE in CUDA with Python bindings
Causal depthwise conv1d in CUDA, with a PyTorch interface
Reference implementation of the Megalodon 7B model
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Implementation of fused cosine similarity attention in the same style as Flash Attention (a plain-PyTorch sketch of the underlying computation follows this list)
Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks", containing the code for the paper's experiments.
Lightweight Llama 3 8B Inference Engine in CUDA C
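For reference on the fused cosine similarity attention entry above: the repo fuses the computation into a single kernel, but the underlying idea is simply to l2-normalize queries and keys and replace the usual 1/sqrt(d) scaling with a fixed (or learned) temperature. The sketch below is a minimal, unfused plain-PyTorch illustration of that computation, not the repo's API; the `scale` value, tensor layout, and function name are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def cosine_sim_attention(q, k, v, scale=10.0, causal=False):
    # q, k, v: (batch, heads, seq_len, head_dim). Hypothetical helper,
    # not the repo's interface.
    q = F.normalize(q, dim=-1)          # unit-norm queries
    k = F.normalize(k, dim=-1)          # unit-norm keys
    # Cosine similarities, scaled by a fixed temperature instead of 1/sqrt(d).
    sim = torch.einsum('bhid,bhjd->bhij', q, k) * scale
    if causal:
        i, j = sim.shape[-2:]
        mask = torch.ones(i, j, dtype=torch.bool, device=sim.device).triu(j - i + 1)
        sim = sim.masked_fill(mask, float('-inf'))
    attn = sim.softmax(dim=-1)
    return torch.einsum('bhij,bhjd->bhid', attn, v)

# Usage example:
# q = k = v = torch.randn(2, 8, 128, 64)
# out = cosine_sim_attention(q, k, v, causal=True)   # -> (2, 8, 128, 64)
```

Because the normalized logits are bounded by the scale, this variant tends to be numerically tamer than standard dot-product attention; the fused CUDA version in the repo avoids materializing the full `sim` matrix, in the same spirit as Flash Attention.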