18 starred repositories written in CUDA

LLM training in simple, raw C/CUDA

CUDA · 28,085 stars · 3,265 forks · Updated Jun 26, 2025
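To give a flavor of the "simple, raw C/CUDA" style such a project is built from, here is a minimal sketch of an elementwise GELU forward kernel using the common tanh approximation. It is my own illustration, not code from the repository; the kernel name and launch configuration are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative sketch (not from the repository): elementwise GELU forward
// with the tanh approximation, one thread per element.
__global__ void gelu_forward(float* out, const float* in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        float cube = 0.044715f * x * x * x;
        // 0.7978845608f = sqrt(2 / pi)
        out[i] = 0.5f * x * (1.0f + tanhf(0.7978845608f * (x + cube)));
    }
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; i++) in[i] = (i % 7) - 3.0f;
    gelu_forward<<<(n + 255) / 256, 256>>>(out, in, n);
    cudaDeviceSynchronize();
    printf("gelu(%.1f) = %f\n", in[0], out[0]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Part of the appeal of this style is that `nvcc file.cu` is the entire build.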

DeepEP: an efficient expert-parallel communication library

CUDA · 8,696 stars · 973 forks · Updated Nov 6, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

CUDA · 5,863 stars · 737 forks · Updated Oct 15, 2025
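"Fine-grained scaling" here means the FP8 operands carry scale factors per small tile rather than per tensor, which recovers accuracy despite e4m3's narrow dynamic range. Below is a naive sketch of that idea, assuming one scale per 128-element block along K; it is my own illustration (names, layout, and block size are assumptions), not DeepGEMM's far more sophisticated kernels.

```cuda
#include <cuda_fp8.h>      // __nv_fp8_e4m3 (CUDA 11.8+)
#include <cuda_runtime.h>

// Illustrative sketch of fine-grained-scaled FP8 GEMM (not DeepGEMM's code):
// A is MxK and B is KxN in e4m3; each 128-element block along K has one
// float scale per row of A and per column of B. Accumulation is in FP32.
constexpr int KBLOCK = 128;  // assumed scaling granularity

__global__ void fp8_gemm_scaled(const __nv_fp8_e4m3* A, const float* a_scale,
                                const __nv_fp8_e4m3* B, const float* b_scale,
                                float* C, int M, int N, int K) {
    int m = blockIdx.y * blockDim.y + threadIdx.y;
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= M || n >= N) return;
    float acc = 0.0f;
    for (int kb = 0; kb < K; kb += KBLOCK) {
        float sa = a_scale[m * (K / KBLOCK) + kb / KBLOCK];
        float sb = b_scale[(kb / KBLOCK) * N + n];
        float partial = 0.0f;
        for (int k = kb; k < kb + KBLOCK && k < K; k++)
            partial += float(A[m * K + k]) * float(B[k * N + n]);
        acc += sa * sb * partial;  // rescale each K-block's contribution
    }
    C[m * N + n] = acc;
}
```

The structural point: the inner product within a block runs on raw FP8 values and is rescaled once per block, so quantization error is bounded per 128 elements rather than per whole row.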

FlashInfer: Kernel Library for LLM Serving

CUDA · 4,021 stars · 558 forks · Updated Nov 6, 2025

CUDA-accelerated rasterization of Gaussian splatting

CUDA · 3,904 stars · 591 forks · Updated Oct 2, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention with no loss in end-to-end metrics across language, image, and video models.

CUDA · 2,627 stars · 258 forks · Updated Nov 6, 2025
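The core trick in quantized attention is computing the QK^T logits in low precision with per-block scales and rescaling before softmax. A hedged sketch of that idea using per-row symmetric INT8 quantization (my own illustration, not the paper's kernels; all names are assumptions):

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Illustrative sketch (not the paper's kernels): per-row symmetric INT8
// quantization of Q and K, then attention logits from integer dot products,
// rescaled back to float before softmax. Launch with one block per row.
__global__ void quantize_rows_int8(const float* x, int8_t* xq, float* scale,
                                   int rows, int d) {
    int r = blockIdx.x;
    if (r >= rows) return;
    // Thread 0 finds the row absmax serially for clarity; real kernels use
    // a parallel reduction here.
    if (threadIdx.x == 0) {
        float amax = 1e-8f;
        for (int j = 0; j < d; j++) amax = fmaxf(amax, fabsf(x[r * d + j]));
        scale[r] = amax / 127.0f;
    }
    __syncthreads();
    float s = scale[r];
    for (int j = threadIdx.x; j < d; j += blockDim.x)
        xq[r * d + j] = (int8_t)rintf(x[r * d + j] / s);
}

__global__ void scores_int8(const int8_t* qq, const float* qs,
                            const int8_t* kq, const float* ks,
                            float* S, int n, int d, float softmax_scale) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // query index
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // key index
    if (i >= n || j >= n) return;
    int32_t acc = 0;  // exact integer accumulation
    for (int k = 0; k < d; k++)
        acc += (int32_t)qq[i * d + k] * (int32_t)kq[j * d + k];
    // Undo both quantization scales, then apply the usual softmax scale.
    S[i * n + j] = softmax_scale * qs[i] * ks[j] * (float)acc;
}
```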

cuGraph - RAPIDS Graph Analytics Library

CUDA · 2,078 stars · 338 forks · Updated Nov 6, 2025

GPU-accelerated t-SNE for CUDA with Python bindings

CUDA · 1,893 stars · 136 forks · Updated Oct 2, 2024

UNet diffusion model in pure CUDA

CUDA · 651 stars · 31 forks · Updated Jun 28, 2024

Causal depthwise conv1d in CUDA, with a PyTorch interface

CUDA · 637 stars · 133 forks · Updated Oct 20, 2025
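The operation itself is compact: each channel has its own width-W filter, and output position t may only read inputs at positions t and earlier. A minimal kernel assuming a contiguous (batch, channel, time) layout; this is my own illustration rather than the repository's optimized kernel, and the weight orientation (k = lag behind t) is a convention choice.

```cuda
#include <cuda_runtime.h>

// Illustrative sketch: causal depthwise conv1d over x[B, C, T] with one
// width-W filter per channel. Position t reads x[t - W + 1 .. t], with
// implicit zero padding on the left, so no future timestep leaks in.
__global__ void causal_depthwise_conv1d(const float* x, const float* w,
                                        const float* bias, float* y,
                                        int B, int C, int T, int W) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // flat (b, c, t) index
    if (idx >= B * C * T) return;
    int t = idx % T;
    int c = (idx / T) % C;
    float acc = bias ? bias[c] : 0.0f;
    for (int k = 0; k < W; k++) {          // k = lag: w[c][0] hits step t
        if (t - k >= 0) acc += w[c * W + k] * x[idx - k];
    }
    y[idx] = acc;
}
```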

Reference implementation of Megalodon 7B model

CUDA · 523 stars · 54 forks · Updated May 17, 2025

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

CUDA · 442 stars · 47 forks · Updated May 14, 2025
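Whatever the language, the piece every flash attention implementation hinges on is the online softmax: a running maximum and a running normalizer let the softmax be computed in one streaming pass, without ever materializing a full row of scores. A minimal device-side sketch of the recurrence (my own illustration):

```cuda
#include <math.h>

// Illustrative sketch: the online-softmax recurrence at the heart of flash
// attention. Scores s[0..n) are consumed one at a time while a running max
// m and normalizer l are maintained; l is rescaled whenever the max grows,
// so earlier terms remain consistent.
__device__ void online_softmax(const float* s, float* p, int n) {
    float m = -INFINITY, l = 0.0f;
    for (int j = 0; j < n; j++) {
        float m_new = fmaxf(m, s[j]);
        l = l * expf(m - m_new) + expf(s[j] - m_new);
        m = m_new;
    }
    // A second pass yields the probabilities; flash attention avoids it by
    // folding the same rescaling into a running weighted sum of V.
    for (int j = 0; j < n; j++) p[j] = expf(s[j] - m) / l;
}
```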

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

CUDA · 344 stars · 37 forks · Updated Jul 10, 2025

A plugin to use NVIDIA GPUs in the PySCF package

CUDA · 227 stars · 43 forks · Updated Nov 6, 2025

Implementation of fused cosine similarity attention in the same style as Flash Attention

CUDA · 217 stars · 12 forks · Updated Feb 13, 2023
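Cosine-similarity attention replaces the usual q·k/√d logit with the dot product of L2-normalized q and k times a scale, which bounds every logit to [-scale, scale]. A hedged sketch of the per-pair logit (my own illustration; per the description, the repository fuses this into a full Flash-Attention-style kernel):

```cuda
#include <math.h>

// Illustrative sketch: a cosine-similarity attention logit. Unlike the
// standard q . k / sqrt(d), both vectors are L2-normalized first, so the
// result is bounded by the scale regardless of head dimension d.
__device__ float cosine_sim_logit(const float* q, const float* k,
                                  int d, float scale) {
    float qq = 1e-12f, kk = 1e-12f, qk = 0.0f;  // eps guards zero vectors
    for (int i = 0; i < d; i++) {
        qq += q[i] * q[i];
        kk += k[i] * k[i];
        qk += q[i] * k[i];
    }
    return scale * qk * rsqrtf(qq * kk);
}
```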

Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks", containing the code for the paper's experiments.

CUDA · 59 stars · 2 forks · Updated Oct 31, 2024

Lightweight Llama 3 8B Inference Engine in CUDA C

CUDA · 48 stars · 7 forks · Updated Mar 21, 2025