Stars
Fast inference engine for Transformer models
Fast and memory-efficient exact attention
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
DeepEP: an efficient expert-parallel communication library
C++ code for computing 1D and 2D convolution products using the FFT, implemented with the GSL or FFTW
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Implementation of 1D, 2D, and 3D FFT convolutions in PyTorch; much faster than direct convolutions for large kernel sizes (see the sketch after this list).
Cross-platform text editor, written in Free Pascal
Lightning fast C++/CUDA neural network framework
Instant neural graphics primitives: lightning fast NeRF and more
CUDA accelerated rasterization of gaussian splatting
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention with no loss in end-to-end metrics across language, image, and video models.
Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch
Sample code for my CUDA programming book
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
Learn CUDA Programming, published by Packt
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
Automatically exported from code.google.com/p/cuda-convnet2
CUDA-accelerated GIS and spatiotemporal algorithms
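
The FFT-convolution entry above claims a large speedup over direct convolution for big kernels. The reason is the convolution theorem: a sliding-window product over a length-n signal with a length-k kernel costs O(n*k), while zero-padding, transforming, and multiplying pointwise in the frequency domain costs O(n log n). A minimal sketch of the idea using only torch.fft; the function name fft_conv1d, the shapes, and the tolerance check are illustrative assumptions, not that repository's API:

import torch
import torch.nn.functional as F

def fft_conv1d(signal: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    # signal: (batch, in_channels, n); kernel: (out_channels, in_channels, k).
    # Computes the same 'valid' cross-correlation as F.conv1d, but via the FFT.
    n, k = signal.shape[-1], kernel.shape[-1]
    L = n + k - 1                              # length of the full linear convolution
    fs = torch.fft.rfft(signal, n=L)           # zero-pad to L, then transform
    # Flipping the kernel turns cross-correlation into plain convolution,
    # which is pointwise multiplication in the frequency domain.
    fk = torch.fft.rfft(kernel.flip(-1), n=L)
    fo = torch.einsum("bci,oci->boi", fs, fk)  # multiply, summing over input channels
    out = torch.fft.irfft(fo, n=L)             # back to the time domain
    return out[..., k - 1 : n]                 # keep only the 'valid' positions

# Agrees with the direct convolution up to float32 round-off:
x = torch.randn(2, 3, 4096)
w = torch.randn(4, 3, 512)
assert torch.allclose(fft_conv1d(x, w), F.conv1d(x, w), atol=1e-3)

Padding both operands to n + k - 1 before transforming is what makes the circular convolution computed by the FFT equal the linear one; slicing off the first k - 1 outputs then recovers exactly the positions F.conv1d would produce with no padding.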