Stars: 10 results for starred repositories written in CUDA
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
FSA/FST algorithms, differentiable, with PyTorch compatibility.
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
An example project showing how to build a pip-installable Python package that invokes custom CUDA/C++ code
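The last entry describes the common pattern of packaging custom CUDA/C++ code as a pip-installable Python module. A minimal sketch of such a `setup.py` using PyTorch's extension helpers is below; the package name `my_cuda_ext` and the source file paths are hypothetical, not taken from that repository:

```python
# setup.py -- minimal sketch of a pip-installable CUDA/C++ extension.
# Assumes PyTorch is installed; module and file names here are hypothetical.
from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

setup(
    name="my_cuda_ext",
    ext_modules=[
        CUDAExtension(
            name="my_cuda_ext",
            # C++ bindings plus the CUDA kernel source to compile with nvcc
            sources=["csrc/bindings.cpp", "csrc/kernel.cu"],
        )
    ],
    # BuildExtension handles mixed C++/CUDA compilation and ABI flags
    cmdclass={"build_ext": BuildExtension},
)
```

Running `pip install .` then compiles the `.cpp`/`.cu` sources and exposes the result as an importable Python module.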