TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 13,933 2,484 Updated Jun 22, 2026

punica-ai / punica

Serving multiple LoRA fine-tuned LLMs as one

Python 1,163 63 Updated May 8, 2024

scv119 / punica

Forked from punica-ai/punica

Cuda 1 Updated Sep 16, 2023

feifeibear / LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

Python 916 96 Updated Aug 22, 2024

ModelTC / LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 4,134 334 Updated Jun 22, 2026

tpoisonooo / how-to-optimize-gemm

row-major matmul optimization

C++ 733 94 Updated May 14, 2026

YulhwaKim / cutlass_tilesparse

CUDA templates for tile-sparse matrix multiplication based on CUTLASS.

C++ 52 4 Updated Mar 1, 2018

ayaka14732 / llama-2-jax

JAX implementation of the Llama 2 model

Python 217 24 Updated Feb 2, 2024

checkpoint-restore / criu

Checkpoint/Restore tool

C 3,879 749 Updated Jun 21, 2026

Chen Shen scv119

Stars

microsoft / mscclpp

SzymonOzog / Penny

openai / codex

tinygrad / tinygrad

EricLBuehler / mistral.rs

karpathy / llm.c

openmlsys / openmlsys

nrc / r4cppp

xai-org / grok-1

MathFoundationRL / Book-Mathematical-Foundation-of-Reinforcement-Learning

google-deepmind / alphageometry

sgl-project / sglang

ColfaxResearch / cutlass-kernels

MARD1NO / CUDA-PPT

BBuf / how-to-optim-algorithm-in-cuda

ray-project / llmperf-leaderboard

kenjihiranabe / The-Art-of-Linear-Algebra

meta-pytorch / gpt-fast

vectorch-ai / ScaleLLM

flashinfer-ai / flashinfer

merrymercy / awesome-tensor-compilers

NVIDIA / TensorRT-LLM