TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 13,934 2,487 Updated Jun 22, 2026

Lyken17 / pytorch-OpCounter

Count the MACs / FLOPs of your PyTorch model.

Python 5,080 535 Updated Jul 8, 2024

ShqWW / dwconv2d

An efficient CUDA implementation of 2D depthwise convolution for large kernels, usable in the PyTorch deep learning framework.

Cuda 12 Updated Sep 28, 2023

gty111 / GEMM_MMA

Optimize GEMM with Tensor Cores step by step

37 8 Updated Dec 17, 2023

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,834 462 Updated Oct 9, 2023

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 83,542 18,309 Updated Jun 22, 2026

MooreThreads / torch_musa

torch_musa is an open-source repository based on PyTorch that makes full use of the computing power of MooreThreads graphics cards.

Python 499 36 Updated Mar 17, 2026

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 24,208 2,851 Updated Jun 20, 2026

shreyansh26 / Annotated-ML-Papers

Annotations of the interesting ML papers I read

286 28 Updated Jun 6, 2026

bytedance / ByteTransformer

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

C++ 478 36 Updated Mar 15, 2024

NVIDIA / FasterTransformer

Transformer-related optimization, including BERT and GPT

C++ 6,428 935 Updated Mar 27, 2024

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,937 1,918 Updated Jun 21, 2026

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 19,496 2,952 Updated Jun 22, 2026

isocpp / CppCoreGuidelines

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

CSS 45,111 5,548 Updated Jun 15, 2026

yunjey / pytorch-tutorial

PyTorch Tutorial for Deep Learning Researchers

Python 32,392 8,241 Updated Aug 15, 2023

phlippe / uvadlc_notebooks

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023

Jupyter Notebook 3,163 681 Updated Jun 1, 2026

LiuXiaoxuanPKU / Cost-Model-papers

13 1 Updated Feb 22, 2023

NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook 14,820 3,407 Updated Aug 12, 2024

merrymercy / awesome-tensor-compilers

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,760 326 Updated Oct 19, 2024

lms-mt

Stars

fishaudio / Bert-VITS2

flashinfer-ai / flashinfer

bitsandbytes-foundation / bitsandbytes

MooreThreads / Moore-AnimateAnyone

meta-pytorch / gpt-fast

ggml-org / llama.cpp

kuleshov / minillm

LAION-AI / Open-Assistant

zugexiaodui / torch_flops

NVIDIA / TensorRT-LLM