lms-mt

VITS2 backbone with multilingual BERT

Python · 8,761 stars · 1,292 forks · Updated Jun 15, 2026

FlashInfer: Kernel Library for LLM Serving

Python · 5,836 stars · 1,067 forks · Updated Jun 22, 2026

Accessible large language models via k-bit quantization for PyTorch.

Python · 8,281 stars · 873 forks · Updated Jun 18, 2026

Character Animation (AnimateAnyone, Face Reenactment)

Python · 3,508 stars · 296 forks · Updated May 31, 2024

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python · 6,224 stars · 573 forks · Updated Aug 22, 2025

LLM inference in C/C++

C++ · 117,641 stars · 19,810 forks · Updated Jun 22, 2026

MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs

Python · 966 stars · 60 forks · Updated May 15, 2023

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Python · 37,387 stars · 3,277 forks · Updated Aug 17, 2024

A library for calculating the FLOPs of the forward() pass, based on torch.fx

Python · 139 stars · 9 forks · Updated Dec 23, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 13,934 stars · 2,487 forks · Updated Jun 22, 2026

Count the MACs / FLOPs of your PyTorch model.

Python · 5,080 stars · 535 forks · Updated Jul 8, 2024

This is an efficient CUDA implementation of 2D depthwise convolution for large kernels; it can be used in the PyTorch deep learning framework.

Cuda · 12 stars · Updated Sep 28, 2023

Optimize GEMM with Tensor Cores step by step

37 stars · 8 forks · Updated Dec 17, 2023

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda · 1,834 stars · 462 forks · Updated Oct 9, 2023

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 83,542 stars · 18,309 forks · Updated Jun 22, 2026

torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics cards.

Python · 499 stars · 36 forks · Updated Mar 17, 2026

Fast and memory-efficient exact attention

Python · 24,208 stars · 2,851 forks · Updated Jun 20, 2026

Annotations of the interesting ML papers I read

286 stars · 28 forks · Updated Jun 6, 2026

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

C++ · 478 stars · 36 forks · Updated Mar 15, 2024

Transformer-related optimization, including BERT and GPT

C++ · 6,428 stars · 935 forks · Updated Mar 27, 2024

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,937 stars · 1,918 forks · Updated Jun 21, 2026

Development repository for the Triton language and compiler

MLIR · 19,496 stars · 2,952 forks · Updated Jun 22, 2026

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

CSS · 45,111 stars · 5,548 forks · Updated Jun 15, 2026

PyTorch Tutorial for Deep Learning Researchers

Python · 32,392 stars · 8,241 forks · Updated Aug 15, 2023

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023

Jupyter Notebook · 3,163 stars · 681 forks · Updated Jun 1, 2026

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook · 14,820 stars · 3,407 forks · Updated Aug 12, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,760 stars · 326 forks · Updated Oct 19, 2024