TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 12,852 stars · 2,097 forks · Updated Feb 10, 2026

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ · 12,686 stars · 2,317 forks · Updated Feb 9, 2026

hedronvision / bazel-compile-commands-extractor

Goal: Enable awesome tooling for Bazel users of the C language family.

Python · 888 stars · 178 forks · Updated Aug 11, 2025

tenstorrent / tt-mlir

Tenstorrent MLIR compiler

MLIR · 248 stars · 107 forks · Updated Feb 11, 2026

dendibakh / perf-book

The book "Performance Analysis and Tuning on Modern CPU"

TeX · 3,468 stars · 239 forks · Updated Jun 9, 2025

ByteDance-Seed / ShadowKV

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python · 283 stars · 21 forks · Updated May 1, 2025

volcengine / veScale

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python · 928 stars · 55 forks · Updated Nov 27, 2025

Deep-Learning-Profiling-Tools / triton-viz

Python · 287 stars · 24 forks · Updated Feb 10, 2026

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ · 2,126 stars · 172 forks · Updated Feb 10, 2026

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python · 4,943 stars · 703 forks · Updated Feb 11, 2026

openxla / shardy

MLIR-based partitioning system

MLIR · 164 stars · 32 forks · Updated Feb 10, 2026

OI-wiki / OI-wiki

🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain massive game, packed with cool arithmetic magic)

TypeScript · 25,459 stars · 4,565 forks · Updated Feb 10, 2026

alibaba / BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ · 916 stars · 170 forks · Updated Dec 30, 2024

tensorflow / mlir-hlo

MLIR · 422 stars · 75 forks · Updated Jan 4, 2026

bytedance / byteir

A model compilation solution for various hardware

MLIR · 464 stars · 53 forks · Updated Aug 20, 2025

deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself

Python · 22,767 stars · 2,733 forks · Updated Nov 11, 2025

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python · 4,985 stars · 339 forks · Updated Jan 18, 2026

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 9,641 stars · 953 forks · Updated Feb 10, 2026

openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ · 3,975 stars · 744 forks · Updated Feb 11, 2026

flagos-ai / FlagPerf

FlagPerf is an open-source software platform for benchmarking AI chips.

Python · 361 stars · 117 forks · Updated Nov 11, 2025

huihut / interview

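📚 A summary of the fundamentals for C/C++ technical interviews, covering the language, libraries, data structures, algorithms, systems, networking, linking and loading, plus interview experience, recruiting, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…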

C++ · 37,498 stars · 8,130 forks · Updated Aug 24, 2025

openxla / stablehlo

Backward compatible ML compute opset inspired by HLO/MHLO

MLIR · 605 stars · 175 forks · Updated Feb 6, 2026

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 4,711 stars · 553 forks · Updated Feb 10, 2026

tfruan tfruan2000

Lists (1)

my work

Starred repositories
