tfruan2000

Starred repositories

The Triton TensorRT-LLM Backend

909 133 Updated Dec 19, 2025

A torch compile backend for multi-targets

Python 42 15 Updated Dec 19, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,283 112 Updated Dec 16, 2025

LLMPerf is a library for validating and benchmarking LLMs

Python 1,067 198 Updated Dec 9, 2024

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,655 749 Updated Dec 20, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,262 350 Updated Dec 19, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,818 1,033 Updated Dec 5, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,434 1,969 Updated Dec 20, 2025

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 12,488 2,293 Updated Dec 11, 2025

Goal: Enable awesome tooling for Bazel users of the C language family.

Python 875 168 Updated Aug 11, 2025

Tenstorrent MLIR compiler

C++ 224 86 Updated Dec 20, 2025

The book "Performance Analysis and Tuning on Modern CPU"

TeX 3,400 236 Updated Jun 9, 2025

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 275 19 Updated May 1, 2025

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python 910 53 Updated Nov 27, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,001 160 Updated Dec 13, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,312 606 Updated Dec 20, 2025

MLIR-based partitioning system

MLIR 152 29 Updated Dec 19, 2025

🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring dazzling arithmetic magic)

TypeScript 25,080 4,516 Updated Dec 19, 2025

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 910 170 Updated Dec 30, 2024
MLIR 422 75 Updated Dec 19, 2025

A model compilation solution for various hardware

MLIR 457 52 Updated Aug 20, 2025

DeepSeek Coder: Let the Code Write Itself

Python 22,519 2,685 Updated Nov 11, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,848 328 Updated Nov 28, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,988 877 Updated Dec 4, 2025

Fork of Triton repository for OpenXLA uses of the Triton language and compiler

C++ 15 10 Updated Dec 19, 2025

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 3,831 712 Updated Dec 20, 2025

FlagPerf is an open-source software platform for benchmarking AI chips.

Python 355 115 Updated Nov 11, 2025

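📚 A summary of fundamental knowledge for C/C++ technical interviews, covering languages, libraries, data structures, algorithms, systems, networking, and linking/loading libraries, along with interview experience, recruitment, and internal referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…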

C++ 37,314 8,125 Updated Aug 24, 2025

Backward compatible ML compute opset inspired by HLO/MHLO

MLIR 583 168 Updated Dec 19, 2025