tfruan2000
🎯
Focusing


Starred repositories

82 results for source starred repositories

An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.

Python 52 15 Updated Feb 10, 2026

The Triton TensorRT-LLM Backend

918 135 Updated Feb 10, 2026

A torch compile backend for multi-targets

Python 45 19 Updated Feb 9, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,350 127 Updated Feb 9, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,074 850 Updated Feb 11, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 5,142 442 Updated Feb 10, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 8,973 1,091 Updated Feb 9, 2026

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,852 2,097 Updated Feb 10, 2026

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 12,686 2,317 Updated Feb 9, 2026

Goal: Enable awesome tooling for Bazel users of the C language family.

Python 888 178 Updated Aug 11, 2025

Tenstorrent MLIR compiler

MLIR 248 107 Updated Feb 11, 2026

The book "Performance Analysis and Tuning on Modern CPU"

TeX 3,468 239 Updated Jun 9, 2025

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 283 21 Updated May 1, 2025

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python 928 55 Updated Nov 27, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,126 172 Updated Feb 10, 2026

FlashInfer: Kernel Library for LLM Serving

Python 4,943 703 Updated Feb 11, 2026

MLIR-based partitioning system

MLIR 164 32 Updated Feb 10, 2026

🌟 Wiki of OI / ICPC for everyone. (A walkthrough for a certain massive online game, featuring cool arithmetic magic)

TypeScript 25,459 4,565 Updated Feb 10, 2026

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 916 170 Updated Dec 30, 2024

MLIR 422 75 Updated Jan 4, 2026

A model compilation solution for various hardware

MLIR 464 53 Updated Aug 20, 2025

DeepSeek Coder: Let the Code Write Itself

Python 22,767 2,733 Updated Nov 11, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,985 339 Updated Jan 18, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,641 953 Updated Feb 10, 2026

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 3,975 744 Updated Feb 11, 2026

FlagPerf is an open-source software platform for benchmarking AI chips.

Python 361 117 Updated Nov 11, 2025

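📚 A summary of foundational knowledge for C/C++ technical interviews, covering the language, libraries, data structures, algorithms, systems, networking, and linking and loading, along with interview experiences, recruiting, and referral information. This repository summarizes basic knowledge for job seekers and beginners in C/C++ technology, in…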

C++ 37,498 8,130 Updated Aug 24, 2025

Backward compatible ML compute opset inspired by HLO/MHLO

MLIR 605 175 Updated Feb 6, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,711 553 Updated Feb 10, 2026