tfruan2000

🎯

Focusing

tfruan tfruan2000

[mail]: ruantingfeng8@gmail.com [zhihu]: https://www.zhihu.com/people/ruan-ting-feng-59/columns

51 followers · 39 following

Institute of Computing Technology, CAS
https://tfruan2000.github.io/

Achievements

Lists (1)

my work

Starred repositories

triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

909 133 Updated Dec 19, 2025

XPU-Forces / xpu_graph

A torch compile backend for multi-targets

Python 42 15 Updated Dec 19, 2025

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python 1,283 112 Updated Dec 16, 2025

ray-project / llmperf

LLMPerf is a library for validating and benchmarking LLMs

Python 1,067 198 Updated Dec 9, 2024

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,655 749 Updated Dec 20, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,262 350 Updated Dec 19, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,818 1,033 Updated Dec 5, 2025

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,434 1,969 Updated Dec 20, 2025

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 12,488 2,293 Updated Dec 11, 2025

hedronvision / bazel-compile-commands-extractor

Goal: Enable awesome tooling for Bazel users of the C language family.

Python 875 168 Updated Aug 11, 2025

tenstorrent / tt-mlir

Tenstorrent MLIR compiler

C++ 224 86 Updated Dec 20, 2025

dendibakh / perf-book

The book "Performance Analysis and Tuning on Modern CPU"

TeX 3,400 236 Updated Jun 9, 2025

ByteDance-Seed / ShadowKV

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 275 19 Updated May 1, 2025

volcengine / veScale

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python 910 53 Updated Nov 27, 2025

Deep-Learning-Profiling-Tools / triton-viz

Python 263 23 Updated Dec 19, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,001 160 Updated Dec 13, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 4,312 606 Updated Dec 20, 2025

openxla / shardy

MLIR-based partitioning system

MLIR 152 29 Updated Dec 19, 2025

OI-wiki / OI-wiki

🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring dazzling arithmetic magic)

TypeScript 25,080 4,516 Updated Dec 19, 2025

alibaba / BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 910 170 Updated Dec 30, 2024

tensorflow / mlir-hlo

MLIR 422 75 Updated Dec 19, 2025

bytedance / byteir

A model compilation solution for various hardware

MLIR 457 52 Updated Aug 20, 2025

deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself

Python 22,519 2,685 Updated Nov 11, 2025

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,848 328 Updated Nov 28, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,988 877 Updated Dec 4, 2025

openxla / triton

Forked from triton-lang/triton

Fork of Triton repository for OpenXLA uses of the Triton language and compiler

C++ 15 10 Updated Dec 19, 2025

openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 3,831 712 Updated Dec 20, 2025

flagos-ai / FlagPerf

FlagPerf is an open-source software platform for benchmarking AI chips.

Python 355 115 Updated Nov 11, 2025

huihut / interview

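📚 A summary of fundamental knowledge for C/C++ technical interviews, covering the language, libraries, data structures, algorithms, operating systems, networking, and the linking and loading of libraries, plus interview experience, job postings, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…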

C++ 37,314 8,125 Updated Aug 24, 2025

openxla / stablehlo

Backward compatible ML compute opset inspired by HLO/MHLO

MLIR 583 168 Updated Dec 19, 2025

Starred topics

HTTP