- Institute of Computing Technology, CAS
- https://tfruan2000.github.io/
Starred repositories
The Triton TensorRT-LLM Backend
Distributed Compiler based on Triton for Parallel Systems
LLMPerf is a library for validating and benchmarking LLMs
A Datacenter Scale Distributed Inference Serving Framework
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
DeepEP: an efficient expert-parallel communication library
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Goal: Enable awesome tooling for Bazel users of the C language family.
The book "Performance Analysis and Tuning on Modern CPU"
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
FlashInfer: Kernel Library for LLM Serving
🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring cool arithmetic magic)
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
A model compilation solution for various hardware
DeepSeek Coder: Let the Code Write Itself
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
openxla / triton
Forked from triton-lang/triton
Fork of Triton repository for OpenXLA uses of the Triton language and compiler
A machine learning compiler for GPUs, CPUs, and ML accelerators
FlagPerf is an open-source software platform for benchmarking AI chips.
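📚 A summary of fundamental knowledge for C/C++ technical interviews, covering the language, libraries, data structures, algorithms, systems, networking, and linking/loading libraries, plus interview experience, recruitment, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…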
Backward compatible ML compute opset inspired by HLO/MHLO