isll

llei isll

C, C++, distributed systems, networking

39 followers · 123 following

TsingTao

Lists (1)

🚀 My stack

Stars

ROCm / FlyDSL

FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.

Python 171 48 Updated Apr 28, 2026

AccumulateMore / CV

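✔ (Completed) Extremely comprehensive deep learning notes [Tudui PyTorch] [Mu Li, Dive into Deep Learning] [Andrew Ng, Deep Learning] [Dafei, Large-Model Agents]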

Jupyter Notebook 20,415 2,349 Updated Apr 27, 2026

NVlabs / SOLAR

Speed of Light Analysis for ML Model Runtime

Python 57 9 Updated Apr 13, 2026

ROCm / hipThreads

C++ 52 1 Updated Apr 13, 2026

ROCm / tritonBLAS

A lightweight Triton-based General Matrix Multiplication (GEMM) library.

Python 60 13 Updated Apr 22, 2026

NVIDIA / cuda-tile

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ 947 77 Updated Apr 1, 2026

ROCm / iris

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

Python 188 39 Updated Apr 28, 2026

asterinas / asterinas

Asterinas aims to be a production-grade Linux alternative—memory safe, high-performance, and more.

Rust 4,440 293 Updated Apr 27, 2026

DeepLink-org / DLCompiler

Triton for DSA

Python 63 8 Updated Apr 14, 2026

mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python 22,539 2,018 Updated Apr 22, 2026

ollama / ollama

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

Go 170,165 15,844 Updated Apr 28, 2026

flagos-ai / FlagTree

FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang/triton.

C++ 250 58 Updated Apr 28, 2026

stepfun-ai / StepMesh

C++ 362 40 Updated Jan 28, 2026

cornell-zhang / allo

Allo Accelerator Design and Programming Framework (PLDI'24)

Python 373 68 Updated Mar 13, 2026

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,333 143 Updated Apr 27, 2026

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,228 200 Updated Apr 27, 2026

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python 5,499 753 Updated Apr 27, 2026

NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 714 91 Updated Apr 21, 2026

NVIDIA / gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,373 185 Updated Mar 12, 2026

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python 1,419 139 Updated Apr 22, 2026

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes for ML SYS.

Python 6,138 401 Updated Apr 23, 2026

perplexityai / pplx-kernels

Perplexity GPU Kernels

C++ 569 86 Updated Nov 7, 2025

inclusionAI / AReaL

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python 5,105 485 Updated Apr 27, 2026

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,210 708 Updated Apr 27, 2026

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,328 276 Updated Apr 25, 2026

microsoft / TileFusion

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda 108 6 Updated Jun 28, 2025

verl-project / verl

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 20,969 3,760 Updated Apr 28, 2026

kubernetes-sigs / kro

kro | Kube Resource Orchestrator

Go 2,864 342 Updated Apr 27, 2026

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,675 1,066 Updated Apr 28, 2026

microsoft / typescript-go

Staging repo for development of native port of TypeScript

Go 25,230 929 Updated Apr 28, 2026