isll · TsingTao


FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.

Python · 155 stars · 42 forks · Updated Apr 17, 2026

✔ (Completed) Super-comprehensive deep learning notes [Tudui PyTorch] [Li Mu, Dive into Deep Learning] [Andrew Ng, Deep Learning] [Dafei, LLM Agents]

Jupyter Notebook · 19,964 stars · 2,296 forks · Updated Apr 2, 2026

Speed of Light Analysis for ML Model Runtime

Python · 51 stars · 7 forks · Updated Apr 13, 2026
C++ · 52 stars · 1 fork · Updated Apr 13, 2026

A lightweight triton-based General Matrix Multiplication (GEMM) library.

Python · 58 stars · 13 forks · Updated Apr 16, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ · 940 stars · 78 forks · Updated Apr 1, 2026

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

Python · 183 stars · 37 forks · Updated Apr 17, 2026

Asterinas aims to be a production-grade Linux alternative—memory safe, high-performance, and more.

Rust · 4,426 stars · 292 forks · Updated Apr 17, 2026

triton for dsa

Python · 60 stars · 8 forks · Updated Apr 14, 2026

Universal LLM Deployment Engine with ML Compilation

Python · 22,480 stars · 2,010 forks · Updated Apr 14, 2026

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

Go · 169,269 stars · 15,649 forks · Updated Apr 17, 2026

FlagTree is a unified compiler, forked from triton-lang/triton, that supports multiple AI chip backends for custom deep learning operations.

C++ · 242 stars · 54 forks · Updated Apr 17, 2026
C++ · 361 stars · 40 forks · Updated Jan 28, 2026

Allo Accelerator Design and Programming Framework (PLDI'24)

Python · 370 stars · 68 forks · Updated Mar 13, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ · 1,305 stars · 137 forks · Updated Apr 17, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda · 2,200 stars · 195 forks · Updated Apr 17, 2026

slime is an LLM post-training framework for RL Scaling.

Python · 5,354 stars · 731 forks · Updated Apr 17, 2026

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ · 707 stars · 90 forks · Updated Mar 30, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ · 1,368 stars · 183 forks · Updated Mar 12, 2026

Distributed Compiler based on Triton for Parallel Systems

Python · 1,408 stars · 138 forks · Updated Apr 17, 2026

My learning notes for ML SYS.

Python · 6,035 stars · 395 forks · Updated Apr 8, 2026

Perplexity GPU Kernels

C++ · 567 stars · 86 forks · Updated Nov 7, 2025

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python · 5,053 stars · 466 forks · Updated Apr 17, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 5,118 stars · 682 forks · Updated Apr 17, 2026

Tile primitives for speedy kernels

Cuda · 3,322 stars · 278 forks · Updated Apr 8, 2026

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda · 106 stars · 6 forks · Updated Jun 28, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 20,765 stars · 3,684 forks · Updated Apr 17, 2026

kro | Kube Resource Orchestrator

Go · 2,851 stars · 338 forks · Updated Apr 16, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust · 6,575 stars · 1,035 forks · Updated Apr 17, 2026

Staging repo for development of native port of TypeScript

Go · 24,803 stars · 893 forks · Updated Apr 17, 2026