
FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.

Python 156 42 Updated Apr 19, 2026

✔ (Completed) Extremely comprehensive deep learning notes [Tudui PyTorch] [Mu Li, Dive into Deep Learning] [Andrew Ng, Deep Learning] [Dafei, LLM Agents]

Jupyter Notebook 20,011 2,301 Updated Apr 18, 2026

Speed of Light Analysis for ML Model Runtime

Python 51 7 Updated Apr 13, 2026
C++ 52 1 Updated Apr 13, 2026

A lightweight triton-based General Matrix Multiplication (GEMM) library.

Python 58 12 Updated Apr 18, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ 940 78 Updated Apr 1, 2026

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

Python 183 37 Updated Apr 18, 2026

Asterinas aims to be a production-grade Linux alternative—memory safe, high-performance, and more.

Rust 4,428 293 Updated Apr 18, 2026

Triton for DSA

Python 60 8 Updated Apr 14, 2026

Universal LLM Deployment Engine with ML Compilation

Python 22,486 2,010 Updated Apr 14, 2026

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

Go 169,395 15,681 Updated Apr 19, 2026

FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang/triton.

C++ 243 54 Updated Apr 17, 2026
C++ 362 40 Updated Jan 28, 2026

Allo Accelerator Design and Programming Framework (PLDI'24)

Python 370 68 Updated Mar 13, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,308 137 Updated Apr 18, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,204 196 Updated Apr 19, 2026

slime is an LLM post-training framework for RL Scaling.

Python 5,367 733 Updated Apr 18, 2026

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 708 91 Updated Mar 30, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,368 183 Updated Mar 12, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,411 138 Updated Apr 17, 2026

My learning notes for ML SYS.

Python 6,048 396 Updated Apr 8, 2026

Perplexity GPU Kernels

C++ 567 86 Updated Nov 7, 2025

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python 5,060 470 Updated Apr 19, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,129 683 Updated Apr 19, 2026

Tile primitives for speedy kernels

Cuda 3,323 277 Updated Apr 8, 2026

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda 106 6 Updated Jun 28, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 20,786 3,691 Updated Apr 17, 2026

kro | Kube Resource Orchestrator

Go 2,855 339 Updated Apr 16, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,590 1,036 Updated Apr 19, 2026

Staging repo for development of native port of TypeScript

Go 24,817 896 Updated Apr 18, 2026