isll • TsingTao

FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.

Python · 153 stars · 42 forks · Updated Apr 16, 2026

✔ (Completed) Super-comprehensive deep learning notes [Tudui: PyTorch] [Li Mu: Dive into Deep Learning] [Andrew Ng: Deep Learning] [Dafei: Large-Model Agents]

Jupyter Notebook · 19,924 stars · 2,286 forks · Updated Apr 2, 2026

Speed of Light Analysis for ML Model Runtime

Python · 51 stars · 7 forks · Updated Apr 13, 2026

C++ · 52 stars · 1 fork · Updated Apr 13, 2026

A lightweight triton-based General Matrix Multiplication (GEMM) library.

Python · 58 stars · 13 forks · Updated Apr 15, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ · 938 stars · 78 forks · Updated Apr 1, 2026

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

Python · 183 stars · 38 forks · Updated Apr 16, 2026

Asterinas aims to be a production-grade Linux alternative—memory safe, high-performance, and more.

Rust · 4,425 stars · 291 forks · Updated Apr 16, 2026

triton for dsa

Python · 60 stars · 8 forks · Updated Apr 14, 2026

Universal LLM Deployment Engine with ML Compilation

Python · 22,472 stars · 2,009 forks · Updated Apr 14, 2026

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

Go · 169,186 stars · 15,625 forks · Updated Apr 16, 2026

FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang/triton.

C++ · 239 stars · 54 forks · Updated Apr 16, 2026

C++ · 360 stars · 40 forks · Updated Jan 28, 2026

Allo Accelerator Design and Programming Framework (PLDI'24)

Python · 370 stars · 68 forks · Updated Mar 13, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ · 1,303 stars · 137 forks · Updated Apr 16, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda · 2,199 stars · 195 forks · Updated Apr 16, 2026

slime is an LLM post-training framework for RL Scaling.

Python · 5,336 stars · 728 forks · Updated Apr 16, 2026

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ · 705 stars · 89 forks · Updated Mar 30, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ · 1,368 stars · 183 forks · Updated Mar 12, 2026

Distributed Compiler based on Triton for Parallel Systems

Python · 1,406 stars · 138 forks · Updated Apr 10, 2026

My learning notes for ML SYS.

Python · 6,024 stars · 394 forks · Updated Apr 8, 2026

Perplexity GPU Kernels

C++ · 565 stars · 86 forks · Updated Nov 7, 2025

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python · 5,049 stars · 466 forks · Updated Apr 16, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 5,109 stars · 681 forks · Updated Apr 16, 2026

Tile primitives for speedy kernels

Cuda · 3,316 stars · 278 forks · Updated Apr 8, 2026

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda · 106 stars · 6 forks · Updated Jun 28, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 20,739 stars · 3,671 forks · Updated Apr 16, 2026

kro | Kube Resource Orchestrator

Go · 2,847 stars · 337 forks · Updated Apr 16, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust · 6,565 stars · 1,034 forks · Updated Apr 16, 2026

Staging repo for development of native port of TypeScript

Go · 24,796 stars · 893 forks · Updated Apr 16, 2026