isll
  • TsingTao

Repositories (each entry: description, then language, stars, forks, last updated)

FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.

Python 171 48 Updated Apr 28, 2026

✔ (Completed) Super-comprehensive deep learning notes [Tudui PyTorch] [Li Mu: Dive into Deep Learning] [Andrew Ng: Deep Learning] [Dafei: LLM Agents]

Jupyter Notebook 20,415 2,349 Updated Apr 27, 2026

Speed of Light Analysis for ML Model Runtime

Python 57 9 Updated Apr 13, 2026
C++ 52 1 Updated Apr 13, 2026

A lightweight triton-based General Matrix Multiplication (GEMM) library.

Python 60 13 Updated Apr 22, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ 947 77 Updated Apr 1, 2026

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

Python 188 39 Updated Apr 28, 2026

Asterinas aims to be a production-grade Linux alternative—memory safe, high-performance, and more.

Rust 4,440 293 Updated Apr 27, 2026

triton for dsa

Python 63 8 Updated Apr 14, 2026

Universal LLM Deployment Engine with ML Compilation

Python 22,539 2,018 Updated Apr 22, 2026

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

Go 170,165 15,844 Updated Apr 28, 2026

FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang/triton.

C++ 250 58 Updated Apr 28, 2026
C++ 362 40 Updated Jan 28, 2026

Allo Accelerator Design and Programming Framework (PLDI'24)

Python 373 68 Updated Mar 13, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,333 143 Updated Apr 27, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,228 200 Updated Apr 27, 2026

slime is an LLM post-training framework for RL Scaling.

Python 5,499 753 Updated Apr 27, 2026

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 714 91 Updated Apr 21, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,373 185 Updated Mar 12, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,419 139 Updated Apr 22, 2026

My learning notes for ML SYS.

Python 6,138 401 Updated Apr 23, 2026

Perplexity GPU Kernels

C++ 569 86 Updated Nov 7, 2025

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python 5,105 485 Updated Apr 27, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,210 708 Updated Apr 27, 2026

Tile primitives for speedy kernels

Cuda 3,328 276 Updated Apr 25, 2026

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda 108 6 Updated Jun 28, 2025

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 20,969 3,760 Updated Apr 28, 2026

kro | Kube Resource Orchestrator

Go 2,864 342 Updated Apr 27, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,675 1,066 Updated Apr 28, 2026

Staging repo for development of native port of TypeScript

Go 25,230 929 Updated Apr 28, 2026