huangzhilin-hzl

High Performance LLM Inference Operator Library

C++ 931 96 Updated Jun 11, 2026
Python 214 16 Updated Jun 12, 2026

Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x

Python 103 9 Updated Jun 10, 2026

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

Python 502 72 Updated Jun 8, 2026

Official implementation of “Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding”.

Python 62 3 Updated Jun 10, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 9,724 1,283 Updated Jun 11, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,413 157 Updated Jun 12, 2026

slime is an LLM post-training framework for RL Scaling.

Python 6,103 892 Updated Jun 13, 2026

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 1,548 254 Updated Jun 13, 2026

RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI

Python 3,769 526 Updated Jun 13, 2026

mKernel: fast multi-node, multi-GPU fused kernels

Cuda 231 22 Updated Jun 8, 2026

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python 5,293 519 Updated Jun 12, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,459 151 Updated Apr 22, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,888 1,905 Updated Jun 11, 2026

A framework for efficient model inference with omni-modality models

Python 5,121 1,107 Updated Jun 13, 2026

Open Source Continuous Inference Benchmark Research Platform Kimi K2.6, DeepSeekv4, GLM5 - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 & soon™ TPUv6e/v7/Trainium2/3

Shell 1,091 193 Updated Jun 13, 2026
Python 6 Updated May 12, 2026

Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Python 3,827 523 Updated May 26, 2026

Lightweight coding agent that runs in your terminal

Rust 90,723 13,376 Updated Jun 13, 2026

DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm

C 13,536 1,193 Updated Jun 11, 2026

Benchmark suite for LLMs from Fireworks.ai

Python 105 39 Updated Jun 11, 2026

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,421 154 Updated Jun 13, 2026

Turn any website into a CLI, and let an AI agent use your logged-in browser.

JavaScript 24,202 2,418 Updated Jun 12, 2026

Fast LLM speculative inference server for consumer hardware.

C++ 2,422 221 Updated Jun 12, 2026

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 1,085 88 Updated Sep 4, 2024

High-performance linear attention kernel library built on TileLang

Python 536 45 Updated May 7, 2026

Tensara's GPU programming problems

Python 20 7 Updated Apr 23, 2026

Fully featured mobile and web client for Codex and Claude Code, with real-time voice and encryption

TypeScript 21,848 1,823 Updated Jun 10, 2026
Python 135 18 Updated Jun 10, 2026