Julian Huang huangzhilin-hzl

4 followers · 29 following

Stars

Tencent / hpc-ops

High Performance LLM Inference Operator Library

C++ · 931 stars · 96 forks · Updated Jun 11, 2026

MiniMax-AI / MSA

Python · 214 stars · 16 forks · Updated Jun 12, 2026

Dogacel / auto-gpu-kernel

Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x

Python · 103 stars · 9 forks · Updated Jun 10, 2026

FutureMLS-Lab / OSCAR

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

Python · 502 stars · 72 forks · Updated Jun 8, 2026

jianuo-huang / Domino

Official implementation of “Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding”.

Python · 62 stars · 3 forks · Updated Jun 10, 2026

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 9,724 stars · 1,283 forks · Updated Jun 11, 2026

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ · 1,413 stars · 157 forks · Updated Jun 12, 2026

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python · 6,103 stars · 892 forks · Updated Jun 13, 2026

radixark / miles

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python · 1,548 stars · 254 forks · Updated Jun 13, 2026

BBuf / AI-Infra-Auto-Driven-SKILLS

Python · 566 stars · 49 forks · Updated Jun 8, 2026

RLinf / RLinf

RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI

Python · 3,769 stars · 526 forks · Updated Jun 13, 2026

uccl-project / mKernel

mKernel: fast multi-node, multi-GPU fused kernels

Cuda · 231 stars · 22 forks · Updated Jun 8, 2026

areal-project / AReaL

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python · 5,293 stars · 519 forks · Updated Jun 12, 2026

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python · 1,459 stars · 151 forks · Updated Apr 22, 2026

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,888 stars · 1,905 forks · Updated Jun 11, 2026

vllm-project / vllm-omni

A framework for efficient model inference with omni-modality models

Python · 5,121 stars · 1,107 forks · Updated Jun 13, 2026

SemiAnalysisAI / InferenceX

Open Source Continuous Inference Benchmark Research Platform Kimi K2.6, DeepSeekv4, GLM5 - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 & soon™ TPUv6e/v7/Trainium2/3

Shell · 1,091 stars · 193 forks · Updated Jun 13, 2026