huangzhilin-hzl

Julian Huang huangzhilin-hzl

4 followers · 29 following

julian-lab-notebook Public

HTML Updated Jun 11, 2026
MSA Public
Forked from MiniMax-AI/MSA

Python MIT License Updated Jun 11, 2026
hpc-ops Public
Forked from Tencent/hpc-ops

High Performance LLM Inference Operator Library

C++ Other Updated Jun 11, 2026
OSCAR Public
Forked from FutureMLS-Lab/OSCAR

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

Python Updated Jun 3, 2026
mKernel Public
Forked from uccl-project/mKernel

mKernel: fast multi-node, multi-GPU fused kernels

Cuda MIT License Updated Jun 2, 2026
uccl Public
Forked from uccl-project/uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ Apache License 2.0 Updated Jun 1, 2026
slime Public
Forked from THUDM/slime

slime is an LLM post-training framework for RL Scaling.

Python Apache License 2.0 Updated May 30, 2026
miles Public
Forked from radixark/miles

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python Apache License 2.0 Updated May 27, 2026
AI-Infra-Auto-Driven-SKILLS Public
Forked from BBuf/AI-Infra-Auto-Driven-SKILLS

Python Updated May 26, 2026
RLinf Public
Forked from RLinf/RLinf

RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI

Python Apache License 2.0 Updated May 26, 2026
AReaL Public
Forked from areal-project/AReaL

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python Apache License 2.0 Updated May 26, 2026
humming Public
Forked from inclusionAI/humming

Python Apache License 2.0 Updated May 21, 2026
dynamo Public
Forked from ai-dynamo/dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust Other Updated May 21, 2026
flashinfer Public
Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Python Apache License 2.0 Updated May 19, 2026
sglang Public
Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python Apache License 2.0 Updated May 19, 2026
vllm-omni Public
Forked from vllm-project/vllm-omni

A framework for efficient model inference with omni-modality models

Python Apache License 2.0 Updated May 13, 2026
InferenceX Public
Forked from SemiAnalysisAI/InferenceX

Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3

Python Apache License 2.0 Updated May 12, 2026
mlx Public
Forked from ml-explore/mlx

MLX: An array framework for Apple silicon

C++ MIT License Updated May 9, 2026
vllm Public
Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python Apache License 2.0 Updated May 8, 2026
ds4 Public
Forked from antirez/ds4

DeepSeek 4 Flash local inference engine for Metal

C MIT License Updated May 8, 2026
tokenspeed Public
Forked from lightseekorg/tokenspeed

TokenSpeed is a speed-of-light LLM inference engine.

Python MIT License Updated May 7, 2026
lucebox-hub Public
Forked from Luce-Org/lucebox-hub

Lucebox optimization hub: hand-tuned LLM inference, built for specific consumer hardware.

C++ MIT License Updated May 3, 2026
quant_kernel_benchmarks Public
Forked from neuralmagic/quant_kernel_benchmarks

Benchmarking code for running quantized kernels from vLLM and other libraries

Python Updated May 2, 2026
marlin Public
Forked from IST-DASLab/marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python Apache License 2.0 Updated May 2, 2026
FlashQLA Public
Forked from QwenLM/FlashQLA

high-performance linear attention kernel library built on TileLang

Python MIT License Updated Apr 29, 2026
le-wm Public
Forked from lucas-maes/le-wm

Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Python MIT License Updated Apr 27, 2026
DeepSeek-V4-opt Public

Python Updated Apr 27, 2026
tilelang Public
Forked from tile-ai/tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python Other Updated Apr 26, 2026
dflash Public
Forked from z-lab/dflash

DFlash: Block Diffusion for Flash Speculative Decoding

Python MIT License Updated Apr 26, 2026
SpecForge Public
Forked from sgl-project/SpecForge

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python MIT License Updated Apr 23, 2026
