Nanyang Technological University - Singapore
Stars
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
AIInfra (AI infrastructure) covers the AI system stack, from underlying hardware such as chips up to the software layers that support training and inference of large AI models.
dParallel: Learnable Parallel Decoding for dLLMs
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
Distributed MoE in a Single Kernel [NeurIPS '25]
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
A Collection of Papers on Diffusion Large Language Models
[NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models
Official PyTorch implementation for "Large Language Diffusion Models"
FlashInfer: Kernel Library for LLM Serving
REAP: Router-weighted Expert Activation Pruning for SMoE compression
A sparse attention kernel supporting mix sparse patterns
Github Pages template based upon HTML and Markdown for personal, portfolio-based websites.
[NeurIPS 2025] Accelerating Parallel Diffusion Model Serving with Residual Compression
Ring attention implementation with flash attention
🚀 Efficient implementations of state-of-the-art linear attention models
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
Official repository for VisionZip (CVPR 2025)
[ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" and "SparseVLM+: Visual Token Sparsification with Improved Text-Vis…