lizhuo008
🔥 Focusing
  • Nanyang Technological University
  • Singapore


Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 577 125 Updated Dec 23, 2025

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,310 201 Updated Dec 23, 2025

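AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.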

Jupyter Notebook 5,515 763 Updated Dec 22, 2025

dParallel: Learnable Parallel Decoding for dLLMs

Python 51 1 Updated Oct 14, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 4,299 359 Updated Dec 24, 2025

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 157 18 Updated Dec 23, 2025

Nano vLLM

Python 10,105 1,267 Updated Nov 3, 2025

[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation

Python 667 24 Updated Nov 27, 2025

A Collection of Papers on Diffusion Large Language Models

40 2 Updated Dec 24, 2025

[NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models

Python 126 10 Updated May 22, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,427 230 Updated Nov 12, 2025
Python 4,466 434 Updated Sep 14, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,349 615 Updated Dec 25, 2025

REAP: Router-weighted Expert Activation Pruning for SMoE compression

Python 159 25 Updated Dec 9, 2025

A sparse attention kernel supporting mix sparse patterns

C++ 412 39 Updated Dec 16, 2025

GitHub Pages template based upon HTML and Markdown for personal, portfolio-based websites.

SCSS 16,127 4,650 Updated Dec 21, 2025

The best ChatGPT that $100 can buy.

Python 39,214 4,966 Updated Dec 23, 2025

[NeurIPS 2025] Accelerating Parallel Diffusion Model Serving with Residual Compression

Python 39 1 Updated Oct 17, 2025

Ring attention implementation with flash attention

Python 953 91 Updated Sep 10, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,117 338 Updated Dec 24, 2025
Python 111 8 Updated Sep 22, 2025

code for paper LiteVAR

Jupyter Notebook 6 Updated Nov 28, 2024

[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python 1,531 84 Updated Nov 10, 2025
Python 361 25 Updated Oct 29, 2025

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 534 22 Updated Jan 4, 2025

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 7,079 525 Updated May 5, 2025

Official repository for VisionZip (CVPR 2025)

Python 392 16 Updated Jul 21, 2025

[ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" and "SparseVLM+: Visual Token Sparsification with Improved Text-Vis…

Python 214 17 Updated Dec 22, 2025
Python 29 3 Updated May 24, 2025