View tuidan's full-sized avatar
🎯
Focusing

Organizations

@AIoT-MLSys-Lab


A kernel library written in tilelang

Python 1,587 138 Updated Apr 23, 2026

Running VLA at 30Hz frame rate and 480Hz trajectory frequency

Python 571 41 Updated Feb 10, 2026

MMSpec: Benchmarking Speculative Decoding for Vision-Language Models

Python 33 2 Updated Mar 17, 2026

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

298 5 Updated Dec 1, 2025

A curated list of resources related to linear attention mechanisms.

17 3 Updated Mar 16, 2025

[EMNLP 2025 Main Conference] QSpec: Speculative Decoding with Complementary Quantisation Schemes

Python 7 1 Updated Mar 9, 2026

MMDeepResearch-Bench (MMDR)

Python 29 2 Updated Apr 1, 2026

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 888 253 Updated Jun 14, 2026

A framework for efficient model inference with omni-modality models

Python 5,135 1,108 Updated Jun 14, 2026

🚀🚀 Efficient implementations of Native Sparse Attention

Python 619 15 Updated Sep 29, 2025

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python 109 8 Updated Dec 2, 2025

A PyTorch-native inference engine with cache, parallelism, quantization and cpu offload for DiTs.

Python 1,199 75 Updated Jun 12, 2026

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python 515 104 Updated Jun 13, 2026

Vortex: Programmable Sparse Attention for Agents as Algorithm Designers

Python 60 7 Updated Jun 8, 2026

A Survey on Reinforcement Learning of Vision-Language-Action Models for Robotic Manipulation

740 22 Updated May 18, 2026

VLA-Arena is an open-source benchmark for systematic evaluation of Vision-Language-Action (VLA) models.

Python 178 15 Updated Mar 14, 2026

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, computes attention with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…

Python 1,221 78 Updated Apr 8, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,706 1,058 Updated Apr 30, 2026

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Python 2,200 199 Updated Mar 19, 2026

🚀 Efficient implementations for emerging model architectures

Python 5,217 556 Updated Jun 11, 2026

slime is an LLM post-training framework for RL Scaling.

Python 6,116 895 Updated Jun 13, 2026

TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

Python 236 6 Updated Aug 18, 2025

Materials for learning SGLang

843 64 Updated Jan 5, 2026

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Python 397 49 Updated Apr 22, 2025

Nano vLLM

Python 14,020 2,211 Updated Apr 26, 2026

PhyX: Does Your Model Have the "Wits" for Physical Reasoning?

Python 52 1 Updated Mar 16, 2026

A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,315 181 Updated Jul 29, 2023

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,254 80 Updated Jun 2, 2026