- Ohio State University
- Columbus, OH, USA
- UTC +08:00
- https://scholar.google.com/citations?user=Q7yOQTMAAAAJ&hl=zh-CN
Stars
A kernel library written in tilelang
Running VLA at 30Hz frame rate and 480Hz trajectory frequency
MMSpec: Benchmarking Speculative Decoding for Vision-Language Models
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
A curated list of resources related to linear attention mechanisms.
[EMNLP 2025 Main Conference] QSpec: Speculative Decoding with Complementary Quantisation Schemes
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
A framework for efficient model inference with omni-modality models
🚀🚀 Efficient implementations of Native Sparse Attention
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
A PyTorch-native inference engine with cache, parallelism, quantization and cpu offload for DiTs.
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
Vortex: Programmable Sparse Attention for Agents as Algorithm Designers
A Survey on Reinforcement Learning of Vision-Language-Action Models for Robotic Manipulation
VLA-Arena is an open-source benchmark for systematic evaluation of Vision-Language-Action (VLA) models.
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, computes attention approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…
FlashMLA: Efficient Multi-head Latent Attention Kernels
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
🚀 Efficient implementations for emerging model architectures
slime is an LLM post-training framework for RL Scaling.
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
A series on GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including: elementwise, reduce, s…
📰 Must-read papers and blogs on Speculative Decoding ⚡️