Stars
Running VLA at a 30 Hz frame rate and a 480 Hz trajectory frequency
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA 2025)
Code release for AdapMoE, accepted at ICCAD 2024
A high-throughput and memory-efficient inference and serving engine for LLMs