- UW-Madison, UCLA, Yale University
- New Haven, CT
- https://scholar.google.com/citations?user=ZYzw6RQAAAAJ&hl=en
- https://mlsys.run
Stars
ICCAD'23 Best Paper Award candidate: Robust GNN-based Representation Learning for HLS
[ICML2026] Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
Opal (O.P.A.L. - Open simulator Platform for distributed AI and LLM workflows) is an LLM platform-level simulator written purely in Python. It can be used to explore policies, deployment configurat…
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang
My learning notes for ML SYS.
[ICML 2026] Code for Equilibrium Reasoners: learning attractor dynamics for scalable reasoning
[EMNLP 2025 Main Conference] QSpec: Speculative Decoding with Complementary Quantisation Schemes
Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"
Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning
OLIVE: Online Low-Rank Incremental Learning for Efficient Adaptive Exoskeletons
[TMLR 2025] On Memorization in Diffusion Models
A novel and efficient post-training INT6 quantization framework tailored for LLM inference.
Ranking, acceptance rate, deadline, and publication tips
Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for the AMD NPUs.
Collection of kernel accelerators optimised for LLM execution
[CVPR 2026] Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
[ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding