About
I am a fourth-year Ph.D. candidate at the Institute for AI Industry Research (AIR), Tsinghua University, advised by Prof. Yunxin Liu (刘云新) and working closely with Prof. Ting Cao (曹婷). Prior to this, I received my B.E. degree from the Department of Electronic Engineering, Tsinghua University, in 2022.
My Ph.D. research centers on system optimization for edge AI deployment, with the goal of running increasingly capable models on resource-constrained edge devices. Representative works include:
- FlexNN (MobiCom 2024): an efficient and adaptive memory management framework for memory-constrained on-device DNN inference.
- Vec-LUT (MobiSys 2026): an efficient mpGeMM kernel based on vector lookup tables (LUTs) for parallel ultra-low-bit LLM inference.
- OxyGen (preprint): a unified KV cache management framework for multi-task MoT-based VLA inference.
An overview of my Ph.D. work is published at the ACM MobiSys 2026 Rising Stars Forum.
I am currently working on model-system co-design for embodied AI, especially VLA training and deployment.
Feel free to email me: lixiangy22@mails.tsinghua.edu.cn.
News
- 2026/06 ActProbe: Action-Space Probe for Early Failure Detection of Generative Robot Policies released. Paper | Code | Page
- 2026/05 Vec-LUT selected as a featured paper for the On-Device AI session of ACM MobiSys 2026. Paper | Code | Model
- 2026/05 OxyGen updated: released arXiv v2 and added PyTorch support (previously JAX-only) for on-board deployment (e.g., on Jetson AGX Thor). Paper | Code
- 2026/05 EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents released. Paper
- 2026/04 Building Efficient Inference Systems for Resource-Constrained Edge AI Deployment (short paper) accepted to the Rising Stars Forum of ACM MobiSys 2026. Paper
- 2026/03 OxyGen: Unified KV Cache Management for Vision-Language-Action Models under Multi-Task Parallelism released. Paper | Code
- 2026/03 Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices accepted to ACM MobiSys 2026. Paper | Code | Model
- 2025/08 An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint accepted to EMNLP 2025. Congrats to Yi Sun. Paper | Page
- 2025/07 Squeezer: Efficient Multi-DNN Inference for Edge Video Analytics via Cross-Model Scheduling accepted to IEEE TMC. Congrats to Xiang Wang. Paper
- 2024/01 Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security (survey & position paper) released. Paper | Page | 机器之心 (Synced)
- 2023/11 FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices accepted to ACM MobiCom 2024. Paper | Code | Slides
Experiences
Awards
- ACM MobiSys 2026 Student Travel Grant, 2026/05
- Friends of Tsinghua: AIR Qingzhi Scholarship (清华之友-智能产业研究院清智奖学金), 2025/12
- Friends of Tsinghua: Jining Talent Scholarship (清华之友-济宁英才奖学金), 2024/11
Internship
- ByteDance, Big Data (Serverless/FaaS) Infrastructure Development, 2021/06–2021/09
Teaching Assistant
- Tsinghua University, Computer Program Design (undergraduate), 2025 Spring & Summer
- Tsinghua University, Basic Music Theory and Vocal Practice, 2023 Fall
Others
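- EESAST (Student Association for Science and Technology, Department of Electronic Engineering, Tsinghua University; 清华大学电子工程系学生科协), 2019/07–2022/06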
Skills
Programming Languages
- C/C++, Python: primary languages for research/projects.
- Shell, HTML/CSS/JavaScript: frequently used with AI.
- C#, Go: familiar from earlier projects.
- Java/Kotlin, MATLAB, Verilog, MIPS assembly.
Frameworks
- vLLM, llama.cpp, NCNN, openpi: most familiar; customized or contributed to.
- PyTorch, Transformers: frequently used for research.
- SGLang, JAX, TensorRT, TF Lite, LlamaIndex: used in past work.
- Electron, React, Unity 3D.