Welcome to Yifu's homepage.

Leave your curiosity for the world, and let time deliver the answers.

Bio: PhD candidate in a joint programme between Beihang University (School of Computer Science and Engineering) and Nanyang Technological University (College of Computing and Data Science), supervised by Prof. Xianglong Liu and Prof. Dacheng Tao. Research focus: efficient foundation-model inference and deployment. Expected PhD graduation: Dec 2026.

Google Scholar GitHub LinkedIn

📌 Updates

📄 One paper "Layer Sensitivity Matters: Mixed-Precision Post-Training Quantization for SAM2 Video Segmentation" accepted to GLOW @ IJCAI 2026. Congratulations to Wenyu Zhou! Jun 2026
📢 We warmly welcome potential speakers to join us at this year's workshop, the 3rd Efficient Computing under Limited Resources: Modern AI Models and Systems. Past workshop homepages: 2nd at ICCV 2025, 1st at ACM MM 2024. Jun 2026
📢 My colleagues are holding the 4th International Workshop on Generalizing from Limited Resources in the Open World at IJCAI 2026 in Bremen, Germany. You are welcome to follow and attend. Jun 2026
🏆 Successfully passed the completion review for the Outstanding Doctoral Academic Fund at Beihang University. Jun 2026
📄 Two papers accepted by ICML 2026: "Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression" and "SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models". Congratulations to all co-authors! Apr 2026
📄 One paper accepted by ICLR 2026: "QVGen: Pushing the Limit of Quantized Video Generative Models". Congratulations to Yushi Huang! Jan 2026
📄 One paper accepted by ACL 2025: "Dynamic Parallel Tree Search for Efficient LLM Reasoning". May 2025

📄 Recent Papers

ICML 2026 Spotlight

PDF Code BibTex

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

Yifu Ding, Jiacheng Wang, Ge Yang, Yongcheng Jing, Jinyang Guo, Xianglong Liu, Dacheng Tao

This work structurally compresses MoE models by pruning channels rather than whole experts, using attribution-guided coverage maximization to better preserve important expert information. On DeepSeek and Qwen MoEs, it maintains accuracy under 50%/25% pruning with 4-bit quantization and reduces Qwen3-30B-A3B memory by 5.27×.
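To make the channel-versus-expert distinction concrete, here is a minimal sketch of pruning intermediate channels inside a single expert FFN while keeping the expert itself. It is not the paper's algorithm: the importance score (mean absolute activation times the L2 norm of the matching down-projection column) is an assumed stand-in for attribution, there is no coverage-maximization step, and `channel_scores` and `prune_expert` are hypothetical helpers. For brevity the expert is a plain two-matrix FFN; gated experts such as those in Qwen and DeepSeek MoEs also have a gate projection, which would be sliced with the same kept indices.

```python
import math


def channel_scores(activations, w_down):
    """Score each intermediate channel of one expert (assumed proxy, not the paper's).

    activations: calibration activations, each a list of length d_ff.
    w_down: down-projection, d_model rows x d_ff columns.
    """
    d_ff = len(w_down[0])
    scores = []
    for c in range(d_ff):
        mean_act = sum(abs(a[c]) for a in activations) / len(activations)
        col_norm = math.sqrt(sum(row[c] ** 2 for row in w_down))
        scores.append(mean_act * col_norm)
    return scores


def prune_expert(w_up, w_down, scores, keep_ratio):
    """Keep the highest-scoring channels; the expert itself is never dropped.

    w_up: up-projection, d_ff rows x d_model columns.
    Returns the kept channel indices and the sliced weight matrices.
    """
    d_ff = len(scores)
    k = max(1, round(d_ff * keep_ratio))
    ranked = sorted(range(d_ff), key=lambda c: scores[c], reverse=True)
    keep = sorted(ranked[:k])
    new_up = [w_up[c] for c in keep]                       # drop rows of W_up
    new_down = [[row[c] for c in keep] for row in w_down]  # drop columns of W_down
    return keep, new_up, new_down


if __name__ == "__main__":
    w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.1, 0.1]]
    w_down = [[1.0, 0.0, 2.0, 0.1], [0.0, 1.0, 0.0, 0.1]]
    acts = [[1.0, 0.5, 2.0, 0.1], [1.0, 0.5, 2.0, 0.1]]
    keep, up, down = prune_expert(w_up, w_down, channel_scores(acts, w_down), 0.5)
    print(keep)  # [0, 2]
    print(down)  # [[1.0, 2.0], [0.0, 0.0]]
```

Because whole rows of W_up and columns of W_down are removed, the pruned expert stays a dense (smaller) FFN, which is what lets structural pruning compose with low-bit quantization.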

ICML 2026

PDF Code BibTex

SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models

Wenhao Sun, Rong-Cheng Tu, Yifu Ding, Zhao Jin, Jingyi Liao, Yongcheng Jing, Dacheng Tao

SPA-Cache accelerates diffusion language model decoding with a low-dimensional singular proxy for identifying update-critical tokens and an adaptive layer-wise update budget. It delivers up to 8× throughput improvement over vanilla decoding and 2-4× speedup over existing caching baselines.
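As a rough illustration of proxy-based cache refreshing, the sketch below projects each token's hidden state onto a few fixed directions, measures how far that low-dimensional proxy has drifted from its cached value, and recomputes only the tokens with the largest drift. Everything here is an assumption for illustration: the `basis` directions stand in for the paper's singular proxies (in practice they might be leading singular vectors of some weight matrix), and `layer_budgets` uses a simple proportional rule rather than SPA-Cache's actual adaptive budget.

```python
import math


def project(h, basis):
    """Low-dimensional proxy: dot products of h with r fixed directions (r << d)."""
    return [sum(hi * bi for hi, bi in zip(h, b)) for b in basis]


def select_tokens_to_update(hidden, cached_proxy, basis, budget):
    """Return indices of the `budget` tokens whose proxy drifted the most."""
    drifts = []
    for i, h in enumerate(hidden):
        p = project(h, basis)
        drift = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, cached_proxy[i])))
        drifts.append((drift, i))
    drifts.sort(reverse=True)
    return sorted(i for _, i in drifts[:budget])


def layer_budgets(layer_drift, total_budget):
    """Split a total update budget across layers in proportion to drift (assumed rule)."""
    total = sum(layer_drift)
    if total == 0:
        return [0] * len(layer_drift)
    return [int(total_budget * d / total) for d in layer_drift]


if __name__ == "__main__":
    basis = [[1.0, 0.0, 0.0]]  # r = 1 direction in a d = 3 space
    hidden = [[1.0, 0.0, 0.0], [5.0, 0.0, 0.0], [2.0, 1.0, 1.0]]
    cached = [[1.0], [1.0], [0.0]]
    print(select_tokens_to_update(hidden, cached, basis, budget=2))  # [1, 2]
    print(layer_budgets([1.0, 3.0], total_budget=8))  # [2, 6]
```

The point of the proxy is cost: comparing r-dimensional projections is far cheaper than recomputing full hidden states, so the expensive update is spent only where the proxy signals change. Note that integer truncation in `layer_budgets` can leave a few slots of the total budget unused.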

CVPR 2026

PDF Code BibTex

MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping

Yushi Huang, Zining Wang, Zhenyu Yuan, Yifu Ding, Ruihao Gong, Jinyang Guo, Xianglong Liu, Jun Zhang

MoDES accelerates multimodal LLM inference with training-free expert skipping driven by modality heterogeneity.
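A minimal sketch of training-free expert skipping at a single MoE layer: route with ordinary softmax top-k, then skip any selected expert whose gate weight falls below a threshold that depends on the token's modality. The modality-specific thresholds and the `route_with_skipping` helper are illustrative assumptions inspired by the summary's mention of modality heterogeneity, not MoDES's actual skipping criterion.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_with_skipping(logits, top_k, modality, thresholds):
    """Standard top-k routing, then skip selected experts whose gate weight is
    below a modality-specific threshold (hypothetical values).

    Returns the experts that still run and their (unrenormalized) gate weights.
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
    selected = ranked[:top_k]
    tau = thresholds[modality]
    kept = [e for e in selected if probs[e] >= tau]
    if not kept:  # always run at least the strongest expert
        kept = selected[:1]
    return kept, {e: probs[e] for e in kept}


if __name__ == "__main__":
    thresholds = {"text": 0.1, "vision": 0.3}  # illustrative, not tuned
    logits = [2.0, 1.0, 0.0, -1.0]
    print(route_with_skipping(logits, 2, "text", thresholds)[0])    # [0, 1]
    print(route_with_skipping(logits, 2, "vision", thresholds)[0])  # [0]
```

No weights change, so the method stays training-free; the saving comes purely from not executing the skipped experts' FFNs.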

EDGE @ CVPR 2026

PDF Code BibTex

Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference

Yifu Ding, Xuan Zhang

A Triton-based MXFP mixed-precision attention kernel for efficient and accurate low-bit attention inference on NVIDIA B200.
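The sketch below simulates MXFP4 block quantization in plain Python, following the OCP Microscaling (MX) format: blocks of 32 values share one power-of-two scale (storable as E8M0), and each element is an FP4 E2M1 value. It only illustrates the number format; the Triton kernel and the diagonal-tiled precision assignment are not modeled, and for simplicity rounding ties go to the smaller magnitude rather than round-to-nearest-even.

```python
import math

# Magnitudes representable by FP4 E2M1 (the MXFP4 element type).
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2      # exponent of the largest normal value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32   # elements sharing one scale


def to_fp4(v):
    """Round to the nearest E2M1 value, saturating at +/-6.
    Ties go to the smaller magnitude here for simplicity."""
    mag = min(abs(v), 6.0)
    q = min(FP4_E2M1, key=lambda c: abs(c - mag))
    return -q if v < 0 else q


def mxfp4_quantize(x):
    """Return a list of (shared_exponent, fp4_elements) pairs, one per block."""
    blocks = []
    for start in range(0, len(x), BLOCK_SIZE):
        block = x[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        exp = 0 if amax == 0 else math.floor(math.log2(amax)) - FP4_EMAX
        scale = 2.0 ** exp  # power-of-two scale, storable as E8M0
        blocks.append((exp, [to_fp4(v / scale) for v in block]))
    return blocks


def mxfp4_dequantize(blocks):
    """Multiply each FP4 element back by its block's shared scale."""
    out = []
    for exp, elems in blocks:
        out.extend(e * 2.0 ** exp for e in elems)
    return out


if __name__ == "__main__":
    blocks = mxfp4_quantize([0.1, 12.0])
    print(blocks)                    # [(1, [0.0, 6.0])]
    print(mxfp4_dequantize(blocks))  # [0.0, 12.0]
```

The example shows why block-level scaling matters: 0.1 shares a block with 12.0, so its scaled value 0.05 falls below the smallest nonzero FP4 step and is flushed to zero. Keeping some attention tiles in a higher-precision format is one way to limit this kind of error.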

ICLR 2026

PDF Code BibTex

QVGen: Pushing the Limit of Quantized Video Generative Models

Yushi Huang, Ruihao Gong, Jing Liu, Yifu Ding, Chengtao Lv, Haotong Qin, Jun Zhang

QVGen enables extremely low-bit quantization-aware training (QAT) for video diffusion models. It stabilizes QAT by reducing gradient norms with auxiliary modules, then removes their inference overhead via rank-decay. It achieves near full-precision quality at 4-bit.
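One way to picture the auxiliary-module-plus-rank-decay idea: during QAT, a layer computes a fake-quantized weight product plus a low-rank auxiliary branch, and the branch's rank is shrunk over training until it reaches zero, leaving only the quantized weights at inference. This is a hedged sketch, not QVGen's implementation: the symmetric per-tensor 4-bit quantizer, the linear `rank_schedule`, and the assumption that auxiliary components are already ordered by importance are all illustrative choices, and gradient handling (e.g. a straight-through estimator) is omitted.

```python
import math


def fake_quant(w, bits=4):
    """Symmetric per-tensor fake quantization: round to the integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    amax = max(abs(v) for row in w for v in row)
    scale = amax / qmax if amax > 0 else 1.0
    return [[max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in row] for row in w]


def rank_schedule(step, total_steps, r0):
    """Linearly decay the auxiliary rank from r0 to 0 (assumed schedule)."""
    return max(0, math.ceil(r0 * (1 - step / total_steps)))


def forward(x, w_q, U, V, rank):
    """y = W_q x + U[:, :rank] (V[:rank] x). With rank == 0 only W_q remains,
    so the auxiliary branch adds no inference cost."""
    y = [sum(wij * xj for wij, xj in zip(row, x)) for row in w_q]
    if rank > 0:
        z = [sum(vkj * xj for vkj, xj in zip(V[k], x)) for k in range(rank)]
        y = [yi + sum(U[i][k] * z[k] for k in range(rank)) for i, yi in enumerate(y)]
    return y


if __name__ == "__main__":
    w_q = fake_quant([[1.0, -0.5], [0.25, 0.0]])
    U = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 1.0], [0.0, 1.0]]
    for step in (0, 50, 100):
        r = rank_schedule(step, 100, r0=2)
        print(step, r, forward([1.0, 2.0], w_q, U, V, r))
```

At the final step the rank is 0, so the layer output depends only on the quantized weight, which is the property that makes the auxiliary modules free at deployment time.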

Neural Networks 2025

PDF BibTex

A survey of low-bit large language models: Basics, systems, and algorithms

Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Haotong Qin, Jinyang Guo, Michele Magno, Xianglong Liu

This survey reviews low-bit quantization for large language models, covering core principles, data formats, system support, and algorithmic methods. It highlights how low-bit techniques reduce memory and computation costs while preserving performance.
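For readers new to the basics the survey covers, here is a small example of asymmetric uniform quantization (scale plus zero-point) and the weight-memory arithmetic behind low-bit savings. The helper names are hypothetical, the range is widened to include 0 so the zero-point stays representable, and `weight_memory_gb` ignores the storage of scales, zero-points, and non-quantized layers, so real savings are somewhat smaller.

```python
def quantize_asymmetric(x, bits):
    """Map floats to unsigned integers in [0, 2**bits - 1] using a scale and zero-point."""
    qmin, qmax = 0, 2 ** bits - 1
    lo, hi = min(min(x), 0.0), max(max(x), 0.0)  # include 0 so it is exactly representable
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats; the error per element is at most scale / 2 within range."""
    return [(v - zero_point) * scale for v in q]


def weight_memory_gb(num_params, bits):
    """Weight storage only; ignores scales, zero-points and unquantized layers."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    q, s, z = quantize_asymmetric([-1.0, 0.0, 0.5, 2.0], bits=4)
    print(q)  # [0, 5, 7, 15]
    print(weight_memory_gb(7e9, 16), weight_memory_gb(7e9, 4))  # 14.0 3.5
```

For a hypothetical 7B-parameter model, moving weights from 16-bit to 4-bit cuts weight storage from 14 GB to 3.5 GB, the 4x reduction that motivates low-bit formats; the rounding error bound of half a quantization step is the accuracy cost the surveyed algorithms try to manage.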

📦 Selected Repositories

👥 Workshops

Program Chair at the 3rd ECLR workshop: Efficient Computing under Limited Resources: Modern AI Models and Systems. ⭐️ Proposal prepared for submission; speaker invitations are still open. ⭐️ TBD
My lab colleagues held the 6th Workshop of Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents at CVPR 2026. You are welcome to follow! Jun 2026
Program Chair at the 2nd Workshop on Efficient Computing under Limited Resources: Visual Computing at ICCV 2025. Responsible for coordinating the entire process, including workshop promotion, reviewer assignment, decision organization, and final camera-ready metadata submission. Oct 2025
My lab colleagues held the 3rd International Workshop on Generalizing from Limited Resources in the Open World at IJCAI 2025. You are welcome to follow! Aug 2025
Local Arrangement Chair at the 2nd International Workshop on Generalizing from Limited Resources in the Open World at IJCAI 2024. Responsible for on-site logistics and coordination to ensure smooth conference operations. Aug 2024
Publicity Chair at the 1st International Workshop on Efficient Multimedia Computing under Limited Resources at ACM MM 2024. Oct 2024

📖 Education

Joint-Training Doctoral Student, Nanyang Technological University (College of Computing and Data Science). Singapore. Supervised by Prof. Dacheng Tao. Nov 2024 - Nov 2026
Ph.D. Candidate, Computer Science, Beihang University (School of Computer Science and Engineering). Beijing, China. Supervised by Prof. Xianglong Liu and Prof. Dacheng Tao. Sep 2021 - Dec 2026
B.Eng., Computer Science, Beihang University (School of Computer Science and Engineering). Beijing, China. GPA: 3.80/4.0; rank: 25/257. Sep 2017 - Jun 2021

🏆 Awards & Funding

Outstanding Doctoral Academic Fund, Beihang University. CNY 40,000. 2025
State Scholarship Fund, China Scholarship Council. SGD 26,400 (approx. CNY 140,000). 2024
National Scholarship for Graduate Students, Ministry of Education of the P.R. China. CNY 50,000. 2024
Outstanding Academic Achievement Award, Beihang University. 2023
Doctoral Academic Scholarship, First Prize, Beihang University. 2022

👋 A Few Words

As graduation approaches, life has become busier than ever. Several projects that I co-lead or participate in are still ongoing, as listed on my Research page. Some are coming soon, while others have been continuously explored for more than two years and are still in the darkness before dawn.

I feel fortunate to always have frontier research topics to work on and excellent teammates to work with. There has never been a dull moment in my Ph.D. life. As this chapter draws to a close, I am excited to begin the next stage of my journey in industry.

Beyond research, I also enjoy travel and photography. If you are interested, you can find my portfolio here.

🤝 Quick links to my collaborators

Jinyang Guo (郭晋阳) — One of my PhD advisors, always supportive and generous with insightful advice.
Xingyu Zheng (郑星宇) — A hardworking and focused labmate with deep insight into diffusion models.
Zining Wang — A humorous and resilient labmate, with research experience at ByteDance Seed.
Wenhao Sun — A helpful and active researcher in video diffusion acceleration, with rich industry experience.
Yushi Huang — A prolific open-source AI researcher with exceptional insight into visual understanding for large models.
Zihao Jing — The most talented and meticulous AI researcher I have ever met.
