Welcome to Yifu's homepage.
Leave your curiosity for the world, and let time deliver the answers.
Bio: PhD candidate in a joint programme between Beihang University (School of Computer Science and Engineering) and Nanyang Technological University (College of Computing and Data Science), supervised by Prof. Xianglong Liu and Prof. Dacheng Tao. Research focus: efficient foundation-model inference and deployment. Expected graduation: December 2026.
Updates
- One paper, "Layer Sensitivity Matters: Mixed-Precision Post-Training Quantization for SAM2 Video Segmentation", accepted to GLOW @ IJCAI 2026. Congratulations to Wenyu Zhou!
- We warmly welcome potential speakers to join us at this year's workshop, the 3rd Efficient Computing under Limited Resources: Modern AI Models and Systems. Past workshop homepages: 2nd at ICCV 2025, 1st at ACM MM 2024.
- My colleagues are holding the 4th International Workshop on Generalizing from Limited Resources in the Open World at IJCAI 2026 in Bremen, Germany. You are welcome to follow and attend.
- Successfully passed the completion review for the Outstanding Doctoral Academic Fund at Beihang University.
- Two papers accepted by ICML 2026: "Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression" and "SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models". Congratulations to the co-authors!
- One paper accepted by ICLR 2026: "QVGen: Pushing the Limit of Quantized Video Generative Models". Congratulations to Yushi Huang!
- One paper accepted by ACL 2025: "Dynamic Parallel Tree Search for Efficient LLM Reasoning".
Recent Papers
Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression
This work structurally compresses MoE models by pruning channels rather than whole experts, using attribution-guided coverage maximization to better preserve important expert information. On DeepSeek and Qwen MoEs, it maintains accuracy under 50%/25% pruning with 4-bit quantization and reduces Qwen3-30B-A3B memory by 5.27×.
SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models
SPA-Cache accelerates diffusion language model decoding with a low-dimensional singular proxy for identifying update-critical tokens and an adaptive layer-wise update budget. It delivers up to 8× throughput improvement over vanilla decoding and 2-4× speedup over existing caching baselines.
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
MoDES accelerates multimodal LLM inference with training-free expert skipping driven by modality heterogeneity.
Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
A Triton-based MXFP mixed-precision attention kernel for efficient and accurate low-bit attention inference on NVIDIA B200.
QVGen: Pushing the Limit of Quantized Video Generative Models
QVGen enables extremely low-bit quantization-aware training (QAT) for video diffusion models. It stabilizes QAT by reducing gradient norms with auxiliary modules, then removes their inference overhead via rank-decay. It achieves near full-precision quality at 4-bit.
A survey of low-bit large language models: Basics, systems, and algorithms
This survey reviews low-bit quantization for large language models, covering core principles, data formats, system support, and algorithmic methods. It highlights how low-bit techniques reduce memory and computation costs while preserving performance.
Selected Repositories
Workshops
- Program Chair at the 3rd ECLR workshop: Efficient Computing under Limited Resources: Modern AI Models and Systems. (Proposal prepared for submission. Speaker invitations are still open.)
- My lab colleagues held the 6th Workshop of Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents at CVPR 2026. You are welcome to follow it!
- Program Chair at the 2nd Workshop on Efficient Computing under Limited Resources: Visual Computing at ICCV 2025. Responsible for full process coordination, including workshop promotion, reviewer assignment, decision organization, and final camera-ready metadata submission.
- My lab colleagues held the 3rd International Workshop on Generalizing from Limited Resources in the Open World at IJCAI 2025. You are welcome to follow it!
- Local Arrangement Chair at the 2nd International Workshop on Generalizing from Limited Resources in the Open World at IJCAI 2024. Responsible for on-site logistics and coordination to ensure smooth conference operations.
- Publicity Chair at the 1st International Workshop on Efficient Multimedia Computing under Limited Resources at ACM MM 2024.
Education
- Joint-Training Doctoral Student, Nanyang Technological University (College of Computing and Data Science). Singapore. Supervised by Prof. Dacheng Tao.
- Ph.D. Candidate, Computer Science, Beihang University (School of Computer Science and Engineering). Beijing, China. Supervised by Prof. Xianglong Liu and Prof. Dacheng Tao.
- B.Eng., Computer Science, Beihang University (School of Computer Science and Engineering). Beijing, China. GPA: 3.80/4.0; rank: 25/257.
Awards & Funding
- Outstanding Doctoral Academic Fund, Beihang University. CNY 40,000. 2025
- State Scholarship Fund, China Scholarship Council. SGD 26,400 (approx. CNY 140,000). 2024
- National Scholarship for Graduate Students, Ministry of Education of the P.R. China. CNY 50,000. 2024
- Outstanding Academic Achievement Award, Beihang University. 2023
- Doctoral Academic Scholarship, First Prize, Beihang University. 2022
A Few Words
As graduation approaches, life has become busier than ever. Several projects that I co-lead or participate in are still ongoing, as listed on my Research page. Some are coming soon, while others have been continuously explored for more than two years and are still in the darkness before dawn.
I feel fortunate to always have frontier research topics to work on and excellent teammates to work with. There has never been a dull moment in my Ph.D. life. As this chapter gradually comes to a close, I am excited and looking forward to embracing my upcoming journey in industry.
Beyond research, I also enjoy travel and photography. If you are interested, you can find my portfolio here.
Quick links to my collaborators
- Jinyang Guo (郭晋阳): One of my PhD advisors, always supportive and generous with insightful advice.
- Xingyu Zheng (郑星宇): A hardworking and focused labmate with deep insight into diffusion models.
- Zining Wang: A humorous and resilient labmate, with research experience at ByteDance Seed.
- Wenhao Sun: A helpful and active researcher in video diffusion acceleration, with rich industry experience.
- Yushi Huang: A prolific open-source AI researcher with exceptional insight into visual understanding for large models.
- Zihao Jing: The most talented and meticulous AI researcher I have ever met.