Computer vision, generative models, and embodied intelligence.
I work on scene text editing, controllable poster layout generation, and world-model
reasoning for vision-language-action models. I am a research intern at Li Auto,
working on embodied intelligence and VLA algorithms.
Publications
2026
CVPR 2026 · CCF A
Chain of World: World Model Thinking in Latent Motion
Fuxiang Yang, Donglin Di, Lulu Tang, Xuancheng Zhang, Lei Fan, Hao Li, Wei Chen, Tonghua Su, Baorui Ma
A VLA framework that reasons over compact latent motion chains instead of reconstructing redundant future-frame backgrounds.
Experience
Li Auto · Foundation Model, Action Intelligence Group
Research intern working on embodied intelligence and VLA algorithms. Published CoWVLA (Chain of World) at CVPR 2026.
Meituan · Daojia Business Group, Daojia R&D Platform, Creative Generation Group
Worked in the Creative Generation group on creative assets for food delivery, studying content-aware poster layout generation and layout-guided poster image generation. The internship led to a paper in Pattern Recognition and to patent materials.
GTCOM · Research Intern
Interned at Global Tone Communication Technology Co., Ltd. (GTCOM), working on a National Key R&D Program subproject for style-preserving image generation in real-time text translation. The project later won a provincial silver award in the Internet+ industry track.
Education
Harbin Institute of Technology, Faculty of Computing - Software Engineering, PhD Student; Advisor: Prof. Tonghua Su (Vice Dean of the Faculty)
Harbin Institute of Technology, Faculty of Computing - Software Engineering, MS Student; Advisor: Prof. Tonghua Su (Vice Dean of the Faculty)
Harbin Institute of Technology, Faculty of Computing - Computer Science and Technology, Undergraduate Student