Fuxiang Yang

Computer vision, generative models, and embodied intelligence.

I work on scene text editing, controllable poster layout generation, and world-model reasoning for vision-language-action models. I am a research intern at Li Auto, working on embodied intelligence and VLA algorithms.

Publications

2026

CVPR 2026 · CCF A

Chain of World: World Model Thinking in Latent Motion

Fuxiang Yang, Donglin Di, Lulu Tang, Xuancheng Zhang, Lei Fan, Hao Li, Wei Chen, Tonghua Su, Baorui Ma

A VLA framework that reasons over compact latent motion chains instead of reconstructing redundant future-frame backgrounds.

2026

Pattern Recognition 2026 · JCR Q1, Chinese Academy of Sciences

Learning Priority-Aware Controllable Poster Layout Generation

Fuxiang Yang, Wendi Hou, Lei Fan, Tonghua Su, Lingxiao He, Chengzhou Li, Meng Wang, Qianlong Xie, Xingxing Wang, Donglin Di, Xun Yang

A coarse-to-fine layout generation framework using dual-path ranking, optimal transport matching, and flow-based refinement.

2026

Computer Vision and Image Understanding · CCF B

OSTE: Omni-Scene Text Editing with Latent Decoupling

Tonghua Su, Fuxiang Yang, Lei Fan, Donglin Di, Zhongjie Wang, Songze Li, Xiangqian Wu, Xiang Zhou

2025

ICME 2025 · CCF B

Global-Local Aware Scene Text Editing

Fuxiang Yang, Tonghua Su, Donglin Di, Yin Chen, Xiangqian Wu, Zhongjie Wang, Lei Fan

2023

ACM Multimedia 2023 · CCF A

Self-Supervised Cross-Language Scene Text Editing

Fuxiang Yang, Tonghua Su, Xiang Zhou, Donglin Di, Zhongjie Wang, Songze Li

2026

Pattern Recognition 2026

Noise-Aware Cross Attention for Image Manipulation Localization

Hongshi Zhang, Tonghua Su, Zhou Liu, Fuxiang Yang, Donglin Di, Yang Song, Lei Fan

2026

ICASSP 2026

FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

Zhou Liu, Tonghua Su, Hongshi Zhang, Fuxiang Yang, Donglin Di, Yang Song, Lei Fan

Experience

Li Auto · Foundation Model, Action Intelligence Group

Research intern working on embodied intelligence and VLA algorithms. Published CoWVLA ("Chain of World: World Model Thinking in Latent Motion") at CVPR 2026.

Meituan · Daojia Business Group, Daojia R&D Platform, Creative Generation Group

Worked on creative assets for food delivery, studying content-aware poster layout generation and layout-guided poster image generation. The internship led to a Pattern Recognition paper and patent materials.

GTCOM · Research Intern

Interned at Global Tone Communication Technology Co., Ltd. (GTCOM), working on a National Key R&D Program subproject for style-preserving image generation in real-time text translation. The project later won a provincial silver award in the Internet+ industry track.

Education

Harbin Institute of Technology, Faculty of Computing - Software Engineering, PhD Student; Advisor: Prof. Tonghua Su (Vice Dean of the Faculty)
Harbin Institute of Technology, Faculty of Computing - Software Engineering, MS Student; Advisor: Prof. Tonghua Su (Vice Dean of the Faculty)
Harbin Institute of Technology, Faculty of Computing - Computer Science and Technology, Undergraduate Student