Yuandong Pu

SJTU, Shanghai AI Laboratory
Verified email at pjlab.org.cn
Cited by 241

Lumina-OmniLV: A unified multimodal framework for general low-level vision

Y Pu, L Zhuo, K Zhu, L Xie, W Zhang, X Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
We present Lumina-OmniLV (abbreviated as OmniLV), a universal multimodal multi-task
framework for low-level vision that addresses over 100 sub-tasks across four major categories: …

Learning a low-level vision generalist via visual task prompt

X Chen, Y Liu, Y Pu, W Zhang, J Zhou, Y Qiao… - Proceedings of the …, 2024 - dl.acm.org
Building a unified model for general low-level vision tasks holds significant research and
practical value. Current methods encounter several critical issues. Multi-task restoration …

Exploring scalable unified modeling for general low-level vision

X Chen, K Zhu, Y Pu, S Cao, X Li, W Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Low-level vision involves a wide spectrum of tasks, including image restoration, enhancement,
stylization, and feature extraction, which differ significantly in both task formulation and …

PICABench: How Far Are We from Physically Realistic Image Editing?

Y Pu, L Zhuo, S Han, J Xing, K Zhu, S Cao, B Fu… - arXiv preprint arXiv …, 2025 - arxiv.org
Image editing has achieved remarkable progress recently. Modern editing models could
already follow complex instructions to manipulate the original content. However, beyond …

A comparative study of image restoration networks for general backbone network design

X Chen, Z Li, Y Pu, Y Liu, J Zhou, Y Qiao… - European Conference on …, 2024 - Springer
Despite the significant progress made by deep models in various image restoration tasks,
existing image restoration networks still face challenges in terms of task generality. An intuitive …

Lumina-DiMOO: An omni diffusion large language model for multi-modal generation and understanding

…, K Wang, Y Wang, J Bai, Q Yu, D Jiang, Y Pu… - arXiv preprint arXiv …, 2025 - arxiv.org
We introduce Lumina-DiMOO, an open-source foundational model for seamless multi-modal
generation and understanding. Lumina-DiMOO sets itself apart from prior unified models by …

ArtiMuse: Fine-grained image aesthetics assessment with joint scoring and expert-level understanding

…, N Ma, J Li, X Li, L Shao, K Zhu, Y Zhou, Y Pu… - arXiv preprint arXiv …, 2025 - arxiv.org
The rapid advancement of educational applications, artistic creation, and AI-generated
content (AIGC) technologies has substantially increased practical requirements for …

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

L Zhuo, S Han, Y Pu, B Qiu, S Paul, Y Liao… - arXiv preprint arXiv …, 2025 - arxiv.org
While modern visual generation models excel at creating aesthetically pleasing natural
images, they struggle with producing or editing structured visuals like charts, diagrams, and …

LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution

X Li, S Zhuang, S Cao, Y Yang, Y Pu, Q Qin… - arXiv preprint arXiv …, 2025 - arxiv.org
Generative models for Image Super-Resolution (SR) are increasingly powerful, yet their
reliance on self-attention's quadratic complexity (O(N^2)) creates a major computational …

UniMedVL: Unifying medical multimodal understanding and generation through observation-knowledge-analysis

…, C Zhang, J Liu, Y Chen, S Gao, L Liu, Y Pu… - arXiv preprint arXiv …, 2025 - arxiv.org
Medical diagnostic applications require models that can process multimodal medical inputs (images,
patient histories, lab results) and generate diverse outputs including both textual …