Yuandong Pu

SJTU, Shanghai AI Laboratory
Verified email at pjlab.org.cn
Cited by 241

Lumina-OmniLV: A unified multimodal framework for general low-level vision

Y Pu, L Zhuo, K Zhu, L Xie, W Zhang, X Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
We present Lumina-OmniLV (abbreviated as OmniLV), a universal multimodal multi-task
framework for low-level vision that addresses over 100 sub-tasks across four major categories: …

Learning a low-level vision generalist via visual task prompt

X Chen, Y Liu, Y Pu, W Zhang, J Zhou, Y Qiao… - Proceedings of the …, 2024 - dl.acm.org
Building a unified model for general low-level vision tasks holds significant research and
practical value. Current methods encounter several critical issues. Multi-task restoration …

Exploring scalable unified modeling for general low-level vision

X Chen, K Zhu, Y Pu, S Cao, X Li, W Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Low-level vision involves a wide spectrum of tasks, including image restoration, enhancement,
stylization, and feature extraction, which differ significantly in both task formulation and …

PICABench: How Far Are We from Physically Realistic Image Editing?

Y Pu, L Zhuo, S Han, J Xing, K Zhu, S Cao, B Fu… - arXiv preprint arXiv …, 2025 - arxiv.org
Image editing has achieved remarkable progress recently. Modern editing models could
already follow complex instructions to manipulate the original content. However, beyond …

A comparative study of image restoration networks for general backbone network design

X Chen, Z Li, Y Pu, Y Liu, J Zhou, Y Qiao… - European Conference on …, 2024 - Springer
Despite the significant progress made by deep models in various image restoration tasks,
existing image restoration networks still face challenges in terms of task generality. An intuitive …

Lumina-DiMOO: An omni diffusion large language model for multi-modal generation and understanding

…, K Wang, Y Wang, J Bai, Q Yu, D Jiang, Y Pu… - arXiv preprint arXiv …, 2025 - arxiv.org
We introduce Lumina-DiMOO, an open-source foundational model for seamless multi-modal
generation and understanding. Lumina-DiMOO sets itself apart from prior unified models by …

ArtiMuse: Fine-grained image aesthetics assessment with joint scoring and expert-level understanding

…, N Ma, J Li, X Li, L Shao, K Zhu, Y Zhou, Y Pu… - arXiv preprint arXiv …, 2025 - arxiv.org
The rapid advancement of educational applications, artistic creation, and AI-generated
content (AIGC) technologies has substantially increased practical requirements for …

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

L Zhuo, S Han, Y Pu, B Qiu, S Paul, Y Liao… - arXiv preprint arXiv …, 2025 - arxiv.org
While modern visual generation models excel at creating aesthetically pleasing natural
images, they struggle with producing or editing structured visuals like charts, diagrams, and …

LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution

X Li, S Zhuang, S Cao, Y Yang, Y Pu, Q Qin… - arXiv preprint arXiv …, 2025 - arxiv.org
Generative models for Image Super-Resolution (SR) are increasingly powerful, yet their
reliance on self-attention's quadratic complexity (O(N^2)) creates a major computational …

UniMedVL: Unifying medical multimodal understanding and generation through observation-knowledge-analysis

…, C Zhang, J Liu, Y Chen, S Gao, L Liu, Y Pu… - arXiv preprint arXiv …, 2025 - arxiv.org
Medical diagnostic applications require models that can process multimodal medical inputs (images,
patient histories, lab results) and generate diverse outputs including both textual …