User profiles for Yuandong Pu
Yuandong Pu, SJTU, Shanghai AI Laboratory. Verified email at pjlab.org.cn. Cited by 241.
Lumina-OmniLV: A unified multimodal framework for general low-level vision
We present Lumina-OmniLV (abbreviated as OmniLV), a universal multimodal multi-task
framework for low-level vision that addresses over 100 sub-tasks across four major categories: …
Learning a low-level vision generalist via visual task prompt
Building a unified model for general low-level vision tasks holds significant research and
practical value. Current methods encounter several critical issues. Multi-task restoration …
Exploring scalable unified modeling for general low-level vision
Low-level vision involves a wide spectrum of tasks, including image restoration, enhancement,
stylization, and feature extraction, which differ significantly in both task formulation and …
PICABench: How Far Are We from Physically Realistic Image Editing?
Image editing has achieved remarkable progress recently. Modern editing models could
already follow complex instructions to manipulate the original content. However, beyond …
A comparative study of image restoration networks for general backbone network design
Despite the significant progress made by deep models in various image restoration tasks,
existing image restoration networks still face challenges in terms of task generality. An intuitive …
Lumina-DiMOO: An omni diffusion large language model for multi-modal generation and understanding
We introduce Lumina-DiMOO, an open-source foundational model for seamless multi-modal
generation and understanding. Lumina-DiMOO sets itself apart from prior unified models by …
Artimuse: Fine-grained image aesthetics assessment with joint scoring and expert-level understanding
The rapid advancement of educational applications, artistic creation, and AI-generated
content (AIGC) technologies has substantially increased practical requirements for …
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
While modern visual generation models excel at creating aesthetically pleasing natural
images, they struggle with producing or editing structured visuals like charts, diagrams, and …
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
Generative models for Image Super-Resolution (SR) are increasingly powerful, yet their
reliance on self-attention's quadratic complexity (O(N^2)) creates a major computational …
Unimedvl: Unifying medical multimodal understanding and generation through observation-knowledge-analysis
Medical diagnostic applications require models that can process multimodal medical inputs (images,
patient histories, lab results) and generate diverse outputs including both textual …