
[ICCV 2025] DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models 🎨

Project Page | Paper | Hugging Face | Supplementary Material

🔥 News

  • 2025-03-17: Our paper DreamRenderer is now available on arXiv, and the Supplementary Material is released.
  • 2025-03-20: We release the code! 🎉
  • 2025-05-20: We have released the code for integrating DreamRenderer with SD3.

Multi-Instance Attribute Control

📝 Introduction

DreamRenderer is a training-free method built upon the FLUX model that enables users to precisely control the content of each instance through bounding boxes or masks while ensuring overall visual harmony.
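
As a rough mental model, the input can be thought of as a global prompt plus one bounding box (or mask) and one text description per instance. The snippet below is only an illustrative sketch of such a layout; the field names are hypothetical, and the actual arguments expected by the scripts in scripts/ may differ.

# Hypothetical sketch of a multi-instance layout: a global caption plus one
# normalized bounding box (x1, y1, x2, y2) and one description per instance.
# These field names are illustrative only, not the exact script arguments.
layout = {
    "global_prompt": "a cozy living room, warm lighting",
    "instances": [
        {"box": [0.05, 0.55, 0.45, 0.95], "prompt": "a red leather sofa"},
        {"box": [0.55, 0.50, 0.95, 0.90], "prompt": "a wooden coffee table"},
        {"box": [0.30, 0.05, 0.70, 0.40], "prompt": "an abstract oil painting"},
    ],
}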

✅ To-Do List

  • Arxiv Paper & Supplementary Material
  • Inference Code
  • More demos (coming soon, stay tuned! 🚀)
  • ComfyUI support
  • Huggingface Space support

🛠️ Installation

🚀 Checkpoints

Download the checkpoint for SAM2, sam2_hiera_large.pt, and place it in the pretrained_weights directory as shown below:
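
If you prefer to fetch the file from Python, a minimal download sketch is shown below. The URL is the standard SAM 2 release link (an assumption here; verify it against the official segment-anything-2 repository if it has moved).

# Minimal download sketch for the SAM 2 checkpoint.
# NOTE: the URL below is assumed from the official SAM 2 release and may change.
import os
import urllib.request

url = "https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt"
dest = os.path.join("pretrained_weights", "sam2_hiera_large.pt")
os.makedirs("pretrained_weights", exist_ok=True)
if not os.path.exists(dest):
    urllib.request.urlretrieve(url, dest)  # large download, may take a while
print("checkpoint at:", dest)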

├── pretrained_weights
│   ├── sam2_hiera_large.pt
├── DreamRenderer
│   ├── ...
├── scripts
│   ├── ...

💻 Environment Setup

# Create and activate conda environment
conda create -n dreamrenderer python=3.10 -y
conda activate dreamrenderer

# Install dependencies
pip install -r requirements.txt
pip install -e .

# Install segment-anything-2
cd segment-anything-2
pip install -e . --no-deps
cd ..
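
As a quick sanity check after installation, the following sketch verifies that the main dependencies import cleanly (assuming the requirements include PyTorch and diffusers, and that segment-anything-2 installs the sam2 package):

# Quick post-install sanity check (assumed package names: torch, diffusers, sam2).
import torch
import diffusers
import sam2

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)
print("sam2 installed at:", sam2.__file__)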

🧩 Region/Instance Controllable Rendering

You can quickly use DreamRenderer for precise rendering with the following commands:

python scripts/inference_demo0.py --use_sam_enhance

Demo 0 Output

python scripts/inference_demo1.py --use_sam_enhance

Demo 1 Output

python scripts/inference_demo2.py --num_hard_control_steps=15

Demo 2 Output

🔌 Support for ControlNet (preliminary implementation)

In the original paper, we used FLUX-depth and FLUX-canny for image-conditioned generation. Now, we also provide a script that supports image-conditioned generation via ControlNet:

python scripts/inferenceCN_demo0.py --res=768

ControlNet Demo Output

🔌 Support for SD3 (preliminary implementation)

To further demonstrate the generalizability of our method, we integrated DreamRenderer with another DiT-based architecture, SD3. We use ControlNet to guide generation based on depth:

python scripts/inference_demo5.py --use_sam_enhance

SD3 Demo Output

🖼️ End-to-End Layout-to-Image Generation

DreamRenderer supports re-rendering outputs from state-of-the-art Layout-to-Image models, enhancing image quality and allowing for fine-grained control over each instance in the layout.

Here's how it works:

  1. A Layout-to-Image method first generates a coarse image based on the input layout.
  2. We extract a depth map from this image (a sketch of this step is shown below).
  3. DreamRenderer then re-renders the scene, guided by the original layout, to produce a higher-quality and more faithful result.
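
For step 2, a minimal depth-extraction sketch using the Depth-Anything v2 API (installed in the next section) could look like the following; the ViT-L configuration values follow the upstream Depth-Anything-V2 examples and are an assumption here, and the input/output file names are placeholders.

# Sketch of step 2: extract a depth map from the coarse layout-to-image output.
# Assumes the Depth-Anything v2 install and checkpoint placement described below;
# the ViT-L config follows the upstream Depth-Anything-V2 examples (assumption).
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

model = DepthAnythingV2(encoder="vitl", features=256, out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load("pretrained_weights/depth_anything_v2_vitl.pth", map_location="cpu"))
model = model.to("cuda" if torch.cuda.is_available() else "cpu").eval()

coarse = cv2.imread("coarse_output.png")   # placeholder path: image from step 1 (BGR)
depth = model.infer_image(coarse)          # H x W float depth map
depth_vis = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("depth.png", depth_vis)        # placeholder output used to guide step 3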

📦 1. Install Depth-Anything v2 for Depth Map Extraction

We use Depth-Anything v2 for extracting depth maps. To enable this feature, follow these steps:

Step 1: Install the Depth-Anything package

cd Depth-Anything-V2
pip install -e .
cd ..

Step 2: Download Model Weights

Download the Depth-Anything v2 model (depth_anything_v2_vitl.pth) and place it in the pretrained_weights directory:

├── pretrained_weights
│   ├── depth_anything_v2_vitl.pth
├── DreamRenderer
│   ├── ...
├── scripts
│   ├── ...

🚀 2. Run End-to-End Generation

Once everything is set up, you can run the following commands to achieve end-to-end layout-to-image generation.

End-to-end layout-to-image generation with MIGC (download MIGC_SD14.ckpt and put it in pretrained_weights):

python scripts/inference_demo3.py --res=768 --use_sam_enhance --num_hard_control_steps=15

MIGC + DreamRenderer Output

End-to-end layout-to-image generation with InstanceDiffusion (download instancediffusion_sd15.pth and put it in pretrained_weights):

python scripts/inference_demo4.py --use_sam_enhance --num_hard_control_steps=10 --res=768

InstanceDiffusion + DreamRenderer Output

We will soon integrate with more SOTA layout-to-image methods. Stay tuned!

📊 Comparison with Other Models

Comparison with other models

🙏 Acknowledgements

We would like to thank the developers of FLUX, Segment Anything Model, Depth-Anything, diffusers, CLIP, and other open-source projects that made this work possible. We appreciate their outstanding contributions.

📚 Citation

If you find this repository useful, please cite using the following BibTeX entry:

@misc{zhou2025dreamrenderer,
      title={DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models},
      author={Dewei Zhou and Mingwei Li and Zongxin Yang and Yi Yang},
      year={2025},
      eprint={2503.12885},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.12885},
}

📬 Contact

If you have any questions or suggestions, please feel free to contact us 😆!
