- 2025-03-17: Our paper DreamRenderer is now available on arXiv, and the Supplementary Material is released.
- 2025-03-20: We have released the code!
- 2025-05-20: We have released the code for integrating DreamRenderer with SD3.
DreamRenderer is a training-free method built upon the FLUX model that enables users to precisely control the content of each instance through bounding boxes or masks while ensuring overall visual harmony.
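To make the input concrete: a layout is a global prompt plus, for each instance, a description and a region. The snippet below is only an illustration of that idea, assuming hypothetical field names and normalized [x0, y0, x1, y1] boxes; the actual interface is defined by the demo scripts under scripts/.
# Illustration only: hypothetical field names and normalized [x0, y0, x1, y1] boxes.
# The released demo scripts (scripts/inference_demo*.py) define the real interface.
global_prompt = "a cozy living room, soft afternoon light"
instances = [
    {"caption": "a red leather sofa",       "bbox": [0.05, 0.55, 0.60, 0.95]},
    {"caption": "a golden retriever puppy", "bbox": [0.62, 0.60, 0.92, 0.95]},
]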
- arXiv Paper & Supplementary Material
- Inference Code
- More demos. Coming soon, stay tuned!
- ComfyUI support
- Huggingface Space support
Download the checkpoint for SAM2, sam2_hiera_large.pt, and place it in the pretrained_weights directory as shown below:
├── pretrained_weights
│   ├── sam2_hiera_large.pt
├── DreamRenderer
│   ├── ...
├── scripts
│   ├── ...
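If you would rather fetch the checkpoint from a script, here is a minimal Python sketch, assuming the download URL published with the official SAM2 release (please verify it against the segment-anything-2 repository before relying on it):
# Minimal download sketch; the URL is assumed to be the official SAM2 release URL.
import os
import urllib.request

URL = "https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt"
os.makedirs("pretrained_weights", exist_ok=True)
urllib.request.urlretrieve(URL, "pretrained_weights/sam2_hiera_large.pt")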
# Create and activate conda environment
conda create -n dreamrenderer python=3.10 -y
conda activate dreamrenderer
# Install dependencies
pip install -r requirements.txt
pip install -e .
# Install segment-anything-2
cd segment-anything-2
pip install -e . --no-deps
cd ..
You can quickly use DreamRenderer for precise rendering with the following commands:
python scripts/inference_demo0.py --use_sam_enhance
python scripts/inference_demo1.py --use_sam_enhance
python scripts/inference_demo2.py --num_hard_control_steps=15
In the original paper, we used FLUX-depth and FLUX-canny for image-conditioned generation. Now, we also provide a script that supports image-conditioned generation via ControlNet:
python scripts/inferenceCN_demo0.py --res=768
To further demonstrate the generalizability of our method, we integrated DreamRenderer with another DiT-based architecture, SD3. We use ControlNet to guide generation based on depth:
python scripts/inference_demo5.py --use_sam_enhance
DreamRenderer supports re-rendering outputs from state-of-the-art Layout-to-Image models, enhancing image quality and allowing for fine-grained control over each instance in the layout.
Here's how it works (a short conceptual sketch follows the list):
- A Layout-to-Image method first generates a coarse image based on the input layout.
- We extract a depth map from this image.
- DreamRenderer then re-renders the scene, guided by the original layout, to produce a higher-quality and more faithful result.
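The three helpers in the sketch below are hypothetical placeholders for what scripts/inference_demo3.py and scripts/inference_demo4.py do internally; you do not need to write this yourself.
# Conceptual sketch only -- run_layout_to_image, extract_depth, and dreamrenderer_render
# are hypothetical placeholders; the real wiring lives in scripts/inference_demo3.py / _demo4.py.
def rerender_layout(layout, global_prompt):
    # 1. A Layout-to-Image model (e.g. MIGC or InstanceDiffusion) draws a coarse image.
    coarse_image = run_layout_to_image(layout, global_prompt)
    # 2. A depth map is extracted from the coarse image (we use Depth-Anything v2).
    depth_map = extract_depth(coarse_image)
    # 3. DreamRenderer re-renders the scene conditioned on the depth map, with the
    #    original layout controlling the content of each instance.
    return dreamrenderer_render(depth_map, layout, global_prompt)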
We use Depth-Anything v2 for extracting depth maps. To enable this feature, follow these steps:
cd Depth-Anything-V2
pip install -e .
cd ..
Download the Depth-Anything v2 model (depth_anything_v2_vitl.pth) and place it in the pretrained_weights directory:
├── pretrained_weights
│   ├── depth_anything_v2_vitl.pth
├── DreamRenderer
│   ├── ...
├── scripts
│   ├── ...
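For reference, the following sketch shows how a depth map can be extracted once the checkpoint is in place. It follows the usage documented in the Depth-Anything-V2 repository; the ViT-L constructor arguments are taken from that README and are an assumption here, so double-check them upstream rather than treating this as the exact call our scripts make.
# Depth extraction sketch following the Depth-Anything-V2 README; the ViT-L constructor
# arguments are assumptions -- verify them against the upstream repository.
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

model = DepthAnythingV2(encoder="vitl", features=256, out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load("pretrained_weights/depth_anything_v2_vitl.pth", map_location="cpu"))
model = model.to("cuda" if torch.cuda.is_available() else "cpu").eval()

raw_image = cv2.imread("coarse_layout_image.png")  # BGR image, e.g. the coarse layout-to-image output
depth = model.infer_image(raw_image)               # HxW numpy array of relative depth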
Once everything is set up, you can run the following commands to achieve end-to-end layout-to-image generation.
End-to-end layout-to-image generation with MIGC (download MIGC_SD14.ckpt and put it in pretrained_weights):
python scripts/inference_demo3.py --res=768 --use_sam_enhance --num_hard_control_steps=15
End-to-end layout-to-image generation with InstanceDiffusion (download instancediffusion_sd15.pth and put it in pretrained_weights):
python scripts/inference_demo4.py --use_sam_enhance --num_hard_control_steps=10 --res=768
We will soon integrate with more SOTA layout-to-image methods. Stay tuned!
We would like to thank the developers of FLUX, Segment Anything Model, Depth-Anything, diffusers, CLIP, and other open-source projects that made this work possible. We appreciate their outstanding contributions.
If you find this repository useful, please cite using the following BibTeX entry:
@misc{zhou2025dreamrenderer,
title={DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models},
author={Dewei Zhou and Mingwei Li and Zongxin Yang and Yi Yang},
year={2025},
eprint={2503.12885},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.12885},
}
If you have any questions or suggestions, please feel free to contact us!