ColorCtrl: Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Zixin Yin¹,², Xili Dai³, Ling-Hao Chen²,⁴, Deyu Zhou³,⁶, Jianan Wang⁵, Duomin Wang⁶, Gang Yu⁶, Lionel Ni¹,³, Lei Zhang², Heung-Yeung Shum¹

¹HKUST, ²IDEA Research, ³HKUST(GZ), ⁴Tsinghua University, ⁵Astribot, ⁶StepFun

✨ICLR 2026✨

🎯 Demo

Source Video	Edited Video

🔧 Setup

Requirements

pip install -r requirements.txt

Model Preparation

Download the required diffusion models:

FLUX.1-dev: /path/to/FLUX.1-dev
Stable Diffusion 3: /path/to/stable-diffusion-3-medium-diffusers
CogVideoX-2b: /path/to/CogVideoX-2b

Update the model paths in the scripts accordingly.
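
Before launching a script, it can help to confirm that each model directory is actually in place. The sketch below is a hypothetical helper and not part of the repository: the `MODEL_PATHS` dictionary and the `find_missing_models` name are illustrative, and the paths are the placeholders listed above.

```python
from pathlib import Path

# Placeholder paths copied from the list above; replace them with local directories.
MODEL_PATHS = {
    "FLUX.1-dev": "/path/to/FLUX.1-dev",
    "Stable Diffusion 3": "/path/to/stable-diffusion-3-medium-diffusers",
    "CogVideoX-2b": "/path/to/CogVideoX-2b",
}


def find_missing_models(model_paths):
    """Return the names of models whose directory does not exist."""
    return [name for name, path in model_paths.items() if not Path(path).is_dir()]


if __name__ == "__main__":
    for name in find_missing_models(MODEL_PATHS):
        print(f"Missing model directory for {name}: {MODEL_PATHS[name]}")
```

Only the models you plan to run need to be present, so an entry can be dropped from the dictionary if, for example, CogVideoX-2b is not used.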

🚀 Quick Start

Using Scripts

We provide two FLUX-based demonstration scripts in the script/ directory:

1. Consistent Editing (Change Color/Material)

bash script/flux_consist_edit.sh

2. Inconsistent Editing (Change Style/Object)

bash script/flux_inconsist_edit.sh

Manual Usage

FLUX

python run_synthesis_flux.py \
    --src_prompt "a portrait of a woman in a red dress in a forest, best quality" \
    --tgt_prompt "a portrait of a woman in a yellow dress in a forest, best quality" \
    --edit_object "dress" \
    --out_dir "output" \
    --alpha 1.0 \
    --model_path "/path/to/FLUX.1-dev"

python run_synthesis_flux.py \
    --src_prompt "a woman is standing in a town facing front, realistic style" \
    --tgt_prompt "a woman is standing in a town facing front, cartoon style" \
    --out_dir "output" \
    --alpha 0.1 \
    --no_mask \
    --model_path "/path/to/FLUX.1-dev"
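
To compare how `--alpha` affects an edit, the same command can be issued at several strengths. The following is a minimal sketch, not a script shipped with the repository: `build_flux_command` and the per-alpha output directory layout are assumptions, while the flags are the ones used in the commands above. It only prints the commands so that they can be reviewed first.

```python
import shlex
import sys


def build_flux_command(src_prompt, tgt_prompt, alpha, model_path,
                       edit_object=None, out_dir="output", no_mask=False):
    """Assemble the argument list for run_synthesis_flux.py from the flags shown above."""
    cmd = [
        sys.executable, "run_synthesis_flux.py",
        "--src_prompt", src_prompt,
        "--tgt_prompt", tgt_prompt,
        "--out_dir", out_dir,
        "--alpha", str(alpha),
        "--model_path", model_path,
    ]
    if edit_object is not None:
        cmd += ["--edit_object", edit_object]
    if no_mask:
        cmd.append("--no_mask")
    return cmd


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 1.0):
        cmd = build_flux_command(
            "a portrait of a woman in a red dress in a forest, best quality",
            "a portrait of a woman in a yellow dress in a forest, best quality",
            alpha=alpha,
            model_path="/path/to/FLUX.1-dev",
            edit_object="dress",
            out_dir=f"output/alpha_{alpha}",  # hypothetical directory layout
        )
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` from the repository root.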

Stable Diffusion 3

python run_synthesis_sd3.py \
    --src_prompt "a portrait of a woman in a red dress in a forest, best quality" \
    --tgt_prompt "a portrait of a woman in a yellow dress in a forest, best quality" \
    --edit_object "dress" \
    --out_dir "output" \
    --alpha 1.0 \
    --model_path "/path/to/stable-diffusion-3-medium-diffusers"

python run_synthesis_sd3.py \
    --src_prompt "a portrait of a woman in a red dress, realistic style, best quality" \
    --tgt_prompt "a portrait of a woman in a yellow dress, cartoon style, best quality" \
    --edit_object "dress" \
    --out_dir "output" \
    --alpha 0.3 \
    --model_path "/path/to/stable-diffusion-3-medium-diffusers"

CogVideo

python run_synthesis_cog.py \
    --src_prompt "a portrait of a woman in a red dress in a forest, best quality" \
    --tgt_prompt "a portrait of a woman in a yellow dress in a forest, best quality" \
    --edit_object "dress" \
    --out_dir "output" \
    --alpha 1.0 \
    --model_path "/path/to/CogVideoX-2b"

Real Image Editing

Real-image editing follows the same FLUX-based ColorCtrl attention-map swapping pipeline, but starts from an input image and uses masked value preservation throughout inversion and denoising.

python run_real_flux.py \
    --src_prompt "Yoshua Bengio is wearing a red shirt" \
    --tgt_prompt "Yoshua Bengio is wearing a black shirt" \
    --edit_object "shirt" \
    --source_image_path "assets/bengio.png" \
    --out_dir "output" \
    --alpha 0.35 \
    --model_path "/path/to/FLUX.1-dev"

🎭 Masking Strategies

Available Modes

1. No Mask (`--no_mask`)

What it does: Disables mask-guided value preservation.

Result: Colors outside the edited region can drift more easily.

python run_synthesis_flux.py --no_mask --alpha 0.3 ...

2. ColorCtrl Mask (Default)

What it does: Uses the ColorCtrl attention-map swap with mask-guided value preservation.

Technical Details:

Mask Calculation: Computes masks from averaged attention maps
Image Generation: Swaps only the vision-vision attention block while preserving masked values

Result: ✅ Better background preservation than the no-mask mode, with a cleaner public implementation
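
As a rough illustration of the two steps above, the sketch below averages a stack of attention maps into a binary mask and then uses that mask to keep source values outside the edited region. It is a toy version in plain Python under stated assumptions, not the repository's implementation: the function names, the 0.5 threshold, the min-max normalization, and the flat list layout are all hypothetical, and "values" here stands for whatever per-token tensor the pipeline preserves.

```python
def average_attention_mask(attn_maps, threshold=0.5):
    """Average attention maps for the edit-object token and threshold the result.

    attn_maps: list of maps, each a flat list of non-negative weights with one
               entry per image token (layout and threshold are assumptions).
    Returns a flat list of 0/1 flags, where 1 marks the region to edit.
    """
    num_maps = len(attn_maps)
    num_tokens = len(attn_maps[0])
    mean = [sum(m[i] for m in attn_maps) / num_maps for i in range(num_tokens)]
    lo, hi = min(mean), max(mean)
    scale = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [1 if (v - lo) / scale >= threshold else 0 for v in mean]


def preserve_masked_values(source_values, edited_values, mask):
    """Keep edited values inside the mask and source values everywhere else."""
    return [e if m else s for s, e, m in zip(source_values, edited_values, mask)]


if __name__ == "__main__":
    maps = [[0.1, 0.8, 0.7, 0.0], [0.3, 0.6, 0.9, 0.2]]
    mask = average_attention_mask(maps)
    print(mask)
    print(preserve_masked_values(["s0", "s1", "s2", "s3"],
                                 ["e0", "e1", "e2", "e3"], mask))
```

With `--no_mask`, the second step is effectively skipped, which is why regions outside the edited object are free to drift.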

⚙️ Parameters

Common Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `--src_prompt` | str | Required | Source image prompt: Text description used to generate the source image. This defines the initial state before editing. |
| `--tgt_prompt` | str | Required | Target image prompt: Text description of the edited result. |
| `--edit_object` | str | Required | Edit object word: Single word or phrase that appears in `src_prompt` and specifies which object to edit. Used for mask generation. The `--no_mask` example under Manual Usage omits it. |
| `--out_dir` | str | `"output"` | Output directory: Directory where generated images and masks are saved. |
| `--alpha` | float | `1.0` | Consistency strength: Controls the strength of cross-attention injection (`consistency_strength` in the paper). Range: 0.0 to 1.0. |
| `--model_path` | str | Required | Model path: Local path to the diffusion model directory. |
| `--no_mask` | flag | `False` | Disable masking: When set, no mask is generated and no content fusion is applied. Use this to observe uncontrolled changes. |
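
The table above maps naturally onto an `argparse` definition. The sketch below is a hypothetical reconstruction from the documented names, types, and defaults, not the parser used in the repository. In particular, it assumes that `--edit_object` may be left out when `--no_mask` is set (an inference from the `--no_mask` example under Manual Usage) and that `--alpha` values outside the documented 0.0 to 1.0 range should be rejected.

```python
import argparse


def build_common_parser():
    """Hypothetical parser mirroring the Common Parameters table."""
    parser = argparse.ArgumentParser(description="ColorCtrl common options (sketch)")
    parser.add_argument("--src_prompt", type=str, required=True,
                        help="Text description used to generate the source image")
    parser.add_argument("--tgt_prompt", type=str, required=True,
                        help="Text description of the edited result")
    parser.add_argument("--edit_object", type=str, default=None,
                        help="Word or phrase in src_prompt naming the object to edit")
    parser.add_argument("--out_dir", type=str, default="output",
                        help="Directory for generated images and masks")
    parser.add_argument("--alpha", type=float, default=1.0,
                        help="Consistency strength in [0.0, 1.0]")
    parser.add_argument("--model_path", type=str, required=True,
                        help="Local path to the diffusion model directory")
    parser.add_argument("--no_mask", action="store_true",
                        help="Disable mask generation and content fusion")
    return parser


def parse_common_args(argv=None):
    """Parse arguments and apply the range and mask checks assumed in this sketch."""
    parser = build_common_parser()
    args = parser.parse_args(argv)
    if not 0.0 <= args.alpha <= 1.0:
        parser.error("--alpha must be between 0.0 and 1.0")
    if not args.no_mask and args.edit_object is None:
        # Assumption: the edit word is only needed when a mask is computed.
        parser.error("--edit_object is needed unless --no_mask is set")
    return args


if __name__ == "__main__":
    example = ["--src_prompt", "a red dress", "--tgt_prompt", "a yellow dress",
               "--edit_object", "dress", "--model_path", "/path/to/FLUX.1-dev"]
    print(parse_common_args(example))
```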

Model-Specific Parameters

Real Image Editing (`run_real_flux.py`)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `--src_prompt` | str | `"Yoshua Bengio is wearing a red shirt"` | Source prompt paired with the input image during inversion. |
| `--tgt_prompt` | str | `"Yoshua Bengio is wearing a black shirt"` | Edited prompt used for reconstruction. |
| `--edit_object` | str | `"shirt"` | Word used to derive the edit mask. |
| `--source_image_path` | str | `"assets/bengio.png"` | Input real image path. |
| `--out_dir` | str | `"output"` | Directory for reconstructed source image, edited result, mask, and latent. |
| `--alpha` | float | `0.35` | Real-image consistency strength used by the preserve variant. |
| `--model_path` | str | `"/path/to/FLUX.1-dev"` | Local FLUX model path. |

📊 Evaluation

Benchmark Generation

The ColorCtrl-Bench annotation mapping is bundled in this repo at evaluation/colorctrl_bench_mapping.json. When you pass --benchmark colorctrl-bench, both run_metric.py and evaluate.py load that local JSON automatically.

To generate results for PIE-Bench or ColorCtrl-Bench, run run_metric.py with the matching --benchmark value:

python run_metric.py \
    --model_path "/path/to/FLUX.1-dev" \
    --data_path "/path/to/benchmark-root" \
    --benchmark piebench

Switch to ColorCtrl-Bench with:

python run_metric.py \
    --model_path "/path/to/FLUX.1-dev" \
    --data_path "/path/to/colorctrl-bench-root" \
    --benchmark colorctrl-bench

Metric Calculation

To compute evaluation metrics for either benchmark:

python evaluate.py --benchmark piebench
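
Generation and metric calculation are two separate steps that share the same `--benchmark` value. The sketch below pairs them up for one benchmark; it is a hypothetical convenience wrapper rather than part of the repository, and the `benchmark_commands` name is illustrative. The flags and the two benchmark names are the ones shown in this section, and the commands are printed rather than executed.

```python
import shlex
import sys

# Benchmark names as they appear in the commands in this section.
BENCHMARKS = ("piebench", "colorctrl-bench")


def benchmark_commands(benchmark, model_path, data_path):
    """Return the generation and metric commands for one benchmark, in run order."""
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark!r}")
    generate = [
        sys.executable, "run_metric.py",
        "--model_path", model_path,
        "--data_path", data_path,
        "--benchmark", benchmark,
    ]
    evaluate = [sys.executable, "evaluate.py", "--benchmark", benchmark]
    return [generate, evaluate]


if __name__ == "__main__":
    for cmd in benchmark_commands("colorctrl-bench",
                                  "/path/to/FLUX.1-dev",
                                  "/path/to/colorctrl-bench-root"):
        print(shlex.join(cmd))
```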

📁 Project Structure

ColorCtrl/
├── run_synthesis_flux.py     # FLUX synthesis editing
├── run_synthesis_sd3.py      # SD3 synthesis editing
├── run_synthesis_cog.py      # CogVideo editing
├── run_real_flux.py          # Real image editing
├── run_metric.py             # Benchmark generation script
├── evaluate.py               # Metric calculation script
├── script/
│   ├── flux_consist_edit.sh   # Consistent editing demo
│   └── flux_inconsist_edit.sh # Inconsistent editing demo
├── colorctrl/
│   ├── attention_control.py   # Cross-attention mechanisms
│   ├── solver.py              # Diffusion solvers
│   ├── utils.py               # Utility functions
│   └── global_var.py          # Global variables
├── evaluation/
│   ├── colorctrl_bench_mapping.json # ColorCtrl-Bench annotations
│   └── matric_calculator.py         # Evaluation metrics
└── assets/                   # Sample images

🙏 Acknowledgments

This codebase is built upon and inspired by several excellent open-source projects:

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
PnPInversion: Plug-and-Play diffusion features for text-driven image-to-image translation
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
ConsistEdit: Highly Consistent and Precise Training-free Visual Editing

We thank the authors of these works for their valuable contributions to the diffusion model editing community.

📖 Citation

If you find this work useful, please cite our paper:

@inproceedings{yin2026training,
  title={Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer},
  author={Yin, Zixin and Dai, Xili and Chen, Ling-Hao and Zhou, Deyu and Wang, Jianan and Wang, Duomin and Yu, Gang and Ni, Lionel M and Zhang, Lei and Shum, Heung-Yeung},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
