1ByteDance, 2Shanghai Jiao Tong University, 3National University of Singapore, 4The Hong Kong University of Science and Technology (Guangzhou)
*Equal Contribution
EditMGT is a novel framework that leverages Masked Generative Transformers for advanced image editing tasks. Our approach enables precise and controllable image modifications while preserving original content integrity.
First, clone the repository and navigate to the project root:
```bash
git clone https://github.com/weichow23/editmgt
cd editmgt

# Create and activate conda environment
conda create --name editmgt python=3.9.2
conda activate editmgt

# Optional: install system dependencies
sudo apt-get install libgl1-mesa-glx libglib2.0-0 -y

# Install Python dependencies
pip3 install git+https://github.com/openai/CLIP
pip3 install -r requirements.txt
```

For training, you can try using `--instance_dataset HuggingFaceDataset` with the AnyEdit dataset to get the training code running. After verifying functionality, you can switch to your own dataset.
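Before a full run, it can help to confirm the AnyEdit data loads at all. Below is a minimal smoke-test sketch, assuming requirements.txt pulls in the Hugging Face `datasets` library; `<anyedit-dataset-id>` is a placeholder for whichever dataset id you point `--instance_dataset` at:

```bash
# Smoke test (hypothetical): stream one sample to confirm the dataset resolves.
# <anyedit-dataset-id> is a placeholder -- fill in the AnyEdit dataset id you use.
python3 -c "from datasets import load_dataset; ds = load_dataset('<anyedit-dataset-id>', split='train', streaming=True); print(next(iter(ds)).keys())"
```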
The train/train_edit.yaml file controls training behavior. Key settings include:
| Parameter | Description | Default |
|---|---|---|
| `mixed_precision` | Training precision | `no` for float32; `bf16` for bfloat16 |
| `resume_from_checkpoint` | Resume training | direct local path, or `latest` |
| `wandb_id` | Weights & Biases run ID | keep empty, or an existing run ID |
| `pretrained_model_name_or_path` | Base model | `MeissonFlow/Meissonic` |
| `output_dir` | Checkpoint save location | `./runs/editmgt` |
| `train_batch_size` | Batch size per GPU | `4` |
| `gradient_accumulation_steps` | Steps before each weight update | `8` |
| `learning_rate` | Initial learning rate | `1e-4` |
| `max_grad_norm` | Gradient clipping value | `10` |
| `text_encoder_architecture` | Text encoder model | `CLIP_Gemma2` |
| `resolution` | Image resolution | `1024` |
| `lr_scheduler` | Learning rate schedule | `constant` |
| `max_train_steps` | Total training steps | `500000` |
| `checkpointing_steps` | Checkpoint save frequency (steps) | `200` |
| `logging_steps` | Metric logging frequency (steps) | `10` |
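For orientation, here is a minimal sketch of how these keys might appear in train/train_edit.yaml; the flat key layout is an assumption, and the values simply restate the defaults from the table:

```yaml
# Sketch of train/train_edit.yaml -- layout assumed; values are the table defaults.
mixed_precision: bf16
resume_from_checkpoint: latest
wandb_id: ""                       # keep empty, or set an existing run id
pretrained_model_name_or_path: MeissonFlow/Meissonic
output_dir: ./runs/editmgt
train_batch_size: 4                # per GPU
gradient_accumulation_steps: 8
learning_rate: 1.0e-4              # 1e-4; written 1.0e-4 so YAML parses it as a float
max_grad_norm: 10
text_encoder_architecture: CLIP_Gemma2
resolution: 1024
lr_scheduler: constant
max_train_steps: 500000
checkpointing_steps: 200
logging_steps: 10
```

With these defaults, the effective batch size is train_batch_size × gradient_accumulation_steps = 4 × 8 = 32 samples per GPU per optimizer step.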
To use Weights & Biases for tracking the training process, add these variables to your environment:

```bash
echo 'export WANDB_API_KEY=<YOUR WANDB API KEY>' >> ~/.bashrc
echo 'export WANDB_ENTITY=<YOUR WANDB ENTITY>' >> ~/.bashrc
echo 'export hf_token=<YOUR HF TOKEN>' >> ~/.bashrc
source ~/.bashrc
```

To use LoRA (which reduces memory usage by roughly 70%), add the following flags to the training command in train/train.sh:
```bash
--use_lora \
--lora_r 32 \
--lora_alpha 128
```

Here, `lora_r` and `lora_alpha` are the rank and alpha parameters of LoRA, respectively.
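Put together, the launch command in train/train.sh might look like the sketch below. The `accelerate launch` entry point, script name, and `--config` flag are assumptions to illustrate where the flags go; only the three LoRA flags themselves come from this section:

```bash
# Hypothetical launch command -- entry point and --config are assumptions;
# only --use_lora / --lora_r / --lora_alpha are documented above.
accelerate launch train/train_edit.py \
  --config train/train_edit.yaml \
  --use_lora \
  --lora_r 32 \
  --lora_alpha 128
```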
Use the script to start training. Specify the config file and enable LoRA if needed:
```bash
bash train/train.sh
```

For inference, we provide a standard example script:

```bash
python3 infer.py
```

We also provide evaluation scripts for GEditBench. Refer to the Official Repo for implementation details.
You'll need to install additional dependencies beyond our requirements.txt:
```bash
pip3 install megfile==4.1.4
```

Then use the script:

```bash
python3 eval/geditbench/infer.py
```

Generate and organize your images in the following directory structure:
```
results/
├── {method_name}/
│   └── fullset/
│       └── {edit_task}/
│           └── en/    # English instructions
│               ├── key1.png
│               ├── key2.png
│               └── ...
```
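For example, staging outputs for a method named `editmgt` on a hypothetical `background_change` task (both names are placeholders) could look like:

```bash
# Placeholder method/task names -- substitute your own.
mkdir -p results/editmgt/fullset/background_change/en
cp outputs/background_change/*.png results/editmgt/fullset/background_change/en/
```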
Run the inference script:
```bash
PYTHON_PATH='./' python3 eval/geditbench/infer.py
```

For GPT-4.1 evaluation, set up your API keys in eval/geditbench/rate.py (lines 103 and 105), then run:

```bash
PYTHON_PATH='./' python3 eval/geditbench/rate.py --model_name editmgt --save_dir eval/geditbench/score_dir --backbone gpt4.1 --edited_images_dir eval/geditbench/results/ --instruction_language en
```

Run the analysis script to get scores for semantics, quality, and overall performance:
```bash
PYTHON_PATH='./' python3 eval/geditbench/stat.py --model_name editmgt --save_path eval/geditbench/score_dir --backbone gpt4.1 --language en
```

This will output scores broken down by edit category along with aggregate metrics.
Note: GEditBench scores can fluctuate significantly due to randomness in generation and differences between GPT versions; fluctuations around our reported scores are normal. For AnyBench and EmuEdit, we recommend reducing the guidance scale and the number of sampling steps; for MagicBrush, we recommend using our mask strategy.
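As an illustration, a reduced-setting run for AnyBench or EmuEdit might look like the sketch below; the flag names `--guidance_scale` and `--num_inference_steps` are assumptions, so check infer.py for the actual argument names:

```bash
# Hypothetical flags -- verify the real argument names in infer.py.
python3 infer.py --guidance_scale 4.0 --num_inference_steps 24
```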
This project is licensed under the CC BY 4.0 License. See LICENSE for details.
```bibtex
@article{chow2025editmgt,
  title={EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing},
  author={Chow, Wei and Li, Linfeng and Kong, Lingdong and Li, Zefeng and Xu, Qi and Song, Hang and Ye, Tian and Wang, Xian and Bai, Jinbin and Xu, Shilin and others},
  journal={arXiv preprint arXiv:2512.11715},
  year={2025}
}
```
We thank all contributors and the research community for their valuable feedback and support.