1ByteDance, 2Shanghai Jiao Tong University, 3National University of Singapore, 4The Hong Kong University of Science and Technology (Guangzhou)
*Equal Contribution
EditMGT is a novel framework that leverages Masked Generative Transformers for advanced image editing tasks. Our approach enables precise and controllable image modifications while preserving original content integrity.
First, clone the repository and navigate to the project root:
```bash
git clone https://github.com/weichow23/editmgt
cd editmgt

# Create and activate conda environment
conda create --name editmgt python=3.9.2
conda activate editmgt

# Optional: install system dependencies
sudo apt-get install libgl1-mesa-glx libglib2.0-0 -y

# Install Python dependencies
pip3 install git+https://github.com/openai/CLIP
pip3 install -r requirements.txt
```

For training, you can try using `--instance_dataset HuggingFaceDataset` with the AnyEdit dataset to get the training code running. After verifying functionality, you can switch to your own dataset.
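Before a full run, it can help to confirm the AnyEdit data loads at all. Below is a minimal smoke-test sketch, assuming requirements.txt pulls in the Hugging Face `datasets` library; `<anyedit-dataset-id>` is a placeholder for whichever dataset id you point `--instance_dataset` at:

```bash
# Smoke test (hypothetical): stream one sample to confirm the dataset resolves.
# <anyedit-dataset-id> is a placeholder -- fill in the AnyEdit dataset id you use.
python3 -c "from datasets import load_dataset; ds = load_dataset('<anyedit-dataset-id>', split='train', streaming=True); print(next(iter(ds)).keys())"
```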
The train/train_edit.yaml file controls training behavior. Key settings include:
| Parameter | Description | Default |
|---|---|---|
| `mixed_precision` | Training precision | `no` for float32; `bf16` for bfloat16 |
| `resume_from_checkpoint` | Resume training | direct local path, or `latest` |
| `wandb_id` | Weights & Biases run ID | keep empty, or an existing run ID |
| `pretrained_model_name_or_path` | Base model | `MeissonFlow/Meissonic` |
| `output_dir` | Checkpoint save location | `./runs/editmgt` |
| `train_batch_size` | Batch size per GPU | `4` |
| `gradient_accumulation_steps` | Steps before each weight update | `8` |
| `learning_rate` | Initial learning rate | `1e-4` |
| `max_grad_norm` | Gradient clipping value | `10` |
| `text_encoder_architecture` | Text encoder model | `CLIP_Gemma2` |
| `resolution` | Image resolution | `1024` |
| `lr_scheduler` | Learning rate schedule | `constant` |
| `max_train_steps` | Total training steps | `500000` |
| `checkpointing_steps` | Checkpoint save frequency (steps) | `200` |
| `logging_steps` | Metric logging frequency (steps) | `10` |
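For orientation, here is a minimal sketch of how these keys might appear in train/train_edit.yaml; the flat key layout is an assumption, and the values simply restate the defaults from the table:

```yaml
# Sketch of train/train_edit.yaml -- layout assumed; values are the table defaults.
mixed_precision: bf16
resume_from_checkpoint: latest
wandb_id: ""                       # keep empty, or set an existing run id
pretrained_model_name_or_path: MeissonFlow/Meissonic
output_dir: ./runs/editmgt
train_batch_size: 4                # per GPU
gradient_accumulation_steps: 8
learning_rate: 1.0e-4              # 1e-4; written 1.0e-4 so YAML parses it as a float
max_grad_norm: 10
text_encoder_architecture: CLIP_Gemma2
resolution: 1024
lr_scheduler: constant
max_train_steps: 500000
checkpointing_steps: 200
logging_steps: 10
```

With these defaults, the effective batch size is train_batch_size × gradient_accumulation_steps = 4 × 8 = 32 samples per GPU per optimizer step.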
To use Weights & Biases for tracking the training process, add these variables to your environment:

```bash
echo 'export WANDB_API_KEY=<YOUR WANDB API KEY>' >> ~/.bashrc
echo 'export WANDB_ENTITY=<YOUR WANDB ENTITY>' >> ~/.bashrc
echo 'export hf_token=<YOUR HF TOKEN>' >> ~/.bashrc
source ~/.bashrc
```

To use LoRA (which reduces memory usage by roughly 70%), add the following flags to the training command in train/train.sh:
```bash
--use_lora \
--lora_r 32 \
--lora_alpha 128
```

Here, `lora_r` and `lora_alpha` are the rank and alpha parameters of LoRA, respectively.
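Put together, the launch command in train/train.sh might look like the sketch below. The `accelerate launch` entry point, script name, and `--config` flag are assumptions to illustrate where the flags go; only the three LoRA flags themselves come from this section:

```bash
# Hypothetical launch command -- entry point and --config are assumptions;
# only --use_lora / --lora_r / --lora_alpha are documented above.
accelerate launch train/train_edit.py \
  --config train/train_edit.yaml \
  --use_lora \
  --lora_r 32 \
  --lora_alpha 128
```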
Use the script to start training. Specify the config file and enable LoRA if needed:
```bash
bash train/train.sh
```

For inference, we provide a standard example script:

```bash
python3 infer.py
```

We also provide evaluation scripts for GEditBench. Refer to the Official Repo for implementation details.
You'll need to install additional dependencies beyond our requirements.txt:
```bash
pip3 install megfile==4.1.4
```

Then use the script:

```bash
python3 eval/geditbench/infer.py
```

Generate and organize your images in the following directory structure:
```
results/
├── {method_name}/
│   └── fullset/
│       └── {edit_task}/
│           └── en/    # English instructions
│               ├── key1.png
│               ├── key2.png
│               └── ...
```
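For example, staging outputs for a method named `editmgt` on a hypothetical `background_change` task (both names are placeholders) could look like:

```bash
# Placeholder method/task names -- substitute your own.
mkdir -p results/editmgt/fullset/background_change/en
cp outputs/background_change/*.png results/editmgt/fullset/background_change/en/
```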
Run the inference script:
```bash
PYTHON_PATH='./' python3 eval/geditbench/infer.py
```

For GPT-4.1 evaluation, set up your API keys in eval/geditbench/rate.py (lines 103 and 105), then run:

```bash
PYTHON_PATH='./' python3 eval/geditbench/rate.py --model_name editmgt --save_dir eval/geditbench/score_dir --backbone gpt4.1 --edited_images_dir eval/geditbench/results/ --instruction_language en
```

Run the analysis script to get scores for semantics, quality, and overall performance:
```bash
PYTHON_PATH='./' python3 eval/geditbench/stat.py --model_name editmgt --save_path eval/geditbench/score_dir --backbone gpt4.1 --language en
```

This will output scores broken down by edit category along with aggregate metrics.
Note: GEditBench scores can fluctuate significantly due to randomness in generation and differences between GPT versions; fluctuations around our reported scores are normal. For AnyBench and EmuEdit, we recommend reducing the guidance scale and the number of sampling steps; for MagicBrush, we recommend using our mask strategy.
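As an illustration, a reduced-setting run for AnyBench or EmuEdit might look like the sketch below; the flag names `--guidance_scale` and `--num_inference_steps` are assumptions, so check infer.py for the actual argument names:

```bash
# Hypothetical flags -- verify the real argument names in infer.py.
python3 infer.py --guidance_scale 4.0 --num_inference_steps 24
```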
This project is licensed under the CC BY 4.0 License. See LICENSE for details.
```bibtex
@article{chow2025editmgt,
  title={EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing},
  author={Chow, Wei and Li, Linfeng and Kong, Lingdong and Li, Zefeng and Xu, Qi and Song, Hang and Ye, Tian and Wang, Xian and Bai, Jinbin and Xu, Shilin and others},
  journal={arXiv preprint arXiv:2512.11715},
  year={2025}
}
```
We thank all contributors and the research community for their valuable feedback and support.