Mingkai Jia1,2, Wei Yin2*§, Xiaotao Hu1,2, Jiaxin Guo3, Xiaoyang Guo2
Qian Zhang2, Xiao-Xiao Long4, Ping Tan1
HKUST1, Horizon Robotics2, CUHK3, NJU4
* Corresponding Author, § Project Leader
- [September 2025] Achieved SOTA on the TokBench image reconstruction leaderboard, outperforming VAEs (VA-VAE, SD-3.5, SD-XL, and FLUX.1-dev) at multiple resolutions (256p, 512p, and 1024p) on the Text-Accuracy, Text-NED, and Face-Similarity metrics.
- [August 2025] Achieved SOTA on the Papers with Code leaderboards for Image Reconstruction on ImageNet and UHDBench.
- [August 2025] Released inference code.
- [August 2025] Released model zoo.
- [August 2025] Released UHDBench, our proposed dataset for ultra-high-definition image reconstruction evaluation.
- [July 2025] Released paper.
- Training code.
- More demos.
- Models & Evaluation code.
- Huggingface models.
- Zero-shot reconstruction benchmarks.
| Model | Downsample | Groups | Codebook Size | Training Data | Link |
|---|---|---|---|---|---|
| mgvq-f8c32-g4 | 8 | 4 | 32768 | ImageNet | link |
| mgvq-f8c32-g8 | 8 | 8 | 16384 | ImageNet | link |
| mgvq-f16c32-g4 | 16 | 4 | 32768 | ImageNet | link |
| mgvq-f16c32-g8 | 16 | 8 | 16384 | ImageNet | link |
| mgvq-f16c32-g4-mix | 16 | 4 | 32768 | mix | link |
| mgvq-f32c32-g8-mix | 32 | 8 | 16384 | mix | link |
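For intuition on the naming scheme above, here is a minimal multi-group quantization sketch. It assumes `f16c32-g4` reads as a downsample factor of 16 and 32 latent channels split into 4 groups, each matched against its own sub-codebook; this is an illustrative sketch under that assumption, not the repository's implementation (training additionally needs a straight-through estimator and codebook losses).

```python
# Illustrative multi-group quantization sketch (NOT the repo's implementation).
# Assumed reading of the names: f16c32-g4 = 32 latent channels split into
# 4 groups, each group quantized against its own sub-codebook.
import torch
import torch.nn as nn

class MultiGroupQuantizer(nn.Module):
    def __init__(self, channels: int = 32, groups: int = 4, codebook_size: int = 32768):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.dim = channels // groups  # channels handled by each sub-codebook
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, self.dim) for _ in range(groups)
        )

    def forward(self, z: torch.Tensor):
        # z: (B, C, H, W) continuous latents from the encoder.
        quantized, indices = [], []
        for chunk, book in zip(z.chunk(self.groups, dim=1), self.codebooks):
            b, _, h, w = chunk.shape
            flat = chunk.permute(0, 2, 3, 1).reshape(-1, self.dim)
            idx = torch.cdist(flat, book.weight).argmin(dim=1)  # nearest code (L2)
            q = book(idx).view(b, h, w, self.dim).permute(0, 3, 1, 2)
            quantized.append(q)
            indices.append(idx.view(b, h, w))
        # Each position carries `groups` sub-codes, so the effective vocabulary
        # grows combinatorially while each nearest-neighbor search stays small.
        return torch.cat(quantized, dim=1), indices

z = torch.randn(1, 32, 16, 16)       # toy latent
q, codes = MultiGroupQuantizer()(z)  # q: (1, 32, 16, 16)
```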
```bash
git clone https://github.com/MKJia/MGVQ.git
cd MGVQ
pip3 install -r requirements.txt
```
Download the pretrained models from our model zoo to `/path/to/your/ckpt`. Try our UHDBench dataset on Hugging Face and download it to `/path/to/your/dataset`. Remember to change the paths of `ckpt` and `dataset_root`, and make sure you are evaluating the expected `model` on the intended `dataset`.
```bash
cd evaluation
bash eval_recon.sh
```
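Outside the script, you can sanity-check a single reconstruction pair with the standard PSNR formula. A minimal sketch with placeholder file names (the script's own metric suite remains the authoritative evaluation):

```python
# Quick PSNR sanity check between an original and a reconstructed image.
# Standard formula; the file names are placeholders, not part of the repo.
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)

orig = np.asarray(Image.open("original.png").convert("RGB"))
recon = np.asarray(Image.open("reconstruction.png").convert("RGB"))
print(f"PSNR: {psnr(orig, recon):.2f} dB")
```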
You can download the pretrained GPT model for generation from Hugging Face and test it with our `mgvq-f16c32-g4` tokenizer for demo image sampling. Remember to change the paths of `gpt_ckpt` and `vq_ckpt`.
```bash
cd evaluation
bash demo_gen.sh
```
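The released sample file below is tagged `topk_12`, which suggests top-k sampling over the GPT's token logits during autoregressive decoding. Here is a minimal, hypothetical top-k sampling step, not the repository's sampler:

```python
# Illustrative top-k sampling step for an autoregressive token GPT
# (hypothetical sketch, not the repo's sampler). Keeps the k largest
# logits, renormalizes, and draws the next VQ token id.
import torch

def sample_top_k(logits: torch.Tensor, k: int = 12, temperature: float = 1.0) -> torch.Tensor:
    logits = logits / temperature
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx.gather(-1, choice)  # next token id, shape (..., 1)

# Example: a batch of 2 positions over a 16384-entry codebook.
next_id = sample_top_k(torch.randn(2, 16384), k=12)
```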
We also provide on Hugging Face a `.npz` file sampled by `sample_c2i_ddp.py` for evaluation.
```bash
cd evaluation
python3 evaluator.py /path/to/your/VIRTUAL_imagenet256_labeled.npz /path/to/your/GPT_XXL_300ep_topk_12.npz
```
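If you sample images yourself, you first need to pack them into an `.npz` batch. A minimal sketch, assuming the common ADM-style evaluator format (a single uint8 array of shape (N, 256, 256, 3), the layout used by `VIRTUAL_imagenet256_labeled.npz` reference batches):

```python
# Pack sampled images into an .npz batch for evaluator.py.
# Assumed format: one uint8 array of shape (N, 256, 256, 3),
# as in ADM-style reference batches. Paths are placeholders.
import glob
import numpy as np
from PIL import Image

paths = sorted(glob.glob("samples/*.png"))  # placeholder sample folder
batch = np.stack(
    [np.asarray(Image.open(p).convert("RGB").resize((256, 256))) for p in paths]
).astype(np.uint8)
np.savez("my_samples.npz", batch)  # stored under the default key 'arr_0'
print(batch.shape)  # (N, 256, 256, 3)
```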
- 🔥 Qualitative reconstruction images with 16× downsampling on the 2560×1440 UHDBench dataset.
- 🔥 Qualitative class-to-image generation on ImageNet. The classes are dog (Golden Retriever and Husky), cliff, and bald eagle.
- 🔥 Reconstruction evaluation on the 256×256 ImageNet benchmark.
- 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 512×512 datasets.
- 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 2560×1440 datasets.
- 🔥 Reconstruction evaluation on TokBench.

Downsample factor 16:

| Method | Type | Factor | T-ACC(small)↑ | T-ACC(mean)↑ | T-NED(small)↑ | T-NED(mean)↑ | F-Sim(small)↑ | F-Sim(mean)↑ |
|---|---|---|---|---|---|---|---|---|
| FlexTok | Discrete | 1D | 0.55 | 6.95 | 7.80 | 21.09 | 0.06 | 0.15 |
| VQGAN | Discrete | 16 | 0.05 | 1.10 | 4.34 | 8.22 | 0.05 | 0.10 |
| LlamaGen | Discrete | 16 | 0.16 | 4.28 | 5.41 | 14.77 | 0.07 | 0.15 |
| OpenMagvit2 | Discrete | 16 | 0.80 | 10.58 | 9.59 | 27.59 | 0.08 | 0.20 |
| VAR | Discrete | 16 | 1.24 | 15.74 | 10.89 | 34.19 | 0.10 | 0.23 |
| VA-VAE | Continuous | 16 | 6.92 | 37.04 | 25.14 | 56.32 | 0.22 | 0.49 |
| MGVQ | Discrete | 16 | 11.08 | 43.15 | 32.80 | 62.29 | 0.22 | 0.47 |
Downsample factor 8:

| Method | Type | Factor | T-ACC(small)↑ | T-ACC(mean)↑ | T-NED(small)↑ | T-NED(mean)↑ | F-Sim(small)↑ | F-Sim(mean)↑ |
|---|---|---|---|---|---|---|---|---|
| LlamaGen | Discrete | 8 | 4.39 | 29.41 | 19.69 | 49.00 | 0.17 | 0.40 |
| OpenMagvit2 | Discrete | 8 | 9.33 | 40.24 | 30.82 | 59.97 | 0.23 | 0.48 |
| SD-3.5 | Continuous | 8 | 36.26 | 67.04 | 59.04 | 80.58 | 0.43 | 0.70 |
| FLUX.1-dev | Continuous | 8 | 50.69 | 75.91 | 70.70 | 86.42 | 0.52 | 0.76 |
| MGVQ | Discrete | 8 | 63.83 | 82.65 | 80.18 | 90.96 | 0.58 | 0.80 |
If the paper and code of `MGVQ` help your research, we kindly ask you to cite our paper ❤️. Additionally, if you find this repository useful, giving it a star ⭐️ is a wonderful way to support our work. Thank you very much.
```bibtex
@article{jia2025mgvq,
  title={MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization},
  author={Jia, Mingkai and Yin, Wei and Hu, Xiaotao and Guo, Jiaxin and Guo, Xiaoyang and Zhang, Qian and Long, Xiao-Xiao and Tan, Ping},
  journal={arXiv preprint arXiv:2507.07997},
  year={2025}
}
```
This repository is under the MIT License. For further questions about licensing, please contact Mingkai Jia (mjiaab@connect.ust.hk) or Wei Yin (yvanwy@outlook.com).