
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

Paper: arXiv:2507.07997

Mingkai Jia1,2, Wei Yin2*§, Xiaotao Hu1,2, Jiaxin Guo3, Xiaoyang Guo2
Qian Zhang2, Xiao-Xiao Long4, Ping Tan1

HKUST1, Horizon Robotics2, CUHK3, NJU4
* Corresponding Author, § Project Leader

🚀 News

  • [September 2025] Achieved SOTA on the TokBench image reconstruction leaderboard: beats VAEs (VA-VAE, SD-3.5, SD-XL, and FLUX.1-dev) at multiple resolutions (256p, 512p, and 1024p) on the Text-Accuracy, Text-NED, and Face-Similarity metrics.
  • [August 2025] Achieved SOTA on the Papers with Code leaderboards: Image Reconstruction on ImageNet and UHDBench.
  • [August 2025] Released inference code.
  • [August 2025] Released model zoo.
  • [August 2025] Released UHDBench, our proposed dataset for ultra-high-definition image reconstruction evaluation.
  • [July 2025] Released paper.

🔨 To-Do List

  • Training code.
  • More demos.
  • Models & Evaluation code.
  • Huggingface models.
  • Release zero-shot reconstruction benchmarks.

🙈 Model Zoo

| Model | Downsample | Groups | Codebook Size | Training Data | Link |
| --- | --- | --- | --- | --- | --- |
| mgvq-f8c32-g4 | 8 | 4 | 32768 | ImageNet | link |
| mgvq-f8c32-g8 | 8 | 8 | 16384 | ImageNet | link |
| mgvq-f16c32-g4 | 16 | 4 | 32768 | ImageNet | link |
| mgvq-f16c32-g8 | 16 | 8 | 16384 | ImageNet | link |
| mgvq-f16c32-g4-mix | 16 | 4 | 32768 | mix | link |
| mgvq-f32c32-g8-mix | 32 | 8 | 16384 | mix | link |
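
The model names encode the configuration: fN is the downsample factor, gG the number of quantization groups, and c32 (presumably) the latent channel width. For intuition, the sketch below illustrates the multi-group idea under those assumptions: each spatial position's latent is split into G sub-vectors, and each sub-vector is quantized against its own sub-codebook, so the joint vocabulary per position grows as K^G while every nearest-neighbor lookup stays over only K codes. This is a minimal illustration, not the repository's actual implementation (training components such as the straight-through estimator and commitment loss are omitted).

import torch

def multi_group_quantize(z, codebooks):
    # z: (B, C, H, W) latent; codebooks: list of G tensors, each (K, C // G).
    G = len(codebooks)
    B, C, H, W = z.shape
    groups = z.chunk(G, dim=1)  # G tensors of shape (B, C/G, H, W)
    quantized, indices = [], []
    for g, book in zip(groups, codebooks):
        flat = g.permute(0, 2, 3, 1).reshape(-1, book.shape[1])  # (B*H*W, C/G)
        idx = torch.cdist(flat, book).argmin(dim=1)              # nearest code per position
        q = book[idx].reshape(B, H, W, -1).permute(0, 3, 1, 2)   # back to (B, C/G, H, W)
        quantized.append(q)
        indices.append(idx.reshape(B, H, W))
    # Concatenate per-group quantized features; stack the G index maps.
    return torch.cat(quantized, dim=1), torch.stack(indices, dim=1)

# Toy example: 32-channel latent, 4 groups, 8 codes per group (8^4 joint codes).
z = torch.randn(1, 32, 16, 16)
books = [torch.randn(8, 8) for _ in range(4)]
z_q, ids = multi_group_quantize(z, books)
print(z_q.shape, ids.shape)  # (1, 32, 16, 16) and (1, 4, 16, 16)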

🔑 Quick Start

Installation

git clone https://github.com/MKJia/MGVQ.git
cd MGVQ
pip3 install -r requirements.txt

Download models

Download the pretrained models from our model zoo to /path/to/your/ckpt.

Data Preparation

Try our UHDBench dataset on Hugging Face and download it to /path/to/your/dataset.
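
If you prefer a scripted download, something like the following huggingface_hub call should work; the repo id here is a placeholder (assumption), so check the actual dataset link above.

from huggingface_hub import snapshot_download

# Hypothetical dataset repo id -- replace with the one linked above.
snapshot_download(
    repo_id="MKJia/UHDBench",
    repo_type="dataset",
    local_dir="/path/to/your/dataset",  # should match dataset_root in the eval script
)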

Evaluation on Reconstruction

Remember to change the paths of ckpt and dataset_root, and make sure you are evaluating the expected model on the expected dataset.

cd evaluation
bash eval_recon.sh

Generation Demo & Evaluation

You can download the pretrained GPT model for generation from Hugging Face and test it with our mgvq-f16c32-g4 tokenizer for demo image sampling. Remember to change the paths of gpt_ckpt and vq_ckpt.

cd evaluation
bash demo_gen.sh

We also provide on Hugging Face a .npz file, sampled by sample_c2i_ddp.py, for evaluation.

cd evaluation
python3 evaluator.py /path/to/your/VIRTUAL_imagenet256_labeled.npz /path/to/your/GPT_XXL_300ep_topk_12.npz
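
Before running the evaluator, it can help to sanity-check the sample file. The sketch below assumes the common ADM-style batch format (a uint8 image array, typically stored under arr_0); the actual array names in our files may differ.

import numpy as np

batch = np.load("/path/to/your/GPT_XXL_300ep_topk_12.npz")
print(batch.files)                 # e.g. ['arr_0']
images = batch[batch.files[0]]
print(images.shape, images.dtype)  # expect (N, 256, 256, 3) uint8 for 256p ImageNet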

🗄️ Demos

  • 🔥 Qualitative reconstruction results with 16× downsampling on the 2560×1440 UHDBench dataset.

  • 🔥 Qualitative class-to-image generation on ImageNet. The classes are dog (Golden Retriever and Husky), cliff, and bald eagle.

  • 🔥 Reconstruction evaluation on 256×256 ImageNet benchmark.

  • 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 512×512 datasets.

  • 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 2560×1440 datasets.
  • 🔥 Reconstruction evaluation on TokBench.
Downsample factor 16:

| Method | Type | Factor | T-ACC (small)↑ | T-ACC (mean)↑ | T-NED (small)↑ | T-NED (mean)↑ | F-Sim (small)↑ | F-Sim (mean)↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FlexTok | Discrete | 1D | 0.55 | 6.95 | 7.80 | 21.09 | 0.06 | 0.15 |
| VQGAN | Discrete | 16 | 0.05 | 1.10 | 4.34 | 8.22 | 0.05 | 0.10 |
| LlamaGen | Discrete | 16 | 0.16 | 4.28 | 5.41 | 14.77 | 0.07 | 0.15 |
| OpenMagvit2 | Discrete | 16 | 0.80 | 10.58 | 9.59 | 27.59 | 0.08 | 0.20 |
| VAR | Discrete | 16 | 1.24 | 15.74 | 10.89 | 34.19 | 0.10 | 0.23 |
| VA-VAE | Continuous | 16 | 6.92 | 37.04 | 25.14 | 56.32 | 0.22 | 0.49 |
| MGVQ | Discrete | 16 | 11.08 | 43.15 | 32.80 | 62.29 | 0.22 | 0.47 |

Downsample factor 8:

| Method | Type | Factor | T-ACC (small)↑ | T-ACC (mean)↑ | T-NED (small)↑ | T-NED (mean)↑ | F-Sim (small)↑ | F-Sim (mean)↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LlamaGen | Discrete | 8 | 4.39 | 29.41 | 19.69 | 49.00 | 0.17 | 0.40 |
| OpenMagvit2 | Discrete | 8 | 9.33 | 40.24 | 30.82 | 59.97 | 0.23 | 0.48 |
| SD-3.5 | Continuous | 8 | 36.26 | 67.04 | 59.04 | 80.58 | 0.43 | 0.70 |
| FLUX.1-dev | Continuous | 8 | 50.69 | 75.91 | 70.70 | 86.42 | 0.52 | 0.76 |
| MGVQ | Discrete | 8 | 63.83 | 82.65 | 80.18 | 90.96 | 0.58 | 0.80 |


📌 Citation

If the paper and code from MGVQ help your research, we kindly ask you to cite our paper ❤️. If you appreciate our work and find this repository useful, giving it a star ⭐️ is a wonderful way to support us. Thank you very much.

@article{jia2025mgvq,
  title={MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization},
  author={Jia, Mingkai and Yin, Wei and Hu, Xiaotao and Guo, Jiaxin and Guo, Xiaoyang and Zhang, Qian and Long, Xiao-Xiao and Tan, Ping},
  journal={arXiv preprint arXiv:2507.07997},
  year={2025}
}

License

This repository is released under the MIT License. For license questions, please contact Mingkai Jia (mjiaab@connect.ust.hk) or Wei Yin (yvanwy@outlook.com).
