RDVQ: Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
Shiyin Jiang · Wei Long · Minghao Han · Zhenghao Chen · Ce Zhu · Shuhang Gu
CVL Lab @ University of Electronic Science and Technology of China
- Jun 12, 2026: Training code and configurations are released.
RDVQ is a VQ-based generative image compression framework for efficient and controllable ultra-low-bitrate image compression.
Conventional VQ-VAEs learn powerful discrete representations, but their non-differentiable nearest-neighbor lookup decouples representation learning from probability modeling. The entropy model can only predict the resulting code indices, and its rate loss provides no useful gradient to the encoder. This prevents true joint rate-distortion optimization.
RDVQ addresses this issue with a simple relaxed lookup mechanism, which builds a differentiable path between encoder features, discrete code indices, and the autoregressive entropy model. As a result, the rate loss can directly guide the encoder to learn more compressible representations, transforming VQ-VAE from a representation learning framework into a practical learned image codec.
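To illustrate why a relaxed lookup lets the rate term reach the encoder, the sketch below compares a hard nearest-neighbor lookup with a generic softmax relaxation over negative squared distances. This relaxation is an assumption chosen for illustration and is not necessarily RDVQ's exact relaxed lookup; the codebook, the entropy-model probabilities, and all helper names are hypothetical, and central finite differences stand in for autograd so the example runs with the Python standard library only.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hard_lookup(z, codebook):
    # Nearest-neighbor index: piecewise constant in z, so it yields no useful gradient.
    return min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))

def soft_assign(z, codebook, tau=1.0):
    # Generic relaxation (assumption): softmax over negative squared distances.
    return softmax([-sq_dist(z, c) / tau for c in codebook])

def hard_rate(z, codebook, probs):
    # Bits for the selected index under the entropy model's probabilities.
    return -math.log2(probs[hard_lookup(z, codebook)])

def soft_rate(z, codebook, probs, tau=1.0):
    # Expected bits under the soft assignment: sum_k w_k * (-log2 p_k).
    w = soft_assign(z, codebook, tau)
    return sum(wk * -math.log2(pk) for wk, pk in zip(w, probs))

def finite_diff(f, z, i, eps=1e-4):
    # Central finite difference of f with respect to coordinate i of z.
    zp, zm = list(z), list(z)
    zp[i] += eps
    zm[i] -= eps
    return (f(zp) - f(zm)) / (2 * eps)

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical 3-entry codebook
probs = [0.7, 0.2, 0.1]                          # hypothetical entropy-model probabilities
z = [0.6, 0.1]                                   # hypothetical encoder feature

g_hard = finite_diff(lambda v: hard_rate(v, codebook, probs), z, 0)
g_soft = finite_diff(lambda v: soft_rate(v, codebook, probs), z, 0)
print("index:", hard_lookup(z, codebook), "hard grad:", g_hard, "soft grad:", g_soft)
```

Under the hard lookup the bit cost is piecewise constant in z, so its finite-difference gradient is exactly zero; the relaxed expected rate varies smoothly with z, which is the property that allows a rate loss to shape the encoder features.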
- Differentiable VQ-based R-D optimization: enables joint distortion and rate minimization through relaxed lookup.
- Multi-scale shared-codebook latents: provide compact and expressive discrete representations across scales.
- Masked Transformer entropy model: estimates accurate probabilities for effective entropy coding.
- Test-time rate control: supports bitrate adjustment via prefix transmission and autoregressive completion.
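Test-time rate control by prefix transmission can be pictured as choosing how many latent slices to send. The sketch below uses a hypothetical helper that is not part of the RDVQ code: given assumed per-slice payload sizes in bits, it picks the longest prefix that fits a target bitrate; in RDVQ, the untransmitted slices would then be completed autoregressively at the decoder.

```python
from itertools import accumulate

def prefix_bpp(slice_bits, n, height, width):
    """Bits per pixel when only the first n latent slices are transmitted."""
    return sum(slice_bits[:n]) / (height * width)

def choose_transfer_slices(slice_bits, height, width, target_bpp):
    """Hypothetical helper: longest prefix whose bitrate stays within target_bpp."""
    best = 0
    for n, total_bits in enumerate(accumulate(slice_bits), start=1):
        if total_bits / (height * width) <= target_bpp:
            best = n
        else:
            break  # cumulative bits only grow, so longer prefixes cannot fit
    return best

# Hypothetical per-slice payload sizes (bits) for a 768x512 image.
slice_bits = [1200, 1800, 2600, 3400, 4200]
n = choose_transfer_slices(slice_bits, 512, 768, target_bpp=0.02)
print(n, prefix_bpp(slice_bits, n, 512, 768))
```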
Despite its lightweight design, RDVQ achieves strong perceptual compression performance at ultra-low bitrates while requiring only a small fraction of the parameters used by large generative compression models.
conda create -n RDVQ python=3.10 -y
conda activate RDVQ
git clone https://github.com/CVL-UESTC/RDVQ.git
cd RDVQ
pip install "torch>=2.1.0" torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
Note: The default real-bitstream path uses the causal top-k tensor-rANS codec and JIT-builds a small C++17 extension on first use. Make sure a C++17 compiler is available before running test_Real.sh.
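A quick way to confirm that a C++17 compiler is available is to compile a tiny probe file. This is a convenience sketch that assumes a GCC- or Clang-style compiler (it relies on the -std=c++17 and -fsyntax-only flags); the candidate names and the helper are assumptions and do not replicate how RDVQ's JIT build actually locates its compiler.

```python
import os
import shutil
import subprocess
import tempfile

# A tiny translation unit that relies on C++17's std::optional.
CXX17_PROBE = "#include <optional>\nint main() { std::optional<int> x = 1; return *x - 1; }\n"

def has_cxx17_compiler(candidates=("c++", "g++", "clang++")):
    """Return True if any candidate compiler on PATH accepts the probe with -std=c++17."""
    for name in candidates:
        exe = shutil.which(name)
        if exe is None:
            continue  # not installed or not on PATH
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "probe.cpp")
            with open(src, "w") as f:
                f.write(CXX17_PROBE)
            # -fsyntax-only checks the code without writing an object file (GCC/Clang flag).
            result = subprocess.run([exe, "-std=c++17", "-fsyntax-only", src], capture_output=True)
            if result.returncode == 0:
                return True
    return False

print("C++17 compiler found:", has_cxx17_compiler())
```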
RDVQ expects a flat image directory, with all images placed directly under one folder:
/path/to/images/
image_0001.png
image_0002.jpg
...
Nested subdirectories are not scanned by the testing scripts.
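The sketch below mirrors this flat, non-recursive layout so you can check which files a folder would expose. The accepted extension set and the helper name are assumptions, not taken from the RDVQ scripts.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}  # assumed set of accepted extensions

def list_flat_images(folder):
    """List image files directly under folder; subdirectories are not descended into."""
    root = Path(folder)
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_EXTS)

# Demo on a throwaway folder containing one nested image that should be ignored.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["image_0001.png", "image_0002.jpg", "notes.txt", "nested/image_0003.png"]:
        p = Path(tmp) / name
        p.parent.mkdir(parents=True, exist_ok=True)
        p.touch()
    found = [p.name for p in list_flat_images(tmp)]
print(found)
```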
We provide the pretrained checkpoints and testing weights on Hugging Face.
RDVQ provides two testing scripts:

| Script | Bitrate Type | Description |
|---|---|---|
| test.sh | Estimated bitrate | Reports the entropy-estimated bitrate (e.g., cd_bpp). |
| test_Real.sh | Real bitstream bitrate | Reports the actual payload bitrate (e.g., cd_bpp_real). |
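The two bitrate types follow the standard definitions sketched below: the estimated bitrate is the ideal code length, the sum of -log2 p over all coded indices divided by the pixel count, while the real bitrate divides the encoded payload size in bits by the pixel count. Whether cd_bpp and cd_bpp_real include extra terms (e.g., headers) is not specified here, so treat this as an assumption; the probabilities and payload size are hypothetical.

```python
import math

def estimated_bpp(index_probs, height, width):
    """Entropy-estimated bitrate: ideal code length sum(-log2 p) per pixel."""
    return sum(-math.log2(p) for p in index_probs) / (height * width)

def real_bpp(payload_bytes, height, width):
    """Actual bitrate: size of the encoded payload in bits per pixel."""
    return 8 * payload_bytes / (height * width)

# Hypothetical example: a 768x512 image whose 1024 code indices each get probability 1/16.
est = estimated_bpp([1 / 16] * 1024, 512, 768)
real = real_bpp(520, 512, 768)  # hypothetical payload size in bytes
print(f"estimated: {est:.5f} bpp, real: {real:.5f} bpp")
```

The two values usually differ slightly because a practical entropy coder adds some overhead on top of the ideal code length.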
By default, the evaluator reports:
bpp, lpips, dists, musiq, clipiqa, niqe, psnr, msssim
You can override the metric list with TEST_METRICS.
Estimated bitrate on Kodak:
TEST_CKPT_PATH=/path/to/checkpoint \
TEST_IMAGE_DIR=/path/to/kodak \
TEST_DATASET=kodak \
bash test.sh
Real-bitstream bitrate on Kodak:
TEST_CKPT_PATH=/path/to/checkpoint \
TEST_IMAGE_DIR=/path/to/kodak \
TEST_DATASET=kodak \
bash test_Real.sh
For DIV2K and CLIC, FID_REF_ROOT must also be provided. For DIV2K:
TEST_CKPT_PATH=/path/to/checkpoint \
TEST_IMAGE_DIR=/path/to/DIV2K_valid_HR \
TEST_DATASET=div2k \
FID_REF_ROOT=/path/to/fid_refs \
bash test.sh
For CLIC:
TEST_CKPT_PATH=/path/to/checkpoint \
TEST_IMAGE_DIR=/path/to/CLIC_valid \
TEST_DATASET=clic \
FID_REF_ROOT=/path/to/fid_refs \
bash test.sh
The evaluator reads the FID/KID reference tiles from:
<FID_REF_ROOT>/<TEST_DATASET>_256teles
If this directory is missing or empty, it is generated automatically from the original images and reused in later runs.
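The lookup described above can be sketched as follows. The precedence (an explicit FID_REF_DIR overriding FID_REF_ROOT) and the helper names are assumptions for illustration; only the <FID_REF_ROOT>/<TEST_DATASET>_256teles layout and the "missing or empty" regeneration rule come from this README.

```python
from pathlib import Path

def resolve_fid_ref_dir(env):
    """Hypothetical resolution order: explicit FID_REF_DIR first, then FID_REF_ROOT."""
    if env.get("FID_REF_DIR"):
        return Path(env["FID_REF_DIR"])
    if env.get("FID_REF_ROOT"):
        return Path(env["FID_REF_ROOT"]) / f"{env['TEST_DATASET']}_256teles"
    return None  # no reference directory configured

def needs_generation(ref_dir):
    """A missing or empty reference directory triggers regeneration."""
    return not ref_dir.is_dir() or not any(ref_dir.iterdir())

env = {"TEST_DATASET": "div2k", "FID_REF_ROOT": "/path/to/fid_refs"}
ref_dir = resolve_fid_ref_dir(env)
print(ref_dir, needs_generation(ref_dir))
```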
test_Real.sh supports test-time rate control through prefix transmission and autoregressive completion: TEST_TRANSFER_SLICES sets how many latent slices are transmitted, and the remaining slices are completed autoregressively at the decoder.
TEST_CKPT_PATH=/path/to/checkpoint \
TEST_IMAGE_DIR=/path/to/kodak \
TEST_DATASET=kodak \
TEST_TRANSFER_SLICES=4 \
bash test_Real.sh
For quick debugging runs, the following settings can be added:
TEST_MAX_IMAGES=1
TEST_METRICS=bpp,psnr,msssim
DISABLE_FID=1
SAVE_IMAGES=0

| Variable | Required | Description |
|---|---|---|
| TEST_CKPT_PATH | Yes | Path to the checkpoint. |
| TEST_IMAGE_DIR | Yes | Image folder for evaluation. |
| TEST_DATASET | Yes | Dataset label: kodak, div2k, or clic. |
| FID_REF_ROOT | For DIV2K/CLIC | Root directory for FID/KID reference tiles. |
| FID_REF_DIR | Optional | Manually specified reference tile directory. |
| TEST_METRICS | Optional | Comma-separated evaluation metric list. |
| TEST_TRANSFER_SLICES | Optional | Number of transmitted latent slices for real-bitstream rate control. |
| TEST_TOPK | Optional | Top-k/escape entropy width for test_Real.sh. Default: 1024. |
| TEST_MAX_IMAGES | Optional | Maximum number of images, for debugging. |
| DISABLE_FID | Optional | Set to 1 to disable FID/KID computation. |
| SAVE_IMAGES | Optional | Whether to save reconstructed images (1 = save, 0 = skip). |
Results are written to:
<checkpoint_stem>/forward/<dataset_name>/ (test.sh)
<checkpoint_stem>/Real/transfer_slices_<N>/<dataset_name>/ (test_Real.sh)
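To trace a rate-distortion curve with test-time rate control, one can run test_Real.sh once per TEST_TRANSFER_SLICES value. The sketch below only builds the commands, environments, and expected result folders; the checkpoint filename and slice counts are hypothetical (the valid range depends on the model), the output folders are shown relative to an unspecified root, and it assumes <dataset_name> equals TEST_DATASET.

```python
import os
from pathlib import Path

def sweep_jobs(ckpt_path, image_dir, dataset, slice_counts):
    """Hypothetical helper: one test_Real.sh invocation per TEST_TRANSFER_SLICES value."""
    jobs = []
    for n in slice_counts:
        env = dict(os.environ,
                   TEST_CKPT_PATH=ckpt_path,
                   TEST_IMAGE_DIR=image_dir,
                   TEST_DATASET=dataset,
                   TEST_TRANSFER_SLICES=str(n))
        # Expected location, following <checkpoint_stem>/Real/transfer_slices_<N>/<dataset_name>/.
        out_dir = Path(Path(ckpt_path).stem) / "Real" / f"transfer_slices_{n}" / dataset
        jobs.append((["bash", "test_Real.sh"], env, out_dir))
    return jobs

jobs = sweep_jobs("/ckpts/rdvq.pth", "/path/to/kodak", "kodak", [2, 4, 8])
for cmd, env, out_dir in jobs:
    print(env["TEST_TRANSFER_SLICES"], out_dir)
    # To launch from the repository root: subprocess.run(cmd, env=env, check=True)
```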
Training follows a multi-stage pipeline; see TRAINING_STAGES.md for the full recipe. For example, the first stage (tokenizer training) is launched with:
bash scripts/tokenizer/train_vq.sh \
--data-path /path/to/train/images \
--image-size 256 \
--vq-model VQ-16-32-64_quant_once \
--dataset openimage \
--global-batch-size 32 \
--results-dir ./results/s1_tokenizer \
--codebook-size 4096 \
--codebook-embed-dim 32 \
--entropy-loss-ratio 0.0 \
--lr 1e-4 \
--disc-lr 1e-4 \
--wo-attn
If you find RDVQ helpful for your research, please cite:
@inproceedings{jiang2026rdvq,
title = {Differentiable Vector Quantization for Rate-Distortion Optimization
of Generative Image Compression},
author = {Jiang, Shiyin and Long, Wei and Han, Minghao and Chen, Zhenghao
and Zhu, Ce and Gu, Shuhang},
booktitle = {CVPR},
year = {2026},
}
For questions or feedback, please contact:
Shiyin Jiang 📧 shiyin.jsy@gmail.com