[CVPR 2026] VITAL-Series

Notice (2026.2.21): This work has been accepted to CVPR 2026.

Official implementation of VITAL: Vision-Encoder-centered pretraining for LMMs in visual quality assessment.

✨ Overview

VITAL-Series contains two major components:

- VITAL-LMM: training/evaluation code for the VITAL main models.
- VITAL-linear-probe: visual encoder extension workflows (e.g., linear-probe and lightweight downstream adaptation).

⚙️ Environment Setup

Use the provided environment file:

conda env create -f environment.yml

If needed, adjust CUDA/PyTorch versions according to your machine.
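
Before changing versions, it can help to see what the created environment actually contains. The following is a minimal sketch, not part of the repository: `describe_environment` is a hypothetical helper name, and it assumes only that PyTorch, if installed, is importable as `torch`.

```python
import importlib
import platform


def describe_environment():
    """Collect interpreter and (if installed) PyTorch/CUDA details."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info  # PyTorch is not installed in this environment
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as False on a GPU machine, the CUDA build of PyTorch likely does not match the installed driver, which is the case where adjusting the versions in environment.yml is worth trying.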

📥 Model Download & Placement

- Download VITAL-Assistant-8B, VITAL-Base-8B, and VITAL-Vision-Encoder-300M.
- Place the LMM-related models under VITAL-LMM.
- For visual-encoder extension experiments (e.g., linear-probe), place VITAL-Vision-Encoder-300M under VITAL-linear-probe.
- Additional zero/warm-up series models are available on Hugging Face (see Model Zoo).
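
The placement described above could be scripted roughly as follows. This is a sketch under assumptions: the repo IDs come from the Model Zoo links, but the per-model subfolder layout (`VITAL-LMM/<model name>`) and the helper names `target_dir` and `download_all` are illustrative and not prescribed by the repository. It relies on `snapshot_download` from the `huggingface_hub` package.

```python
from pathlib import Path

# Repo IDs are taken from the Model Zoo; the mapping to a subfolder per model
# is an assumption about how the checkpoints should be laid out.
PLACEMENT = {
    "JZHWS/VITAL-Assistant-8B": "VITAL-LMM",
    "JZHWS/VITAL-Base-8B": "VITAL-LMM",
    "JZHWS/VITAL-Vision-Encoder-300M": "VITAL-linear-probe",
}


def target_dir(repo_id, repo_root="."):
    """Return <repo_root>/<component>/<model name> for a Model Zoo repo ID."""
    model_name = repo_id.split("/")[-1]
    return Path(repo_root) / PLACEMENT[repo_id] / model_name


def download_all(repo_root="."):
    """Download every model in PLACEMENT into its target directory."""
    # Imported lazily so the path logic works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in PLACEMENT:
        snapshot_download(
            repo_id=repo_id,
            local_dir=str(target_dir(repo_id, repo_root)),
        )
```

Calling `download_all()` from the repository root would fetch all three checkpoints. If the scripts expect a different folder layout, adjust `target_dir` or the model paths in the shell scripts to match.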

🚀 VITAL Main Models (LMM)

cd VITAL-LMM

🧪 Testing

cd VITAL-LMM/test

Edit the JSON configs in shell/eval/eval_data:
- Update root and annotation to point to your image/video directories and annotation files.
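
When several configs need the same change, the edit can be scripted. The sketch below assumes a particular schema that the README does not confirm: a top-level JSON object mapping a dataset name to an entry with `root` and `annotation` keys. Check the actual files in shell/eval/eval_data and adapt the key access if they are structured differently. `update_eval_config` is a hypothetical helper, not a function from the repository.

```python
import json
from pathlib import Path


def update_eval_config(config_path, dataset, root, annotation):
    """Rewrite the root and annotation fields of one dataset entry in place."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if dataset not in config:
        raise KeyError(f"{dataset!r} not found in {path}")
    # Assumed schema: {"<dataset>": {"root": ..., "annotation": ..., ...}}
    config[dataset]["root"] = str(root)
    config[dataset]["annotation"] = str(annotation)
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config[dataset]
```

Other keys in the entry are left untouched, so the function only changes the two fields named above.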
Run evaluation scripts:

For quality scoring:
bash shell/eval/evaluate_image.sh
bash shell/eval/evaluate_video.sh

For text generation:
bash shell/eval/evaluate_qbench.sh
bash shell/eval/evaluate_qbench_video_single_dev.sh

Evaluation entry scripts are in internvl/eval:
- Default scoring: scoring.py
- Faster video scoring: scoring_less_token.py

If you want to use scoring_less_token.py, modify line 31 in shell/eval/evaluate_custom_scoring.sh accordingly.

🏋️ Training

cd VITAL-LMM/train

Use scripts in training_shell (update data/model paths before running):

bash shell/pretrain.sh
bash shell/warm_up.sh

👁️ VITAL Linear Probe (Visual Encoder Extension)

cd VITAL-linear-probe

This module supports training/testing with non-LLM heads (e.g., linear probes) on top of VITAL-Vision-Encoder.

🏋️ Training

bash shell/probe_finetune.sh

🧪 Testing

bash shell/evaluate_video.sh

Please update file paths in scripts for your local setup.

📦 Model Zoo

VITAL Main Models

VITAL-Base-8B: https://huggingface.co/JZHWS/VITAL-Base-8B
VITAL-Assistant-8B: https://huggingface.co/JZHWS/VITAL-Assistant-8B
VITAL-Warm-up-1B: https://huggingface.co/JZHWS/VITAL-Warm-up-1B
VITAL-Warm-up-2B: https://huggingface.co/JZHWS/VITAL-Warm-up-2B
VITAL-Warm-up-14B: https://huggingface.co/JZHWS/VITAL-Warm-up-14B

Vision Encoder & Extensions

VITAL-Vision-Encoder-300M: https://huggingface.co/JZHWS/VITAL-Vision-Encoder-300M
VITAL-Linear-Probe: https://huggingface.co/JZHWS/VITAL-Linear-Probe

📚 Citation

If you use this project, please cite:

@article{jia2025vital,
  title={VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment},
  author={Jia, Ziheng and Cao, Linhan and Han, Jinliang and Zhang, Zicheng and Qian, Jiaying and Wang, Jiarui and Chen, Zijian and Zhai, Guangtao and Min, Xiongkuo},
  journal={arXiv preprint arXiv:2511.17962},
  year={2025}
}

@inproceedings{jia2025vqa2,
  title={VQA2: Visual Question Answering for Video Quality Assessment},
  author={Jia, Ziheng and Zhang, Zicheng and Qian, Jiaying and Wu, Haoning and Sun, Wei and Li, Chunyi and Liu, Xiaohong and Lin, Weisi and Zhai, Guangtao and Min, Xiongkuo},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  pages={6751--6760},
  year={2025}
}

@inproceedings{zhang2025q,
  title={Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs},
  author={Zhang, Zicheng and Jia, Ziheng and Wu, Haoning and Li, Chunyi and Chen, Zijian and Zhou, Yingjie and Sun, Wei and Liu, Xiaohong and Min, Xiongkuo and Lin, Weisi and others},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={3229--3239},
  year={2025}
}

For custom environments, adjust file paths and parameters as needed. If you encounter issues, feel free to open an issue in this repository.
