
[CVPR 2026] VITAL-Series

Notice (2026.2.21): This work has been accepted to CVPR 2026.

Official implementation of VITAL: Vision-Encoder-centered pretraining for LMMs in visual quality assessment.

VITAL teaser


✨ Overview

VITAL-Series contains two major components:

  • VITAL-LMM: training/evaluation code for VITAL main models.
  • VITAL-linear-probe: visual encoder extension workflows (e.g., linear-probe and lightweight downstream adaptation).

⚙️ Environment Setup

Use the provided environment file:

conda env create -f environment.yml

If needed, adjust CUDA/PyTorch versions according to your machine.
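Before adjusting versions, it can help to see what the active environment actually provides. The following is a minimal sketch, not part of the repository: `describe_environment` is a hypothetical helper name, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util
import platform


def describe_environment():
    """Collect Python, PyTorch, and CUDA details to compare against environment.yml."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "gpu_available": False,
    }
    # Return early if PyTorch is not installed in the active environment.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    info["gpu_available"] = torch.cuda.is_available()
    return info


print(describe_environment())
```

If the reported CUDA version does not match your driver, that is the point at which editing the versions in environment.yml is likely to be needed.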


📥 Model Download & Placement

  1. Download VITAL-Assistant-8B, VITAL-Base-8B, and VITAL-Vision-Encoder-300M.
  2. Place LMM-related models under VITAL-LMM.
  3. For visual-encoder extension experiments (e.g., linear-probe), place VITAL-Vision-Encoder-300M under VITAL-linear-probe.
  4. Additional zero/warm-up series models are available on Hugging Face (see Model Zoo).
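The placement in steps 2 and 3 can be checked with a short script. This is a sketch under one assumption that the README does not state: each model is stored in a folder named exactly after the model, directly under its component directory. `EXPECTED_MODELS` and `find_missing_models` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Assumed layout (not confirmed by the repository): one folder per model,
# named after the model, directly under the component directory.
EXPECTED_MODELS = {
    "VITAL-LMM": ["VITAL-Assistant-8B", "VITAL-Base-8B"],
    "VITAL-linear-probe": ["VITAL-Vision-Encoder-300M"],
}


def find_missing_models(repo_root):
    """Return the expected model directories that do not exist under repo_root."""
    root = Path(repo_root)
    missing = []
    for component, models in EXPECTED_MODELS.items():
        for model in models:
            path = root / component / model
            if not path.is_dir():
                missing.append(path)
    return missing


# Run from the repository root; prints one line per missing directory.
for path in find_missing_models("."):
    print(f"missing: {path}")
```

If your checkpoints live elsewhere, adjust the mapping or the model paths in the shell scripts instead.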

🚀 VITAL Main Models (LMM)

cd VITAL-LMM

🧪 Testing

cd VITAL-LMM/test

  1. Edit JSON configs in shell/eval/eval_data:

    • Update root and annotation to your image/video paths and annotation files.
  2. Run evaluation scripts:

For quality scoring:
bash shell/eval/evaluate_image.sh
bash shell/eval/evaluate_video.sh

For text generation:
bash shell/eval/evaluate_qbench.sh
bash shell/eval/evaluate_qbench_video_single_dev.sh
  3. Evaluation entry scripts are in internvl/eval:
    • Default scoring: scoring.py
    • Faster video scoring: scoring_less_token.py

If you want to use scoring_less_token.py, modify line 31 in shell/eval/evaluate_custom_scoring.sh accordingly.
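The config edit in step 1 can also be scripted. The sketch below assumes a schema the README does not spell out: a top-level JSON object that maps each dataset name to an entry containing `root` and `annotation`. The function name `update_eval_config` and the dataset name in the usage comment are hypothetical; check the actual files in shell/eval/eval_data before relying on it.

```python
import json
from pathlib import Path


def update_eval_config(config_path, dataset, root, annotation):
    """Point one dataset entry at local data and write the config back.

    Assumes the file looks like {"<dataset>": {"root": ..., "annotation": ...}}.
    Raises KeyError if the dataset entry does not exist.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    entry = config[dataset]
    entry["root"] = str(root)
    entry["annotation"] = str(annotation)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return entry


# Example call (file and dataset names are placeholders):
# update_eval_config("shell/eval/eval_data/my_config.json", "my_dataset",
#                    "/data/my_dataset/videos", "/data/my_dataset/labels.json")
```

Other keys in each entry are left untouched, so the script only changes the two fields the README asks you to update.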

🏋️ Training

cd VITAL-LMM/train

Use scripts in training_shell (update data/model paths before running):

bash shell/pretrain.sh
bash shell/warm_up.sh

👁️ VITAL Linear Probe (Visual Encoder Extension)

cd VITAL-linear-probe

This module supports training/testing with non-LLM heads (e.g., linear probes) on top of VITAL-Vision-Encoder.
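For readers unfamiliar with the term, a linear probe trains only a linear head on features from a frozen encoder. The sketch below illustrates the idea on toy data in plain Python; it is not the repository's training code, the feature vectors stand in for encoder outputs, and `train_linear_probe` is a hypothetical name.

```python
def train_linear_probe(features, targets, lr=0.1, epochs=2000):
    """Fit y ~ w.x + b on fixed features with full-batch gradient descent on MSE.

    The features are treated as constants, mirroring a frozen encoder:
    only the linear head (weights, bias) is updated.
    """
    n, dim = len(features), len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, targets):
            err = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            for j in range(dim):
                grad_w[j] += 2.0 * err * x[j] / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy stand-ins for frozen encoder features and quality scores,
# generated from y = 2*x0 - x1 + 0.5.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.5, 2.5, -0.5, 1.5]
weights, bias = train_linear_probe(features, targets)
print(weights, bias)  # approaches [2.0, -1.0] and 0.5
```

In practice the features would come from VITAL-Vision-Encoder and the head would be trained through shell/probe_finetune.sh; the toy version only shows which parameters are learned.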

🏋️ Training

bash shell/probe_finetune.sh

🧪 Testing

bash shell/evaluate_video.sh

Please update file paths in scripts for your local setup.


📦 Model Zoo

VITAL Main Models

Vision Encoder & Extensions


📚 Citation

If you use this project, please cite:

@article{jia2025vital,
  title={VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment},
  author={Jia, Ziheng and Cao, Linhan and Han, Jinliang and Zhang, Zicheng and Qian, Jiaying and Wang, Jiarui and Chen, Zijian and Zhai, Guangtao and Min, Xiongkuo},
  journal={arXiv preprint arXiv:2511.17962},
  year={2025}
}

@inproceedings{jia2025vqa2,
  title={VQA2: Visual Question Answering for Video Quality Assessment},
  author={Jia, Ziheng and Zhang, Zicheng and Qian, Jiaying and Wu, Haoning and Sun, Wei and Li, Chunyi and Liu, Xiaohong and Lin, Weisi and Zhai, Guangtao and Min, Xiongkuo},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  pages={6751--6760},
  year={2025}
}

@inproceedings{zhang2025q,
  title={Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs},
  author={Zhang, Zicheng and Jia, Ziheng and Wu, Haoning and Li, Chunyi and Chen, Zijian and Zhou, Yingjie and Sun, Wei and Liu, Xiaohong and Min, Xiongkuo and Lin, Weisi and others},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={3229--3239},
  year={2025}
}

For custom environments, adjust file paths and parameters as needed. If you encounter issues, feel free to open an issue in this repository.
