ICCV 2025
This repository contains the implementation of PVChat, a personalized video chat model that extends InternVideo2 with person-specific fine-tuning capabilities.
- Dataset Expansion
- Dataset Organization
- Dataset Expansion Code Optimization
- Fine-tuning Code Optimization
- HuggingFace Weights Update
- Two-person training process
- Three-person training process
This project requires 7 different conda environments for various components. All environment requirement files are located in the environment/ folder.
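Each requirements file name encodes the environment name and the Python version (`requirements_<env>_python_<version>.txt`). As a rough sketch, the hypothetical helper below (not part of this repository) derives the three setup commands from such a file name, assuming every file in environment/ follows that pattern. It only prints the commands and does not run them.

```python
import re
from pathlib import Path
from typing import List

# Pattern inferred from the file names in environment/, e.g.
# requirements_consisid_python_3.11.0.txt -> env "consisid", Python 3.11.0
PATTERN = re.compile(r"^requirements_(?P<env>.+)_python_(?P<version>[\d.]+)\.txt$")


def setup_commands(requirements_file: str) -> List[str]:
    """Return the conda/pip commands for one requirements file (hypothetical helper)."""
    match = PATTERN.match(Path(requirements_file).name)
    if match is None:
        raise ValueError(f"unexpected file name: {requirements_file}")
    env, version = match["env"], match["version"]
    return [
        f"conda create -n {env} python={version}",
        f"conda activate {env}",
        f"pip install -r {requirements_file}",
    ]


if __name__ == "__main__":
    # Print the commands for every requirements file found in environment/.
    for path in sorted(Path("environment").glob("requirements_*.txt")):
        print("\n".join(setup_commands(path.as_posix())))
```

Printing rather than executing keeps the helper safe to run anywhere; the output can be reviewed and pasted into a shell.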
Create and configure each environment using the following commands:
conda create -n consisid python=3.11.0
conda activate consisid
pip install -r environment/requirements_consisid_python_3.11.0.txt
conda create -n deepfacelab python=3.7.16
conda activate deepfacelab
pip install -r environment/requirements_deepfacelab_python_3.7.16.txt
conda create -n face_quality python=3.8.20
conda activate face_quality
pip install -r environment/requirements_face_quality_python_3.8.20.txt
conda create -n LivePortrait python=3.10.6
conda activate LivePortrait
pip install -r environment/requirements_LivePortrait_python_3.10.6.txt
conda create -n photomaker python=3.10.6
conda activate photomaker
pip install -r environment/requirements_photomaker_python_3.10.6.txt
conda create -n pvchat python=3.10.0
conda activate pvchat
pip install -r environment/requirements_pvchat_python_3.10.0.txt
conda create -n qwen python=3.10.0
conda activate qwen
pip install -r environment/requirements_qwen_python_3.10.0.txt
gdown --fuzzy "https://drive.google.com/file/d/1pr-oegxyhtLEr6Z0euEa3v4aGvm79UUZ/view?usp=sharing"
Clone and configure the following repositories:
git clone https://github.com/PKU-YuanGroup/ConsisID.git ConsisID_temp
git clone https://github.com/KwaiVGI/LivePortrait.git LivePortrait_temp
git clone https://github.com/iperov/DeepFaceLab.git DeepFaceLab_temp
After cloning and configuring each repository (including downloading required weights from HuggingFace), merge the provided code:
# For ConsisID
cp -rf consisid/* ConsisID_temp/
# Move the merged folder to the appropriate location if needed
# For LivePortrait
cp -rf LivePortrait/* LivePortrait_temp/
# Move the merged folder to the appropriate location if needed
# For DeepFaceLab
cp -rf Deepfacelab/* DeepFaceLab_temp/
# Move the merged folder to the appropriate location if needed
Note: The cp -rf command will overwrite files with the same name and copy new files that don't exist in the destination.
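To make the overwrite-and-add behavior concrete, here is a minimal Python sketch of the same merge on throwaway directories. It uses `shutil.copytree` with `dirs_exist_ok=True` (Python 3.8+) as a stand-in for `cp -rf`; the directory and file names are invented for the demonstration and are not files from this repository.

```python
import shutil
import tempfile
from pathlib import Path


def merge_tree(src: Path, dst: Path) -> None:
    """Copy src into dst, overwriting same-named files and leaving other files alone."""
    shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+


# Tiny demonstration on temporary directories (all names here are made up).
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "provided"), Path(tmp, "cloned")
    (src / "sub").mkdir(parents=True)
    dst.mkdir()
    (src / "shared.py").write_text("patched")       # exists on both sides
    (src / "sub" / "new.py").write_text("added")    # exists only in the provided code
    (dst / "shared.py").write_text("upstream")
    (dst / "untouched.py").write_text("upstream")   # exists only in the clone

    merge_tree(src, dst)

    result = {p.relative_to(dst).as_posix(): p.read_text()
              for p in sorted(dst.rglob("*")) if p.is_file()}
    print(result)
```

One difference worth knowing: the shell glob `*` skips hidden files at the top level of the source folder, whereas `copytree` copies them as well.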
Download the CelebV-HQ dataset following the instructions from the official repository:
- Visit: https://github.com/CelebV-HQ/CelebV-HQ
- Follow the download instructions provided in the repository
- Place the downloaded dataset in:
datasets/celebv-hq/
- Download the original InternVideo2 weights and place them in:
PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_REMOH/
- Merge the PVChat modifications:
# Copy and overwrite existing files, add new files
cp -rf PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_PVChat/* \
  PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_REMOH/
To expand your dataset with person-specific videos:
- Place your source video files in:
Deepfacelab/data_src/
- Update the paths in the script:
# Edit the script to update paths to your local directory
vim PVChat/InternVideo2/multi_modality/all_dataset_set_detail.sh
- Run the dataset expansion script:
cd PVChat/InternVideo2/multi_modality/
bash all_dataset_set_detail.sh
For reference, see the img2dataset repository: https://github.com/rom1504/img2dataset
- Update the training script parameters:
Edit PVChat/InternVideo2/multi_modality/finetune_internvideo_REOMH_one_person2_stage.py with the following parameters:
  - --sks_name: Name of the target person (e.g., "john_doe")
  - --model_path: Path to the model checkpoint
  - --train_json: Path to training JSON file generated from dataset expansion
  - --short_train_json: Path to short training JSON file
  - --test_json: Path to test JSON file
- Update the configuration file:
Edit PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_PVChat/config.json:
{
  ...
  "sks_name": "your_person_name",
  ...
  "model_config": {
    ...,
    "sks_name": "your_person_name",
    ...
  }
}
Note: There are two instances of sks_name in the config file. Both must be updated with the same person name.
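Because the two sks_name values must stay in sync, it can be convenient to set them programmatically. The sketch below is a hypothetical helper, not part of the repository; it assumes the second instance sits under "model_config", as in the config excerpt above, and it demonstrates the update on a temporary file rather than the real config.json.

```python
import json
import tempfile
from pathlib import Path


def set_sks_name(config_path: Path, person: str) -> dict:
    """Write `person` into both sks_name fields and return the updated config."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["sks_name"] = person
    # Assumption: the second instance lives under "model_config".
    # A KeyError here means the file has a different layout than assumed.
    config["model_config"]["sks_name"] = person
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


def sks_names_match(config: dict) -> bool:
    """True when both sks_name fields hold the same value."""
    return config["sks_name"] == config["model_config"]["sks_name"]


# Demonstration on a throwaway file with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp, "config.json")
    demo_path.write_text(json.dumps(
        {"sks_name": "old", "model_config": {"sks_name": "older", "other": 1}}
    ))
    updated = set_sks_name(demo_path, "john_doe")
    reloaded = json.loads(demo_path.read_text(encoding="utf-8"))
    print(sks_names_match(reloaded), reloaded["model_config"])
```

Note that rewriting the file with `json.dumps` normalizes its indentation and key formatting; edit the file by hand if the original layout should be preserved.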
Run the fine-tuning script:
cd PVChat/InternVideo2/multi_modality/
python finetune_internvideo_REOMH_one_person2_stage.py \
--sks_name "your_person_name" \
--model_path "path/to/model" \
--train_json "path/to/train.json" \
--short_train_json "path/to/short_train.json" \
  --test_json "path/to/test.json"
PVChat/
├── PVChat/
│   ├── InternVideo2/
│   │   └── multi_modality/
│   │       ├── Internvideo2_chat_8B_HD_finetune_REMOH/
│   │       ├── Internvideo2_chat_8B_HD_PVChat/
│   │       ├── Internvideo2_chat_8B_HD_finetune_PVChat/
│   │       ├── all_dataset_set_detail.sh
│   │       └── finetune_internvideo_REOMH_one_person2_stage.py
│   ├── consisid/
│   ├── LivePortrait/
│   └── Deepfacelab/
│       └── data_src/
├── datasets/
│   └── celebv-hq/
├── environment/
│   ├── requirements_consisid_python_3.11.0.txt
│   ├── requirements_deepfacelab_python_3.7.16.txt
│   ├── requirements_face_quality_python_3.8.20.txt
│   ├── requirements_LivePortrait_python_3.10.6.txt
│   ├── requirements_photomaker_python_3.10.6.txt
│   ├── requirements_pvchat_python_3.10.0.txt
│   └── requirements_qwen_python_3.10.0.txt
└── README.md
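Before running the dataset expansion or fine-tuning steps, it may help to confirm that the key paths from the tree above are in place. The following is a small sketch under the assumption that the layout matches that tree exactly; `missing_paths` is a hypothetical helper, and the path list is a hand-picked subset rather than a complete manifest.

```python
from pathlib import Path
from typing import Iterable, List

# Paths copied from the directory tree, relative to the repository root.
EXPECTED = [
    "PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_REMOH",
    "PVChat/InternVideo2/multi_modality/all_dataset_set_detail.sh",
    "PVChat/InternVideo2/multi_modality/finetune_internvideo_REOMH_one_person2_stage.py",
    "PVChat/Deepfacelab/data_src",
    "datasets/celebv-hq",
    "environment",
]


def missing_paths(root: Path, expected: Iterable[str] = EXPECTED) -> List[str]:
    """Return the expected paths that do not exist under `root`."""
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root; an empty list means everything was found.
    print(missing_paths(Path(".")))
```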
- CUDA-compatible GPU with sufficient VRAM (recommended: 48GB+)
- Conda package manager
- Git
- Sufficient disk space for datasets and model weights
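A quick way to check the non-GPU prerequisites is sketched below using only the Python standard library. The helper name and the `min_free_gb` threshold are illustrative assumptions, since this README does not state a required amount of disk space; GPU memory is not checked here.

```python
import shutil


def check_prerequisites(path: str = ".", min_free_gb: float = 0.0) -> dict:
    """Report whether conda and git are on PATH and how much disk space is free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "conda": shutil.which("conda") is not None,
        "git": shutil.which("git") is not None,
        "free_disk_gb": round(free_gb, 1),
        # min_free_gb is a caller-chosen threshold, not a documented requirement.
        "enough_disk": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```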
If you find our paper and/or code helpful, please consider citing:
@InProceedings{Shi_2025_ICCV,
author = {Shi, Yufei and Yan, Weilong and Xu, Gang and Li, Yumeng and Chen, Yucheng and Li, Zhenxi and Yu, Fei and Li, Ming and Yeo, Si Yong},
title = {PVChat: Personalized Video Chat with One-Shot Learning},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {23321-23331}
}
@article{pvchat,
title={PVChat: Personalized Video Chat with One-Shot Learning},
author={Yufei Shi and Weilong Yan and Gang Xu and Yumeng Li and Yucheng Chen and Zhenxi Li and Fei Richard Yu and Ming Li and Si Yong Yeo},
year={2025},
journal={arXiv preprint arXiv:2503.17069},
}
Please refer to the individual licenses of the incorporated projects.