PVChat: Personalized Video Chat with One-Shot Learning

ICCV 2025

This repository contains the implementation of PVChat, a personalized video chat model that extends InternVideo2 with person-specific fine-tuning capabilities.

To-Do List

1. Dataset Expansion

1.1 Environment Setup

This project requires 7 different conda environments for various components. All environment requirement files are located in the environment/ folder.

Create and configure each environment using the following commands:

1. ConsisID Environment

conda create -n consisid python=3.11.0
conda activate consisid
pip install -r environment/requirements_consisid_python_3.11.0.txt

2. DeepFaceLab Environment

conda create -n deepfacelab python=3.7.16
conda activate deepfacelab
pip install -r environment/requirements_deepfacelab_python_3.7.16.txt

3. Face Quality Environment

conda create -n face_quality python=3.8.20
conda activate face_quality
pip install -r environment/requirements_face_quality_python_3.8.20.txt

4. LivePortrait Environment

conda create -n LivePortrait python=3.10.6
conda activate LivePortrait
pip install -r environment/requirements_LivePortrait_python_3.10.6.txt

5. PhotoMaker Environment

conda create -n photomaker python=3.10.6
conda activate photomaker
pip install -r environment/requirements_photomaker_python_3.10.6.txt

6. PVChat Environment

conda create -n pvchat python=3.10.0
conda activate pvchat
pip install -r environment/requirements_pvchat_python_3.10.0.txt

7. Qwen Environment

conda create -n qwen python=3.10.0
conda activate qwen
pip install -r environment/requirements_qwen_python_3.10.0.txt
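The seven setup blocks above all follow one pattern. Each requirements file is named requirements_<env>_python_<version>.txt, and the environment name and Python version in the matching conda create command come straight from that filename. The sketch below is an illustration, not a script shipped with PVChat, and it derives the commands from the filenames. It assumes every file in environment/ follows that naming pattern. It uses conda run instead of conda activate so that it works in a non-interactive shell. It only prints the commands unless the hypothetical DRY_RUN variable is set to 0.

```shell
#!/usr/bin/env bash
# Sketch: generate the conda setup commands from the filenames in environment/.
# Assumes every file is named requirements_<env>_python_<version>.txt.
set -euo pipefail

# Print "<env> <python_version>" for one requirements filename.
parse_requirements_name() {
  local base env version
  base="$(basename "$1" .txt)"   # requirements_face_quality_python_3.8.20
  base="${base#requirements_}"   # face_quality_python_3.8.20
  env="${base%_python_*}"        # face_quality
  version="${base##*_python_}"   # 3.8.20
  printf '%s %s\n' "$env" "$version"
}

# Print the commands for one environment; execute them only when DRY_RUN=0
# (DRY_RUN is a hypothetical switch introduced for this sketch).
setup_env() {
  local req="$1" env version
  read -r env version < <(parse_requirements_name "$req")
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    conda create -y -n "$env" "python=$version"
    conda run -n "$env" pip install -r "$req"
  else
    echo "conda create -y -n $env python=$version"
    echo "conda run -n $env pip install -r $req"
  fi
}

for req in environment/requirements_*_python_*.txt; do
  [[ -e "$req" ]] || continue   # the glob matched nothing: no environment/ folder here
  setup_env "$req"
done
```

Run it from the repository root to review the printed commands. Once they look right, rerun it with DRY_RUN=0 set in the environment to execute them.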

8. Download our datasets

gdown --fuzzy "https://drive.google.com/file/d/1pr-oegxyhtLEr6Z0euEa3v4aGvm79UUZ/view?usp=sharing"

1.2 Code Configuration

Clone and configure the following repositories:

ConsisID

git clone https://github.com/PKU-YuanGroup/ConsisID.git ConsisID_temp

LivePortrait

git clone https://github.com/KwaiVGI/LivePortrait.git LivePortrait_temp

DeepFaceLab

git clone https://github.com/iperov/DeepFaceLab.git DeepFaceLab_temp

After cloning each repository and completing its own setup (including downloading the required weights from Hugging Face), merge the code provided here into it:

# For ConsisID
cp -rf consisid/* ConsisID_temp/
# Move the merged folder to the appropriate location if needed

# For LivePortrait
cp -rf LivePortrait/* LivePortrait_temp/
# Move the merged folder to the appropriate location if needed

# For DeepFaceLab
cp -rf Deepfacelab/* DeepFaceLab_temp/
# Move the merged folder to the appropriate location if needed

Note: The cp -rf command will overwrite files with the same name and copy new files that don't exist in the destination.
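Because cp -rf replaces files silently, it can help to see what a merge will change before running it. The minimal sketch below uses a hypothetical preview_merge function that is not part of this repository. For each file in the provided folder, it reports whether the file will overwrite an existing one in the cloned repository or be added as a new file.

```shell
#!/usr/bin/env bash
# Sketch: preview what `cp -rf <src>/* <dst>/` would do, without copying anything.
set -euo pipefail

# Prints "overwrite: <path>" or "new: <path>" for every regular file under src,
# with paths relative to src. Note: find also reports hidden files at the top
# level of src, which the * glob in the cp -rf commands does not copy.
preview_merge() {
  local src="${1%/}" dst="${2%/}" f rel
  while IFS= read -r -d '' f; do
    rel="${f#"$src"/}"
    if [[ -e "$dst/$rel" ]]; then
      echo "overwrite: $rel"
    else
      echo "new: $rel"
    fi
  done < <(find "$src" -type f -print0)
}
```

For example, preview_merge consisid ConsisID_temp lists the ConsisID files that the first cp -rf command would replace or add.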

1.3 CelebV-HQ Dataset

Download the CelebV-HQ dataset following the instructions from the official repository:

- Visit: https://github.com/CelebV-HQ/CelebV-HQ
- Follow the download instructions provided in the repository.
- Place the downloaded dataset in:
```
datasets/celebv-hq/
```

1.4 InternVideo2 Weights

Download the original InternVideo2 weights and place them in:

PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_REMOH/

Merge the PVChat modifications:

# Copy and overwrite existing files, add new files
cp -rf PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_PVChat/* \
       PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_REMOH/

1.5 Dataset Expansion Commands

To expand your dataset with person-specific videos:

Place your source video files in:
```
Deepfacelab/data_src/
```

Update the paths in the script:

# Edit the script to update paths to your local directory
vim PVChat/InternVideo2/multi_modality/all_dataset_set_detail.sh

Run the dataset expansion script:

cd PVChat/InternVideo2/multi_modality/
bash all_dataset_set_detail.sh
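Before running the expansion script, it can save time to confirm that the input folder actually contains videos. The pre-flight sketch below is illustrative only. The function name check_data_src and the list of accepted extensions (.mp4, .avi, .mov, .mkv) are assumptions and are not taken from all_dataset_set_detail.sh.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: check that the source-video folder exists and holds videos.
# The accepted extensions below are an assumption; adjust them to your data.
set -euo pipefail

check_data_src() {
  local dir="$1" n
  if [[ ! -d "$dir" ]]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  # Count top-level files with a common video extension (case-insensitive).
  n="$(find "$dir" -maxdepth 1 -type f \( -iname '*.mp4' -o -iname '*.avi' \
        -o -iname '*.mov' -o -iname '*.mkv' \) | wc -l | tr -d '[:space:]')"
  if (( n == 0 )); then
    echo "no video files found in $dir" >&2
    return 1
  fi
  echo "$n video file(s) in $dir"
}
```

Pass it the absolute path of your Deepfacelab/data_src/ folder. A non-zero return status means the expansion script would have nothing to process.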

1.6 Download the LAION-face-5B Dataset

To download it, refer to the img2dataset tool: https://github.com/rom1504/img2dataset

2. Fine-tuning Process

Configuration

Update the training script parameters:

Edit PVChat/InternVideo2/multi_modality/finetune_internvideo_REOMH_one_person2_stage.py with the following parameters:
- --sks_name: Name of the target person (e.g., "john_doe")
- --model_path: Path to the model checkpoint
- --train_json: Path to training JSON file generated from dataset expansion
- --short_train_json: Path to short training JSON file
- --test_json: Path to test JSON file

Update the configuration file:

Edit PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_PVChat/config.json:
```
{
  ...
  "sks_name": "your_person_name",
  ...
  "model_config": {
    "sks_name": "your_person_name",
    ...
  }
}
```
Note: There are two instances of sks_name in the config file. Both must be updated with the same person name.
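Because the two sks_name values can easily get out of sync, you can also script this edit. The sketch below assumes config.json has the layout shown above, with one sks_name at the top level and a second inside model_config, and that python3 is available. set_sks_name is a hypothetical helper. It rewrites the file with two-space indentation, so keep a backup if the original formatting matters to you.

```shell
#!/usr/bin/env bash
# Sketch: set both sks_name entries in config.json to the same person name.
# Assumes the layout shown above: a top-level "sks_name" and "model_config"."sks_name".
set -euo pipefail

set_sks_name() {
  local cfg="$1" name="$2"
  python3 - "$cfg" "$name" <<'PY'
import json
import sys

path, name = sys.argv[1], sys.argv[2]
with open(path) as f:
    cfg = json.load(f)
cfg["sks_name"] = name
# Raises KeyError if "model_config" is missing, rather than silently adding it.
cfg["model_config"]["sks_name"] = name
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
    f.write("\n")
PY
}
```

For example, run set_sks_name Internvideo2_chat_8B_HD_finetune_PVChat/config.json "your_person_name" from PVChat/InternVideo2/multi_modality/.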

Training

Run the fine-tuning script:

cd PVChat/InternVideo2/multi_modality/
python finetune_internvideo_REOMH_one_person2_stage.py \
    --sks_name "your_person_name" \
    --model_path "path/to/model" \
    --train_json "path/to/train.json" \
    --short_train_json "path/to/short_train.json" \
    --test_json "path/to/test.json"
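A typo in one of the three JSON paths may not surface until the model weights have already loaded. The small pre-flight sketch below checks that every file exists and parses, using Python's built-in json.tool module. check_json_inputs is a hypothetical helper, and it assumes each file is a single JSON document rather than JSON Lines.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: verify that each given file exists and is valid JSON.
# Assumes each file is one JSON document (not JSON Lines).
set -euo pipefail

check_json_inputs() {
  local f status=0
  for f in "$@"; do
    if [[ ! -f "$f" ]]; then
      echo "missing: $f" >&2
      status=1
    elif ! python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "not valid JSON: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Call it with the same three paths you pass as --train_json, --short_train_json and --test_json. Launch training only if it returns 0.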

Project Structure

PVChat/
├── PVChat/
│   ├── InternVideo2/
│   │   └── multi_modality/
│   │       ├── Internvideo2_chat_8B_HD_finetune_REMOH/
│   │       ├── Internvideo2_chat_8B_HD_PVChat/
│   │       ├── Internvideo2_chat_8B_HD_finetune_PVChat/
│   │       ├── all_dataset_set_detail.sh
│   │       └── finetune_internvideo_REOMH_one_person2_stage.py
│   ├── consisid/
│   ├── LivePortrait/
│   └── Deepfacelab/
│       └── data_src/
├── datasets/
│   └── celebv-hq/
├── environment/
│   ├── requirements_consisid_python_3.11.0.txt
│   ├── requirements_deepfacelab_python_3.7.16.txt
│   ├── requirements_face_quality_python_3.8.20.txt
│   ├── requirements_LivePortrait_python_3.10.6.txt
│   ├── requirements_photomaker_python_3.10.6.txt
│   ├── requirements_pvchat_python_3.10.0.txt
│   └── requirements_qwen_python_3.10.0.txt
└── README.md

Requirements

- CUDA-compatible GPU with sufficient VRAM (recommended: 48GB+)
- Conda package manager
- Git
- Sufficient disk space for datasets and model weights

Citation

If you find our paper and/or code helpful, please consider citing:

@InProceedings{Shi_2025_ICCV,
    author    = {Shi, Yufei and Yan, Weilong and Xu, Gang and Li, Yumeng and Chen, Yucheng and Li, Zhenxi and Yu, Fei and Li, Ming and Yeo, Si Yong},
    title     = {PVChat: Personalized Video Chat with One-Shot Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {23321-23331}
}

@article{pvchat,
      title={PVChat: Personalized Video Chat with One-Shot Learning}, 
      author={Yufei Shi and Weilong Yan and Gang Xu and Yumeng Li and Yucheng Chen and Zhenxi Li and Fei Richard Yu and Ming Li and Si Yong Yeo}, 
      year={2025},
      journal={arXiv preprint arXiv:2503.17069},
}

License

Please refer to the individual licenses of the incorporated projects.
