PVChat: Personalized Video Chat with One-Shot Learning

ICCV 2025




Figure: PVChat architecture.


This repository contains the implementation of PVChat, a personalized video chat model that extends InternVideo2 with person-specific fine-tuning capabilities.

To-Do List

  • Dataset Expansion
  • Dataset Organization
  • Dataset Expansion Code Optimization
  • Fine-tuning Code Optimization
  • HuggingFace Weights Update
  • Two-Person Training Process
  • Three-Person Training Process

1. Dataset Expansion

1.1 Environment Setup

This project requires seven separate conda environments, one per component. The requirements file for each environment is in the environment/ folder.

Create and configure each environment using the following commands:

1. ConsisID Environment

conda create -n consisid python=3.11.0
conda activate consisid
pip install -r environment/requirements_consisid_python_3.11.0.txt

2. DeepFaceLab Environment

conda create -n deepfacelab python=3.7.16
conda activate deepfacelab
pip install -r environment/requirements_deepfacelab_python_3.7.16.txt

3. Face Quality Environment

conda create -n face_quality python=3.8.20
conda activate face_quality
pip install -r environment/requirements_face_quality_python_3.8.20.txt

4. LivePortrait Environment

conda create -n LivePortrait python=3.10.6
conda activate LivePortrait
pip install -r environment/requirements_LivePortrait_python_3.10.6.txt

5. PhotoMaker Environment

conda create -n photomaker python=3.10.6
conda activate photomaker
pip install -r environment/requirements_photomaker_python_3.10.6.txt

6. PVChat Environment

conda create -n pvchat python=3.10.0
conda activate pvchat
pip install -r environment/requirements_pvchat_python_3.10.0.txt

7. Qwen Environment

conda create -n qwen python=3.10.0
conda activate qwen
pip install -r environment/requirements_qwen_python_3.10.0.txt

8. Download our datasets

gdown --fuzzy "https://drive.google.com/file/d/1pr-oegxyhtLEr6Z0euEa3v4aGvm79UUZ/view?usp=sharing"
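
The seven environments above all follow one naming pattern, so their setup commands can be generated instead of typed by hand. The following is a minimal sketch, not a script shipped with this repository: the ENVIRONMENTS table is copied from steps 1-7, and the helper name setup_commands is hypothetical. It only prints the commands; nothing is executed.

```python
# Environment name -> Python version, copied from steps 1-7 above.
ENVIRONMENTS = {
    "consisid": "3.11.0",
    "deepfacelab": "3.7.16",
    "face_quality": "3.8.20",
    "LivePortrait": "3.10.6",
    "photomaker": "3.10.6",
    "pvchat": "3.10.0",
    "qwen": "3.10.0",
}


def setup_commands(name, version):
    """Return the three setup commands for one environment (hypothetical helper)."""
    requirements = f"environment/requirements_{name}_python_{version}.txt"
    return [
        f"conda create -n {name} python={version}",
        f"conda activate {name}",
        f"pip install -r {requirements}",
    ]


if __name__ == "__main__":
    # Print the commands for review; they are not run from here.
    for name, version in ENVIRONMENTS.items():
        for command in setup_commands(name, version):
            print(command)
        print()
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be pasted into a shell one environment at a time.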

1.2 Code Configuration

Clone and configure the following repositories:

ConsisID

git clone https://github.com/PKU-YuanGroup/ConsisID.git ConsisID_temp

LivePortrait

git clone https://github.com/KwaiVGI/LivePortrait.git LivePortrait_temp

DeepFaceLab

git clone https://github.com/iperov/DeepFaceLab.git DeepFaceLab_temp

After cloning and configuring each repository (including downloading required weights from HuggingFace), merge the provided code:

# For ConsisID
cp -rf consisid/* ConsisID_temp/
# Move the merged folder to the appropriate location if needed

# For LivePortrait
cp -rf LivePortrait/* LivePortrait_temp/
# Move the merged folder to the appropriate location if needed

# For DeepFaceLab
cp -rf Deepfacelab/* DeepFaceLab_temp/
# Move the merged folder to the appropriate location if needed

Note: The cp -rf command will overwrite files with the same name and copy new files that don't exist in the destination.
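
To make the overwrite-and-add behaviour concrete, here is a minimal Python sketch of the same kind of merge using shutil.copytree with dirs_exist_ok=True (Python 3.8+). It is an analogue of cp -rf for illustration, not code from this repository; the function names merge_into and demo, and the demo file names, are invented.

```python
import shutil
import tempfile
from pathlib import Path


def merge_into(src, dst):
    """Copy the contents of src into dst: same-named files are overwritten,
    files that exist only in dst are left alone."""
    shutil.copytree(src, dst, dirs_exist_ok=True)


def demo():
    """Build two throwaway folders, merge them, and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "consisid"
        dst = Path(tmp) / "ConsisID_temp"
        src.mkdir()
        dst.mkdir()
        (src / "shared.py").write_text("patched")      # exists in both
        (src / "new.py").write_text("added")           # only in the source
        (dst / "shared.py").write_text("upstream")
        (dst / "untouched.py").write_text("upstream")  # only in the destination
        merge_into(src, dst)
        return {p.name: p.read_text() for p in sorted(dst.iterdir())}


if __name__ == "__main__":
    print(demo())
```

One difference worth knowing: the shell glob consisid/* skips top-level dotfiles, whereas copytree copies them as well.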

1.3 CelebV-HQ Dataset

Download the CelebV-HQ dataset following the instructions from the official repository:

  1. Visit: https://github.com/CelebV-HQ/CelebV-HQ
  2. Follow the download instructions provided in the repository
  3. Place the downloaded dataset in:
    datasets/celebv-hq/
    

1.4 InternVideo2 Weights

  1. Download the original InternVideo2 weights and place them in:

    PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_REMOH/
    
  2. Merge the PVChat modifications:

    # Copy and overwrite existing files, add new files
    cp -rf PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_PVChat/* \
           PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_REMOH/

1.5 Dataset Expansion Commands

To expand your dataset with person-specific videos:

  1. Place your source video files in:

    Deepfacelab/data_src/
    
  2. Update the paths in the script:

    # Edit the script to update paths to your local directory
    vim PVChat/InternVideo2/multi_modality/all_dataset_set_detail.sh
  3. Run the dataset expansion script:

    cd PVChat/InternVideo2/multi_modality/
    bash all_dataset_set_detail.sh

1.6 Download the LAION-face-5B Datasets

Refer to the img2dataset repository for download instructions: https://github.com/rom1504/img2dataset

2. Fine-tuning Process

Configuration

  1. Update the training script parameters:

    Edit PVChat/InternVideo2/multi_modality/finetune_internvideo_REOMH_one_person2_stage.py with the following parameters:

    • --sks_name: Name of the target person (e.g., "john_doe")
    • --model_path: Path to the model checkpoint
    • --train_json: Path to training JSON file generated from dataset expansion
    • --short_train_json: Path to short training JSON file
    • --test_json: Path to test JSON file
  2. Update the configuration file:

    Edit PVChat/InternVideo2/multi_modality/Internvideo2_chat_8B_HD_finetune_PVChat/config.json:

    {
      ...
      "sks_name": "your_person_name",
      ...
      "model_config": {
        "sks_name": "your_person_name",
        ...
      }
    }

    Note: There are two instances of sks_name in the config file. Both must be updated with the same person name.
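
Because the two sks_name entries must always match, it can help to set them together rather than by hand. The following is a minimal sketch under the assumption that config.json has the layout shown above (a top-level sks_name and a second one inside model_config); the helper names set_sks_name and update_config_file are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def set_sks_name(config, name):
    """Return a copy of config with both sks_name entries set to name.
    The input dictionary is not modified."""
    updated = dict(config)
    updated["sks_name"] = name
    model_config = dict(updated.get("model_config", {}))
    model_config["sks_name"] = name
    updated["model_config"] = model_config
    return updated


def update_config_file(path, name):
    """Rewrite the JSON file at path with both sks_name entries updated."""
    path = Path(path)
    config = json.loads(path.read_text())
    path.write_text(json.dumps(set_sks_name(config, name), indent=2) + "\n")


if __name__ == "__main__":
    example = {"sks_name": "old", "model_config": {"sks_name": "old"}}
    print(set_sks_name(example, "john_doe"))
```

Note that update_config_file re-serializes the whole file, so the original indentation and any formatting quirks are not preserved; keep a backup if that matters.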

Training

Run the fine-tuning script:

cd PVChat/InternVideo2/multi_modality/
python finetune_internvideo_REOMH_one_person2_stage.py \
    --sks_name "your_person_name" \
    --model_path "path/to/model" \
    --train_json "path/to/train.json" \
    --short_train_json "path/to/short_train.json" \
    --test_json "path/to/test.json"
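
When fine-tuning for several people in turn, the command above can be assembled programmatically. This is a rough sketch only: the script name and the five flags are taken from the command above, while build_command and launch are hypothetical helpers, and the paths are the same placeholders used in this README.

```python
import shlex
import subprocess
import sys

SCRIPT = "finetune_internvideo_REOMH_one_person2_stage.py"


def build_command(sks_name, model_path, train_json, short_train_json, test_json):
    """Return the fine-tuning command as an argument list."""
    return [
        sys.executable, SCRIPT,
        "--sks_name", sks_name,
        "--model_path", model_path,
        "--train_json", train_json,
        "--short_train_json", short_train_json,
        "--test_json", test_json,
    ]


def launch(command, workdir="PVChat/InternVideo2/multi_modality"):
    """Run the command from the directory used in the steps above."""
    subprocess.run(command, cwd=workdir, check=True)


if __name__ == "__main__":
    command = build_command(
        "your_person_name",
        "path/to/model",
        "path/to/train.json",
        "path/to/short_train.json",
        "path/to/test.json",
    )
    # Print for review; call launch(command) to actually start training.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems if a person name or path contains spaces.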

Project Structure

PVChat/
├── PVChat/
│   ├── InternVideo2/
│   │   └── multi_modality/
│   │       ├── Internvideo2_chat_8B_HD_finetune_REMOH/
│   │       ├── Internvideo2_chat_8B_HD_PVChat/
│   │       ├── Internvideo2_chat_8B_HD_finetune_PVChat/
│   │       ├── all_dataset_set_detail.sh
│   │       └── finetune_internvideo_REOMH_one_person2_stage.py
│   ├── consisid/
│   ├── LivePortrait/
│   └── Deepfacelab/
│       └── data_src/
├── datasets/
│   └── celebv-hq/
├── environment/
│   ├── requirements_consisid_python_3.11.0.txt
│   ├── requirements_deepfacelab_python_3.7.16.txt
│   ├── requirements_face_quality_python_3.8.20.txt
│   ├── requirements_LivePortrait_python_3.10.6.txt
│   ├── requirements_photomaker_python_3.10.6.txt
│   ├── requirements_pvchat_python_3.10.0.txt
│   └── requirements_qwen_python_3.10.0.txt
└── README.md
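
Before running the dataset expansion or fine-tuning steps, it may be worth checking that the layout above is in place. The sketch below is an illustration, not a repository script: EXPECTED_PATHS lists a few entries taken from the tree above, and missing_paths is a hypothetical helper that reports which of them are absent under a given root.

```python
from pathlib import Path

# A few paths from the tree above, relative to the repository root.
EXPECTED_PATHS = [
    "PVChat/InternVideo2/multi_modality/all_dataset_set_detail.sh",
    "PVChat/InternVideo2/multi_modality/finetune_internvideo_REOMH_one_person2_stage.py",
    "PVChat/Deepfacelab/data_src",
    "datasets/celebv-hq",
    "environment",
]


def missing_paths(root):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected paths are present.")
```

Run it from the repository root; an empty result means every listed path was found.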

Requirements

  • CUDA-compatible GPU with sufficient VRAM (recommended: 48GB+)
  • Conda package manager
  • Git
  • Sufficient disk space for datasets and model weights

Citation

If you find our paper and/or code helpful, please consider citing:

@InProceedings{Shi_2025_ICCV,
    author    = {Shi, Yufei and Yan, Weilong and Xu, Gang and Li, Yumeng and Chen, Yucheng and Li, Zhenxi and Yu, Fei and Li, Ming and Yeo, Si Yong},
    title     = {PVChat: Personalized Video Chat with One-Shot Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {23321-23331}
}

@article{pvchat,
      title={PVChat: Personalized Video Chat with One-Shot Learning}, 
      author={Yufei Shi and Weilong Yan and Gang Xu and Yumeng Li and Yucheng Chen and Zhenxi Li and Fei Richard Yu and Ming Li and Si Yong Yeo}, 
      year={2025},
      journal={arXiv preprint arXiv:2503.17069},
}

License

Please refer to the individual licenses of the incorporated projects.
