The datasets used should be placed in the data folder with the following structure:
data
|_ vidstg
| |_ videos
| | |_ [video name 0].mp4
| | |_ [video name 1].mp4
| | |_ ...
| |_ vstg_annos
| | |_ train.json
| | |_ ...
| |_ sent_annos
| | |_ train_annotations.json
| | |_ ...
| |_ data_cache
| | |_ ...
|_ hc-stvg2
| |_ v2_video
| | |_ [video name 0].mp4
| | |_ [video name 1].mp4
| | |_ ...
| |_ annos
| | |_ hcstvg_v2
| | | |_ train.json
| | | |_ test.json
| |_ data_cache
| | |_ ...
|_ hc-stvg
| |_ v1_video
| | |_ [video name 0].mp4
| | |_ [video name 1].mp4
| | |_ ...
| |_ annos
| | |_ hcstvg_v1
| | | |_ train.json
| | | |_ test.json
| |_ data_cache
| | |_ ...
The download links for the files mentioned above are as follows:
hc-stvg: v1_video, annos, data_cache
hc-stvg2: v2_video, annos, data_cache
vidstg: videos, vstg_annos, sent_annos, data_cache
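After downloading, you can optionally verify that everything sits where the tree above expects it. This is a minimal sketch, not part of the repo; the paths come directly from the structure shown above:

```python
# Optional sanity check: confirm the expected dataset folders exist before training.
from pathlib import Path

EXPECTED = {
    "data/vidstg":   ["videos", "vstg_annos", "sent_annos", "data_cache"],
    "data/hc-stvg2": ["v2_video", "annos/hcstvg_v2", "data_cache"],
    "data/hc-stvg":  ["v1_video", "annos/hcstvg_v1", "data_cache"],
}

for root, subdirs in EXPECTED.items():
    for sub in subdirs:
        path = Path(root) / sub
        print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")
```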
The pre-trained model weights used are placed in the model_zoo folder:
ResNet-101, VidSwin-T, roberta-base
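For example, the text backbone can then be loaded from the local model_zoo directory instead of being fetched at runtime. This is a minimal sketch assuming a HuggingFace-format roberta-base directory and the transformers library; the repo's actual loading code may differ:

```python
# Assumption: model_zoo/roberta-base is a HuggingFace-format directory.
from transformers import RobertaModel, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("model_zoo/roberta-base")
text_encoder = RobertaModel.from_pretrained("model_zoo/roberta-base")

# Encode an example query and inspect the token features.
tokens = tokenizer("the man in a white shirt walks to the left", return_tensors="pt")
features = text_encoder(**tokens).last_hidden_state   # shape: [1, seq_len, 768]
print(features.shape)
```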
The code has been tested with PyTorch 2.0.1 and CUDA 11.7; other versions are likely to be compatible as well. To install the necessary requirements, use the commands below:
pip3 install -r requirements.txt
apt install ffmpeg -y
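After installing, you can optionally confirm the environment; a minimal sketch (the version strings in the comments are the tested ones, not strict requirements):

```python
# Optional environment check: print PyTorch/CUDA versions and confirm ffmpeg is on PATH.
import shutil
import torch

print("PyTorch:", torch.__version__)                       # tested with 2.0.1
print("CUDA (build):", torch.version.cuda)                  # tested with 11.7
print("CUDA available:", torch.cuda.is_available())
print("ffmpeg found:", shutil.which("ffmpeg") is not None)
```

To train the model, please use the scripts provided below: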
# run for HC-STVG
python3 -m torch.distributed.launch \
--nproc_per_node=8 \
scripts/train_net.py \
--config-file "experiments/hcstvg.yaml" \
INPUT.RESOLUTION 420 \
OUTPUT_DIR output/hcstvg \
TENSORBOARD_DIR output/hcstvg
# run for HC-STVG2
python3 -m torch.distributed.launch \
--nproc_per_node=8 \
scripts/train_net.py \
--config-file "experiments/hcstvg2.yaml" \
INPUT.RESOLUTION 420 \
OUTPUT_DIR output/hcstvg2 \
TENSORBOARD_DIR output/hcstvg2
# run for VidSTG
python3 -m torch.distributed.launch \
--nproc_per_node=8 \
scripts/train_net.py \
--config-file "experiments/vidstg.yaml" \
INPUT.RESOLUTION 420 \
OUTPUT_DIR output/vidstg \
TENSORBOARD_DIR output/vidstg

For additional training options, such as different hyper-parameters, adjust the configurations as needed:
experiments/hcstvg.yaml, experiments/hcstvg2.yaml and experiments/vidstg.yaml.
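The KEY VALUE pairs passed on the command line (e.g. INPUT.RESOLUTION 420) override the corresponding entries in these YAML files. Below is a minimal sketch of how such overrides behave, assuming a yacs-style CfgNode config (the actual config schema lives in the repo, so only the keys already shown in the commands above are used here):

```python
# Sketch of yacs-style config overrides (assumption: the repo's configs follow this pattern).
from yacs.config import CfgNode as CN

cfg = CN()
cfg.INPUT = CN()
cfg.INPUT.RESOLUTION = 224        # default value, overridden below
cfg.OUTPUT_DIR = "output/default"

# In the real pipeline the YAML file is merged first, e.g.:
# cfg.merge_from_file("experiments/hcstvg.yaml")
# ...then command-line KEY VALUE pairs take precedence.
cfg.merge_from_list(["INPUT.RESOLUTION", "420", "OUTPUT_DIR", "output/hcstvg"])
print(cfg.INPUT.RESOLUTION, cfg.OUTPUT_DIR)   # 420 output/hcstvg
```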
For evaluation, please use the scripts provided below:
# run for HC-STVG
python3 -m torch.distributed.launch \
--nproc_per_node=8 \
scripts/test_net.py \
--config-file "experiments/hcstvg.yaml" \
INPUT.RESOLUTION 420 \
MODEL.WEIGHT [Pretrained Model Weights] \
OUTPUT_DIR output/hcstvg
# run for HC-STVG2
python3 -m torch.distributed.launch \
--nproc_per_node=8 \
scripts/test_net.py \
--config-file "experiments/hcstvg2.yaml" \
INPUT.RESOLUTION 420 \
MODEL.WEIGHT [Pretrained Model Weights] \
OUTPUT_DIR output/hcstvg2
# run for VidSTG
python3 -m torch.distributed.launch \
--nproc_per_node=8 \
scripts/test_net.py \
--config-file "experiments/vidstg.yaml" \
INPUT.RESOLUTION 420 \
MODEL.WEIGHT [Pretrained Model Weights] \
OUTPUT_DIR output/vidstg

We provide our trained checkpoints so that the reported results can be reproduced.
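Before evaluation, you can inspect a downloaded checkpoint with PyTorch; a minimal sketch (the file name and the top-level key names are assumptions and may differ):

```python
# Inspect a downloaded checkpoint; the path below is a placeholder.
import torch

ckpt = torch.load("checkpoints/cg_stvg_hcstvg.pth", map_location="cpu")
print(type(ckpt))
# If the checkpoint is a dict, list its top-level keys (e.g. "model", "optimizer", ...).
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```

Pass the checkpoint path as MODEL.WEIGHT in the evaluation commands above.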
🎏 CG-STVG achieves state-of-the-art performance on three challenging benchmarks, including HCSTVG-v1, HCSTVG-v2, and VidSTG, as shown below. Note that the baseline is our CG-STVG without context generation and refinement.
This repo is partly based on the open-source release from STCAT and the evaluation metric implementation is borrowed from TubeDETR for a fair comparison.