X-TFCLIP is the winning entry of the AG-VPReID 2025 Challenge (The 2nd Aerial-Ground Person ReID competition). It is an extended version of the TF-CLIP framework, which leverages temporal modeling and vision-language pretraining (CLIP), adapted for video-based aerial-ground person re-identification.
X-TFCLIP improves over TF-CLIP across all metrics on the AG-VPReID challenge dataset:
| Method | Aerial→Ground R1 | Aerial→Ground R5 | Aerial→Ground R10 | Aerial→Ground mAP | Ground→Aerial R1 | Ground→Aerial R5 | Ground→Aerial R10 | Ground→Aerial mAP | Overall R1 | Overall R5 | Overall R10 | Overall mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X-TFCLIP | 72.28 | 81.94 | 88.81 | 74.45 | 70.77 | 82.59 | 86.08 | 72.67 | 71.56 | 82.25 | 85.94 | 73.60 |
| TF-CLIP | 63.08 | 75.16 | 79.89 | 65.52 | 64.49 | 79.86 | 83.97 | 67.07 | 63.75 | 77.40 | 81.83 | 66.26 |
```bash
# Create and activate environment
conda create -n xtfclip python=3.12.9
conda activate xtfclip

# Install PyTorch with CUDA 12.6
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# Install additional dependencies
pip install yacs timm scikit-image tqdm ftfy regex
```

To train the model, run:
```bash
python train_main.py --output_dir "logs/all"
```

The repository supports two cross-view matching scenarios:
- Aerial-to-Ground matching:

```bash
python eval_main.py --custom_output_dir "results/case1_aerial_to_ground" --output_dir "logs/all"
```

- Ground-to-Aerial matching:

```bash
python eval_main.py --custom_output_dir "results/case2_ground_to_aerial" --output_dir "logs/all"
```

Note: for case 2, you need to modify the dataset path in datasets/set/agreidvid.py so that query and gallery point to case2_ground_to_aerial.
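The exact variables in datasets/set/agreidvid.py depend on the repository; the snippet below is only a hypothetical illustration of the kind of change the note above describes (the variable names and directory layout are assumptions, not the repo's actual code):

```python
# datasets/set/agreidvid.py -- hypothetical illustration only; the actual
# variable names and directory layout in the repository may differ.
# Point the query and gallery splits at the case-2 data for Ground-to-Aerial:
query_dir = "data/case2_ground_to_aerial/query"      # was .../case1_aerial_to_ground/query
gallery_dir = "data/case2_ground_to_aerial/gallery"  # was .../case1_aerial_to_ground/gallery
```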
X-TFCLIP achieved 1st place in the AG-VPReID 2025 Challenge (The 2nd Aerial-Ground Person ReID Challenge).
Please consider citing the following article if you find this work helpful.
```bibtex
@misc{nguyen2025agvpreid2025aerialgroundvideobased,
title={AG-VPReID 2025: Aerial-Ground Video-based Person Re-identification Challenge Results},
author={Kien Nguyen and Clinton Fookes and Sridha Sridharan and Huy Nguyen and Feng Liu and Xiaoming Liu and Arun Ross and Dana Michalski and Tamás Endrei and Ivan DeAndres-Tame and Ruben Tolosana and Ruben Vera-Rodriguez and Aythami Morales and Julian Fierrez and Javier Ortega-Garcia and Zijing Gong and Yuhao Wang and Xuehu Liu and Pingping Zhang and Md Rashidunnabi and Hugo Proença and Kailash A. Hambarde and Saeid Rezaei},
year={2025},
eprint={2506.22843},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.22843},
}
```

- 🔍 Bicubic CLIP-ViT positional embedding resizing (see the sketch after this list)
- 🧠 Lightweight Attention Pooling
- 🧭 Online Label Smooth Loss
- 🎯 Video Frame Positional Embeddings
- ⚙️ Learnable CLIP Memory Weighting
- 💬 Instance Norm Based BNN-Neck
- 🔧 Soft-Biometric Based Distance Matrix Masking
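Of these components, the bicubic positional-embedding resizing is the most self-contained to illustrate. The sketch below is not the repository's implementation; it is a minimal PyTorch example, assuming a CLIP ViT whose positional embedding stores a [CLS] token followed by a square patch grid. The function name resize_pos_embed and the grid sizes are illustrative:

```python
# Hedged sketch: bicubic resizing of pretrained ViT positional embeddings.
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor,
                     old_grid: tuple[int, int],
                     new_grid: tuple[int, int]) -> torch.Tensor:
    """Bicubically resize positional embeddings of shape (1 + H*W, D)."""
    cls_tok, patch_tok = pos_embed[:1], pos_embed[1:]   # keep the [CLS] slot untouched
    d = patch_tok.shape[-1]
    # (H*W, D) -> (1, D, H, W): treat the patch embeddings as a D-channel image
    patch_tok = patch_tok.reshape(*old_grid, d).permute(2, 0, 1).unsqueeze(0)
    patch_tok = F.interpolate(patch_tok, size=new_grid,
                              mode="bicubic", align_corners=False)
    # (1, D, H', W') -> (H'*W', D)
    patch_tok = patch_tok.squeeze(0).permute(1, 2, 0).reshape(-1, d)
    return torch.cat([cls_tok, patch_tok], dim=0)

# Example: adapt 224x224 pretraining (14x14 patches, patch size 16) to a
# 256x128 person crop (16x8 patches).
pe = torch.randn(1 + 14 * 14, 768)
print(resize_pos_embed(pe, (14, 14), (16, 8)).shape)  # torch.Size([129, 768])
```

Interpolating in 2-D preserves the spatial structure of the pretrained patch grid when adapting CLIP's square pretraining resolution to the tall, narrow crops typical of person ReID.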
Please refer to the original TF-CLIP GitHub repository for the additional code implementations on which this method is based.
This baseline builds on the work of TF-CLIP; we thank the authors for their excellent contribution.