Official PyTorch Implementation of ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models (ICLR 2025)
Seonghwan Park, Jaehyeon Jeong, Yongjun Kim, Jaeho Lee, and Namhoon Lee
- Abstract
- Research Highlights
- Setup
- Data Preparation
- Running the Experiments
- Contact
- Citation
- Acknowledgements
Recent studies have introduced various approaches for prompt-tuning black-box vision-language models, referred to as black-box prompt-tuning (BBPT). While BBPT has demonstrated considerable potential, it is often found that many existing methods require an excessive number of queries (i.e., function evaluations), which poses a significant challenge in real-world scenarios where the number of allowed queries is limited. To tackle this issue, we propose
Zeroth-order Intrinsic-dimensional Prompt-tuning (ZIP), a novel approach that enables efficient and robust prompt optimization in a purely black-box setting. The key idea of ZIP is to reduce the problem dimensionality and the variance of zeroth-order gradient estimates, such that the training is done fast with far fewer queries. We achieve this by re-parameterizing prompts in low-rank representations and designing intrinsic-dimensional clipping of estimated gradients. We evaluate ZIP on 13+ vision-language tasks in standard benchmarks and show that it achieves an average improvement of approximately 6% in few-shot accuracy and 48% in query efficiency compared to the best-performing alternative BBPT methods, establishing a new state of the art. Our ablation analysis further shows that the proposed clipping mechanism is robust and nearly optimal, without the need to manually select the clipping threshold, matching the result of expensive hyperparameter search.
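For context, BBPT methods of this kind estimate gradients from function values alone. A minimal sketch of the standard randomized two-point zeroth-order estimator is shown below (the function `f`, smoothing scale `mu`, and query budget are illustrative assumptions, not ZIP's actual settings):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_queries=8, rng=None):
    """Randomized two-point zeroth-order gradient estimate of f at x.

    Each query evaluates f twice along a random Gaussian direction u:
        g ~= (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u
    and the per-direction estimates are averaged. Only function values
    are used, so f can be a black-box API.
    """
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_queries

# Toy usage: for f(x) = ||x||^2 the true gradient is 2*x.
f = lambda x: float(np.sum(x ** 2))
x = np.ones(5)
g = zo_gradient(f, x, num_queries=2000, rng=0)
```

Note how the number of queries needed for a low-variance estimate grows with the problem dimensionality, which is exactly the bottleneck ZIP targets.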
- **Prompt Reparameterization:** To address the challenge of dimensionality dependency (i.e., the number of queries scales with the dimensionality of the problem), we propose a novel low-rank representation. This approach reduces the dimensionality while effectively mitigating the loss of expressive power through feature sharing.
- **Clipped Zeroth-order Optimization:** High variance in zeroth-order information can significantly degrade query efficiency. To tackle this, we propose a threshold-free gradient clipping method, termed intrinsic-dimensional clipping. Inspired by prior studies on clipping thresholds, we set the clipping threshold to the square root of $\delta$, which corresponds to the standard deviation of the zeroth-order gradient, where $\delta$ is the dimensionality of the problem. This approach not only reduces the variance of zeroth-order information but also achieves near-optimal performance without requiring manual tuning.
- **Empirical Results:** We extensively validate ZIP across 13+ datasets, demonstrating its outstanding performance in few-shot adaptability and generalization on unseen distributions.
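The two ideas above can be sketched in a few lines of NumPy (a conceptual sketch only; the prompt shape, rank, and the exact quantity being clipped are illustrative assumptions, not the repository's implementation):

```python
import numpy as np

def low_rank_prompt(U, V):
    """Re-parameterize a (tokens x width) prompt as a low-rank product.

    Optimizing U (tokens x r) and V (r x width) instead of the full
    prompt reduces the number of optimized parameters from
    tokens * width to r * (tokens + width), so fewer queries are
    needed, while feature sharing across tokens preserves expressivity.
    """
    return U @ V

def intrinsic_dim_clip(g):
    """Threshold-free clipping: cap the gradient norm at sqrt(d),
    where d is the dimensionality of the optimized (intrinsic)
    parameters, so no clipping threshold has to be tuned by hand."""
    d = g.size
    norm = np.linalg.norm(g)
    thr = np.sqrt(d)
    return g * (thr / norm) if norm > thr else g
```

In this sketch the zeroth-order optimizer would update only `U` and `V`, passing each estimated gradient through `intrinsic_dim_clip` before the update step.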
- Run the following commands to install Dassl.
Clone the ZIP repository from GitHub and navigate into the directory.

```bash
git clone https://github.com/LOG-postech/ZIP.git
cd ZIP
```

Create a new Conda environment named `zip` with Python version 3.9.12 and activate it.

```bash
conda create -y -n zip python=3.9.12
conda activate zip
```

Note: If you require a different CUDA version, please refer to the PyTorch official website.

Install specific versions of torch, torchvision, and torchaudio using pip with the specified CUDA version.

```bash
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
```

Install all necessary dependencies listed in the requirements.txt file.

```bash
pip install -r requirements.txt
```

Datasets from CoOp
Prepare the following 10 datasets by following the instructions in the CoOp dataset guide:
- Caltech101
- OxfordPets
- Flowers102
- Food101
- FGVCAircraft
- SUN397
- DTD
- EuroSAT
- UCF101
- ImageNet
Note: The same few-shot splits used in CoOp are applied for these datasets.
Datasets from BlackVIP
Follow the steps below to prepare the additional datasets.

**SVHN**
1. Create Directory: Create a folder named `svhn/` under `$DATA`.
2. Download Dataset: Run the dataset download script.
   - Important: Replace the `DATA_PATH` in line 52 of the script with your desired path.
3. Download Split File: Download split_mlai_SVHN.json and place it in `$DATA/svhn`.

**RESISC45**
1. Create Directory: Create a folder named `resisc45/` under `$DATA`.
2. Download and Extract Dataset: Download NWPU-RESISC45.rar and extract the contents into `$DATA/resisc45`.
3. Download Split File: Download split_mlai_Resisc45.json and place it in `$DATA/resisc45`.

**CLEVR**
1. Download and Extract Dataset: Download CLEVR_v1.0.zip and extract it into `$DATA`.
2. Download Split File: Download split_mlai_CLEVR.json and place it in `$DATA/CLEVR_v1.0`.
Navigate to the Directory: Go to the `ZIP/scripts/few_shot` directory.

```bash
cd ZIP/scripts/few_shot
```

Run the Commands: Execute the desired method on the target dataset.

```bash
# Replace {METHOD} with the desired method and {DATASET} with the target dataset
sh {METHOD}.sh {DATASET}
```

- `METHOD`: Specify the method name, such as `ZIP`.
- `DATASET`: Specify the dataset name, for example, `imagenet` or `caltech101`.
Note: Valid names correspond to the filenames located in the ZIP/configs/ directory.
Navigate to the Directory: Go to the `ZIP/scripts/base_to_new` directory.

```bash
cd ZIP/scripts/base_to_new
```

Train the Model: Train the model using the specified method and dataset.

```bash
# Replace {METHOD} with the desired method and {DATASET} with the target dataset
sh train_{METHOD}.sh {DATASET}
```

Test the Model: Test the trained model using the specified method and dataset.

```bash
# Replace {METHOD} with the desired method and {DATASET} with the target dataset
sh test_{METHOD}.sh {DATASET}
```
Train ImageNet as Source: Ensure ImageNet is trained as a source dataset before proceeding.

Navigate to the Directory: Go to the `ZIP/scripts/cross_dataset_transfer` directory.

```bash
cd ZIP/scripts/cross_dataset_transfer
```

Run the Commands: Execute the cross-dataset transfer testing using the desired method.

```bash
# Replace {METHOD} with the desired method
sh xd_test_{METHOD}.sh
```
Train ImageNet as Source: Ensure ImageNet is trained as a source dataset before proceeding.

Navigate to the Directory: Go to the `ZIP/scripts/out_of_distribution` directory.

```bash
cd ZIP/scripts/out_of_distribution
```

Run the Commands: Execute the out-of-distribution generalization testing using the desired method.

```bash
# Replace {METHOD} with the desired method
sh xd_test_im_{METHOD}.sh
```

For any questions, discussions, or proposals, please contact:
📧 Email: shpark97@postech.ac.kr
If you use our code in your research, please consider citing:
```bibtex
@inproceedings{park2025zip,
  title={ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models},
  author={Park, Seonghwan and Jeong, Jaehyeon and Kim, Yongjun and Lee, Jaeho and Lee, Namhoon},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```

Our experimental pipeline is built upon the following repositories:
For baseline construction, we referred to and borrowed code from these repositories:
We express our gratitude to Zhou et al., Mahabadi et al., Oh et al., Tsai et al., and Yu et al. for sharing their outstanding work and making their contributions available through open-source initiatives.