Official PyTorch Implementation of ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models (ICLR 2025)
Seonghwan Park, Jaehyeon Jeong, Yongjun Kim, Jaeho Lee, and Namhoon Lee
- Abstract
- Research Highlights
- Setup
- Data Preparation
- Running the Experiments
- Contact
- Citation
- Acknowledgements
Recent studies have introduced various approaches for prompt-tuning black-box vision-language models, referred to as black-box prompt-tuning (BBPT). While BBPT has demonstrated considerable potential, it is often found that many existing methods require an excessive number of queries (i.e., function evaluations), which poses a significant challenge in real-world scenarios where the number of allowed queries is limited. To tackle this issue, we propose
Zeroth-order Intrinsic-dimensional Prompt-tuning (ZIP), a novel approach that enables efficient and robust prompt optimization in a purely black-box setting. The key idea of ZIP is to reduce the problem dimensionality and the variance of zeroth-order gradient estimates, such that the training is done fast with far fewer queries. We achieve this by re-parameterizing prompts in low-rank representations and designing intrinsic-dimensional clipping of estimated gradients. We evaluate ZIP on 13+ vision-language tasks in standard benchmarks and show that it achieves an average improvement of approximately 6% in few-shot accuracy and 48% in query efficiency compared to the best-performing alternative BBPT methods, establishing a new state of the art. Our ablation analysis further shows that the proposed clipping mechanism is robust and nearly optimal, without the need to manually select the clipping threshold, matching the result of expensive hyperparameter search.
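For context, BBPT methods of this kind estimate gradients from function values alone. A minimal sketch of the standard randomized two-point zeroth-order estimator is shown below (the function `f`, smoothing scale `mu`, and query budget are illustrative assumptions, not ZIP's actual settings):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_queries=8, rng=None):
    """Randomized two-point zeroth-order gradient estimate of f at x.

    Each query evaluates f twice along a random Gaussian direction u:
        g ~= (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u
    and the per-direction estimates are averaged. Only function values
    are used, so f can be a black-box API.
    """
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_queries

# Toy usage: for f(x) = ||x||^2 the true gradient is 2*x.
f = lambda x: float(np.sum(x ** 2))
x = np.ones(5)
g = zo_gradient(f, x, num_queries=2000, rng=0)
```

Note how the number of queries needed for a low-variance estimate grows with the problem dimensionality, which is exactly the bottleneck ZIP targets.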
- **Prompt Reparameterization:** To address the challenge of dimensionality dependency (i.e., the number of queries scales with the dimensionality of the problem), we propose a novel low-rank representation. This approach reduces the dimensionality while effectively mitigating the loss of expressive power through feature sharing.
- **Clipped Zeroth-order Optimization:** High variance in zeroth-order information can significantly degrade query efficiency. To tackle this, we propose a threshold-free gradient clipping method, termed intrinsic-dimensional clipping. Inspired by prior studies on clipping thresholds, we set the clipping threshold to the square root of $\delta$, which corresponds to the standard deviation of the zeroth-order gradient, where $\delta$ is the dimensionality of the problem. This approach not only reduces the variance of zeroth-order information but also achieves near-optimal performance without requiring manual tuning.
- **Empirical Results:** We extensively validate ZIP across 13+ datasets, demonstrating its outstanding performance in few-shot adaptability and generalization on unseen distributions.
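The two ideas above can be sketched in a few lines of NumPy (a conceptual sketch only; the prompt shape, rank, and the exact quantity being clipped are illustrative assumptions, not the repository's implementation):

```python
import numpy as np

def low_rank_prompt(U, V):
    """Re-parameterize a (tokens x width) prompt as a low-rank product.

    Optimizing U (tokens x r) and V (r x width) instead of the full
    prompt reduces the number of optimized parameters from
    tokens * width to r * (tokens + width), so fewer queries are
    needed, while feature sharing across tokens preserves expressivity.
    """
    return U @ V

def intrinsic_dim_clip(g):
    """Threshold-free clipping: cap the gradient norm at sqrt(d),
    where d is the dimensionality of the optimized (intrinsic)
    parameters, so no clipping threshold has to be tuned by hand."""
    d = g.size
    norm = np.linalg.norm(g)
    thr = np.sqrt(d)
    return g * (thr / norm) if norm > thr else g
```

In this sketch the zeroth-order optimizer would update only `U` and `V`, passing each estimated gradient through `intrinsic_dim_clip` before the update step.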
- Run the following commands to install Dassl.
Clone the ZIP repository from GitHub and navigate into the directory.

```bash
git clone https://github.com/LOG-postech/ZIP.git
cd ZIP
```

Create a new Conda environment named `zip` with Python version 3.9.12 and activate it.

```bash
conda create -y -n zip python=3.9.12
conda activate zip
```

Note: If you require a different CUDA version, please refer to the PyTorch official website.

Install specific versions of torch, torchvision, and torchaudio using pip with the specified CUDA version.

```bash
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
```

Install all necessary dependencies listed in the requirements.txt file.

```bash
pip install -r requirements.txt
```

Datasets from CoOp
Prepare the following 10 datasets by following the instructions in the CoOp dataset guide:
- Caltech101
- OxfordPets
- Flowers102
- Food101
- FGVCAircraft
- SUN397
- DTD
- EuroSAT
- UCF101
- ImageNet
Note: The same few-shot splits used in CoOp are applied for these datasets.
Datasets from BlackVIP
Follow the steps below to prepare the additional datasets.

**SVHN**
1. Create Directory: Create a folder named `svhn/` under `$DATA`.
2. Download Dataset: Run the dataset download script.
   - Important: Replace the `DATA_PATH` in line 52 of the script with your desired path.
3. Download Split File: Download split_mlai_SVHN.json and place it in `$DATA/svhn`.

**RESISC45**
1. Create Directory: Create a folder named `resisc45/` under `$DATA`.
2. Download and Extract Dataset: Download NWPU-RESISC45.rar and extract the contents into `$DATA/resisc45`.
3. Download Split File: Download split_mlai_Resisc45.json and place it in `$DATA/resisc45`.

**CLEVR**
1. Download and Extract Dataset: Download CLEVR_v1.0.zip and extract it into `$DATA`.
2. Download Split File: Download split_mlai_CLEVR.json and place it in `$DATA/CLEVR_v1.0`.
Navigate to the Directory: Go to the `ZIP/scripts/few_shot` directory.

```bash
cd ZIP/scripts/few_shot
```

Run the Commands: Execute the desired method on the target dataset.

```bash
# Replace {METHOD} with the desired method and {DATASET} with the target dataset
sh {METHOD}.sh {DATASET}
```

- `METHOD`: Specify the method name, such as `ZIP`.
- `DATASET`: Specify the dataset name, for example, `imagenet` or `caltech101`.
Note: Valid names correspond to the filenames located in the ZIP/configs/ directory.
Navigate to the Directory: Go to the `ZIP/scripts/base_to_new` directory.

```bash
cd ZIP/scripts/base_to_new
```

Train the Model: Train the model using the specified method and dataset.

```bash
# Replace {METHOD} with the desired method and {DATASET} with the target dataset
sh train_{METHOD}.sh {DATASET}
```

Test the Model: Test the trained model using the specified method and dataset.

```bash
# Replace {METHOD} with the desired method and {DATASET} with the target dataset
sh test_{METHOD}.sh {DATASET}
```
Train ImageNet as Source: Ensure ImageNet is trained as a source dataset before proceeding.

Navigate to the Directory: Go to the `ZIP/scripts/cross_dataset_transfer` directory.

```bash
cd ZIP/scripts/cross_dataset_transfer
```

Run the Commands: Execute the cross-dataset transfer testing using the desired method.

```bash
# Replace {METHOD} with the desired method
sh xd_test_{METHOD}.sh
```
Train ImageNet as Source: Ensure ImageNet is trained as a source dataset before proceeding.

Navigate to the Directory: Go to the `ZIP/scripts/out_of_distribution` directory.

```bash
cd ZIP/scripts/out_of_distribution
```

Run the Commands: Execute the out-of-distribution generalization testing using the desired method.

```bash
# Replace {METHOD} with the desired method
sh xd_test_im_{METHOD}.sh
```

For any questions, discussions, or proposals, please contact:
📧 Email: shpark97@postech.ac.kr
If you use our code in your research, please consider citing:
```bibtex
@inproceedings{park2025zip,
  title={ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models},
  author={Park, Seonghwan and Jeong, Jaehyeon and Kim, Yongjun and Lee, Jaeho and Lee, Namhoon},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```

Our experimental pipeline is built upon the following repositories:
For baseline construction, we referred to and borrowed code from these repositories:
We express our gratitude to Zhou et al., Mahabadi et al., Oh et al., Tsai et al., and Yu et al. for sharing their outstanding work and making their contributions available through open-source initiatives.