
Ranking the Transferability of Adversarial Examples

Published: 12 October 2024

Abstract

Adversarial transferability in blackbox scenarios presents a unique challenge: while attackers can employ surrogate models to craft adversarial examples, they lack assurance on whether these examples will successfully compromise the target model. Until now, the prevalent method to ascertain success has been trial and error—testing crafted samples directly on the victim model. This approach, however, risks detection with every attempt, forcing attackers to either perfect their first try or face exposure.
Our article introduces a ranking strategy that refines the transfer attack process, enabling the attacker to estimate the likelihood of success without repeated trials on the victim’s system. By leveraging a set of diverse surrogate models, our method can predict transferability of adversarial examples. This strategy can be used to either select the best sample to use in an attack or the best perturbation to apply to a specific sample.
Using our strategy, we were able to raise the transferability of adversarial examples from a mere 20%—akin to random selection—up to near upper-bound levels, with some scenarios even witnessing a 100% success rate. This substantial improvement not only sheds light on the shared susceptibilities across diverse architectures but also demonstrates that attackers can forgo the detectable trial-and-error tactics, thereby increasing the threat of surrogate-based attacks.

1 Introduction

Neural networks are vulnerable to adversarial examples in which adversaries aim to change the prediction of a model \(f\) on an input \(x\) in a covert manner [23]. The common form of this attack is where an adversarial example \(x^{\prime}=x+\delta\) is created such that \(f(x+\delta)\neq f(x)\) where \(||\delta||{\lt}\epsilon\). In other words, the adversarial example changes the model’s prediction, yet \(x^{\prime}\) appears the same as \(x\).
Many popular and powerful adversarial attacks (such as Projected Gradient Descent (PGD) [17] and Carlini-Wagner Attack [2]) are whitebox attacks. This means that to use these algorithms to generate \(x^{\prime}\), the attacker must have access to the learnt model parameters in \(f\) (i.e., the neural network’s weights). Although this may seem like a strong limitation for attackers, it has been shown that different neural networks can share the same vulnerabilities to an adversarial example [23, 25]. As such, an attacker can simply train a surrogate model \(f^{\prime}\) on a similar dataset [31], attack \(f^{\prime}\) to generate \(x^{\prime}\), and then deploy \(x^{\prime}\) on the blackbox model \(f\) knowing that there will be a decent probability of success. This attack is called a transfer attack [21]. This type of attack has been found to be effective across models trained on different subsets of the data [23], across domains [20], and even across tasks [29].
In the field of adversarial machine learning, understanding transferability is essential, especially when an attacker has only one or a few chances to succeed. In real-world situations, like trying to trick a facial recognition system at an airport or fooling a fraud detection algorithm at a bank, the attacker doesn’t have the luxury to keep trying until they succeed. If they fail, the system might lock them out or increase security, making it even harder to try again. Therefore, it’s important for an attacker to choose the adversarial example that is most likely to work on the first try. This is why ranking adversarial examples based on how well they are expected to transfer and fool the target model is crucial—it increases the odds of success in situations where there’s no room for error (see Figure 1).
Fig. 1. An illustration showing how ranking is beneficial to an adversary who is targeting model \(f\) in a black box setting. Here \(\mathscr{D}\) is the dataset available to the attacker and \(\mathscr{D}^{*}\) are adversarial examples. \(f\) is not available to the adversary. Adv, adversary.
In some attacks, the adversary may search for the best sample(s) and use ranking to select them (\(x_{i}\in D\) for \(x^{\prime}_{i}=x_{i}+\delta\)). However, in other attacks, the adversary is limited to using one or more specific samples (such as an image of an individual). In these attacks, an adversary could generate several adversarial examples for each given sample and then use ranking to select the best perturbation (\(\delta_{j}\) for \(x^{\prime}_{j}=x_{j}+\delta_{j}\)).
To understand these cases, let’s consider two example scenarios: In the first scenario, an attacker may be trying to evade detection of some anti-virus model \(f\) and can select one malware from a set for this purpose (represented as \(x_{i}\)) such that some modification to it (\(\delta\)) will make it perceived by \(f\) as benign software [10, 18]. Here, the ranking is done on \(x\in\mathscr{D}\), since the attacker can convert any of the malware samples into an adversarial example, so the attacker will choose the malware that is best for transferability. In the second scenario, an attacker may want to tamper with a specific patient’s medical image \(x\) using some perturbation \(\delta_{j}\) such that \(x^{\prime}_{j}\) will be falsely classified as containing some medical condition [12, 15]. Here, ranking is done on potential perturbations because there is only one \(x\). We note that in both cases, the attacker (1) has only one attempt to avoid being caught or (2) cannot get feedback from \(f\), but must select the sample \(x_{i}\) or perturbation \(\delta_{j}\) which will most likely transfer to the victim’s model \(f\).
To find the best \(x_{i}\) or \(\delta_{j}\) using surrogate \(f^{\prime}\), the attacker must rank potential adversarial examples according to their expected attack success on \(f\). We call this measure the Expected Transferability (ET). To the best of our knowledge, there are no works which propose a means for ranking adversarial examples according to their ET. Current works, such as [7, 25, 32], determine if \(x^{\prime}\) transfers by directly evaluating it on the victim’s model \(f\). However, in a blackbox setting, an attacker cannot use \(f\) to measure success. Therefore, this approach can only be used as an upper bound, but cannot be used to (1) help the attacker select the best adversarial example(s) or (2) measure a model’s robustness to transferability attacks given the attacker’s limitations.
In this article, we explore the topic of ranking adversarial examples according to their ET. Our work offers several contributions: (1) we propose the concept of ET and define the ranking problem for adversarial examples, (2) we suggest a way to approximate the ET of an adversarial example and a heuristical way to increase the accuracy and practicality of the method, (3) we introduce a new metric (“transferability at \(k\)”) to measure attack performance considering an attacker’s best efforts, and (4) we frame the problem of transferability realistically in the perspective of a blackbox attacker: we propose the use of additional surrogates to evaluate transferability.

2 Definitions

In this section, we introduce the concept of ET, define the task of ranking an adversarial example’s transferability, and propose the metric “transferability at k.”

2.1 ET

In the domain of transferability, the attacker operates under uncertainty: a deterministic answer as to whether an adversarial example will succeed (fool the victim) is impossible. The attacker faces an unknown victim model about which, by definition, they have incomplete information. An appropriate way to view the victim model is as if it were sampled from the pool of all possible victim models. Therefore, rather than a guarantee, the attacker is interested in the expectation that each adversarial example will be successful.
To formally define the ET that the attacker is interested in, we first need to define the pool of possible victim models. Let \(F\) be the set of all possible models scoped and based on the attacker’s knowledge of \(f\) (i.e., \(F\) is the set of all surrogate models that reflect \(f\)). In our setting, the attacker uses the surrogate model \(f^{\prime}\in F\) to create adversarial examples (denoted as the set \(\mathscr{D}^{*}\)).
We define the ET of an adversarial example \(x^{\prime}_{i}\in\mathscr{D}^{*}\) as the probability that \(x^{\prime}_{i}\) will successfully transfer to a random model in \(F\). A successful transfer of \(x^{\prime}_{i}\) to model \(f_{j}\in F\) can be defined as the case where \(f_{j}(x^{\prime}_{i})\neq y_{i}\) for an untargeted attack where \(y_{i}\) is the ground truth label of \(x_{i}\).1 Following the notation convention used for adversarial examples (e.g., [24]), the symbol \(\neq\) should be interpreted in the context of classification outcomes rather than as a strict boolean operation. Specifically, it indicates that the probability distribution output by \(f(x^{\prime})\) does not assign the highest probability to the ground truth class \(y\), meaning that the class with the highest probability in \(f(x^{\prime})\) is different from \(y\).
It can be said that the attacker’s goal is to select a sample \(x^{\prime}_{i}\) which has the highest probability to transfer to a random model drawn from the population \(F\). For untargeted attacks, we can measure \(x^{\prime}_{i}\)’s transferability with
\begin{align}S(x^{\prime}_{i})=\mathop{\mathbb{E}}_{f\sim F}\left[f(x^{\prime}_{i})\neq y_{i}\right]. \tag{1}\end{align}
\(S\) can be used to rank adversarial examples because if \(S(x^{\prime}_{i}){\gt}S(x^{\prime}_{j})\), then \(x^{\prime}_{i}\) is more likely to transfer to a random model in \(F\) than \(x^{\prime}_{j}\). Note that (1) can similarly be defined for targeted attacks as well.

2.2 Transferability Ranking

Given \(S\), the attacker can sort the potential adversarial examples according to their ET. Therefore, we define the task of transferability ranking as the problem of obtaining an ordered set of adversarial examples \(\{x^{\prime}_{1},x^{\prime}_{2},...\}\) such that \(x^{\prime}_{i}\in\mathscr{D}^{*}\) and \(x^{\prime}_{i}\) is more likely to transfer than \(x^{\prime}_{j}\) if \(S(x^{\prime}_{i}){\gt}S(x^{\prime}_{j})\).
Note that when applying \(S(x^{\prime}_{i})\), it is possible to measure the ET rank of different samples from a dataset \(x^{\prime}_{i}=x_{i}+\delta,x_{i}\in\mathscr{D}\) or of different perturbations applied to the same sample from the dataset \(x^{\prime}_{i}=x_{i}+\delta_{j},x\in\mathscr{D},\delta_{j}\in\delta\). Ranking adversarial examples by perturbation is relevant for attacks where multiple runs of the attack algorithm produce different perturbations [17].

2.3 Transferability at k

In a real-world attack, an attacker will curate a finite set of \(k\) adversarial examples on \(f^{\prime}\) to use against \(f\). To ensure success, it is critical that the attacker select the \(k\) samples that have the highest ET scores.
The top \(k\) samples of \(\mathscr{D^{*}}\) are denoted as the set \(S_{k}(\mathscr{D^{*}})\) where setting \(k=1\) is equivalent to selecting the sample that is the most likely to transfer.
Identifying the top \(k\) samples is not only useful for the attacker but also for the defender. This is because a defender can evaluate his or her model’s robustness to attacks given the attacker’s best efforts (attacks using the top \(k\) samples). We call this performance measure the transferability at \(k\) defined as
\begin{align}T_{k}(\mathscr{D^{*}})=\frac{1}{k}\sum_{x^{\prime}\in S_{k}(\mathscr{D^{*}})}\left(f(x^{\prime})\neq y\right), \tag{2}\end{align}
which is the fraction of the top \(k\) samples selected by \(S\) that successfully transferred to the victim \(f\).
This evaluation covers a wide variety of use cases: some cases call for only a small number of adversarial samples, while others require large amounts, in which case the attacker is only interested in identifying the few least transferable samples so they can be omitted. Note that the score for a specific \(k\) is bounded by the success of the \(k\) most transferable samples in the dataset. As such, when \(k=|\mathscr{D^{*}}|\), the score equals the average attack success rate of the adversarial samples in the dataset for any ranking of the samples.
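The following sketch illustrates how transferability at \(k\) could be computed in PyTorch, assuming the evaluator already has an ET score per adversarial example and knows (as a defender would) whether each example fools \(f\); the function and variable names are illustrative and not taken from our released code.

```python
import torch

def transferability_at_k(scores, victim_success, k):
    """Transferability at k (Equation (2)): the fraction of the k highest-scoring
    adversarial examples that actually fool the victim model f."""
    top_k = torch.argsort(scores, descending=True)[:k]   # indices of the k highest ET scores
    return victim_success[top_k].float().mean()

# Tiny illustration: 5 adversarial examples, k = 2
scores = torch.tensor([0.9, 0.2, 0.7, 0.4, 0.8])
victim_success = torch.tensor([True, False, True, False, False])
print(transferability_at_k(scores, victim_success, k=2))  # tensor(0.5000)
```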

3 Implementation

In this section, we propose methods for implementing \(S\) and estimating the transferability at \(k\) without access to \(f\).

3.1 Approximate ET (AET)

Although the set \(F\) is potentially infinite, we can approximate it by sampling a subset of models \(F_{0}\subset F\) from the population. With \(F_{0}\) we can approximate \(S\) by computing
\begin{align}S(x^{\prime}_{i})=\frac{1}{|F_{0}|}\sum_{j=1}^{|F_{0}|}\left(f_{j}(x^{\prime}_{i})\neq y_{i}\right) \tag{3}\end{align}
for \(f_{j}\in F_{0}\).
In summary, we propose the use of multiple surrogate models to estimate ET: one surrogate model is used to generate the adversarial example (\(f^{\prime}\in F\)) and one or more surrogate models (\(F_{0}\subset F\)) are used to estimate the transferability of the adversarial example to \(f\).
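As a minimal sketch of Equation (3), the AET of a batch of adversarial examples can be estimated in PyTorch by counting how many auxiliary surrogates each example fools; the helper below is illustrative and assumes the surrogates are classifiers already set to evaluation mode.

```python
import torch

@torch.no_grad()
def aet_score(x_adv, y, surrogates):
    """Approximate ET (Equation (3)): the fraction of surrogates in F_0 whose
    prediction on x_adv differs from the ground-truth label y."""
    hits = []
    for f_j in surrogates:                       # f_j in F_0
        pred = f_j(x_adv).argmax(dim=1)          # predicted class per sample
        hits.append((pred != y).float())         # 1 if the attack fools f_j, else 0
    return torch.stack(hits).mean(dim=0)         # average over |F_0| surrogates
```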

3.2 Heuristical ET (HET)

Although we can use (3) to compute ET, the approach raises a technical challenge: it is impractical to train a significantly large set of surrogate models \(F_{0}\). For example, training a single Resnet-50 on ImageNet can take up to 4 days using common hardware [28]. However, if \(|F_{0}|\) is too small, then \(S\) will suffer from a lack of granularity. This is because, according to (3), each model reports a 0 or 1 if the attack fails or succeeds. To exemplify the issue of granularity, consider a case where \(|F_{0}|=10\) and we set \(k=100\). If \(\mathscr{D}^{*}\) contains 1,000 adversarial examples which fool all 10 models, then all 1,000 samples will receive a score of \(1.0\). However, the true \(S\) of these samples varies with respect to \(F\). As a result, we will be selecting \(k=100\) samples at random from these 1,000, which is not ideal.
To mitigate this issue, we propose using continuous values to capture attack success for \(x^{\prime}\) on each model. Specifically, for each model, we use the model’s confidence for the input sample’s ground-truth class. This value implicitly captures how successful \(x^{\prime}\) is at changing the model’s prediction since lower values indicate a higher likelihood that \(x^{\prime}\) will not be classified correctly [8, 17]. When averaged across \(|F_{0}|\) models, we can obtain a smoother probability which generalizes better to the population \(F\). Averaging model confidences is a popular ensemble technique used to join the predictions of multiple classifiers together [13]. However, here we use it to identify the degree to which a sample \(x^{\prime}\) exploits a set of models together.
To implement this heuristic approach, we modify (3) to
\begin{align}S(x^{\prime}_{i})=\frac{1}{|F_{0}|}\sum_{j=1}^{|F_{0}|}\left(1-\sigma_{y}\left(f_{j}(x^{\prime}_{i})\right)\right),\quad f_{j}\in F_{0}, \tag{4}\end{align}
where \(\sigma_{y}(\cdot)\) returns the SoftMax value of the logit corresponding to the ground-truth label \(y\).
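A corresponding sketch of Equation (4) replaces the binary indicator with the surrogates’ SoftMax confidence in the ground-truth class; again, the helper name and batch layout are illustrative assumptions rather than our exact implementation.

```python
import torch

@torch.no_grad()
def het_score(x_adv, y, surrogates):
    """Heuristical ET (Equation (4)): average over F_0 of one minus the SoftMax
    confidence assigned to the ground-truth class y."""
    scores = []
    for f_j in surrogates:                                      # f_j in F_0
        probs = torch.softmax(f_j(x_adv), dim=1)                # class probabilities
        conf_true = probs.gather(1, y.unsqueeze(1)).squeeze(1)  # sigma_y(f_j(x'))
        scores.append(1.0 - conf_true)                          # lower confidence -> higher score
    return torch.stack(scores).mean(dim=0)                      # continuous ET estimate
```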
We demonstrate the benefit of using HET (4) over AET (3) with a simple experiment: We take a Resnet-50 architecture for both \(f\) and \(f^{\prime}\), trained on the same ImageNet train set [5]. Then, we create \(\mathscr{D}^{*}\) by attacking the ImageNet test set with PGD (\(\epsilon=\frac{1}{255}\)). Finally, we compute the AET and HET on each sample in \(\mathscr{D}^{*}\) with \(|F_{0}|=3\) surrogates.2 In Figure 2, we plot the attack success rate of \(\mathscr{D}^{*}\) on \(f\) for different \(k\) when sorting the samples according to AET and HET respectively. We observe that (1) although \(\mathscr{D}^{*}\) has a 98% attack success rate on \(f^{\prime}\), it only has a success rate of 20% on \(f\) even though both \(f\) and \(f^{\prime}\) are identical in design, and (2) HET performs better than AET, especially for lower \(k\) (i.e., when we select the top ranked samples).
Fig. 2. The ranking performance when using the HET score compared to using the AET score. Here, both methods use \(|F_{0}|=3\) .

3.3 Blackbox Ranking Strategies

As discussed earlier, it is more likely that an attacker will measure a sample’s transferability using surrogates and not the victim model \(f\) (as done in previous works). Below, we propose two strategies for ranking the transferability of a sample \(x^{\prime}\) without using \(f\) (illustrated in Figure 3):
Fig. 3. The proposed ranking strategy (HET) compared to the naive approach (without HET). Here, x is the input sample to be assigned a transferability score (the ET). The auxiliary surrogates in \(F_{0}\) reflect the victim’s model \(f\) depending on the attacker’s knowledge of \(f\) (architecture, training set, etc.)
Without ET.
This is the naive approach where the attacker uses one surrogate model (\(f^{\prime}\)) to select the adversarial examples. There are a few ways of doing this. For example, the attacker can check if \(x^{\prime}\) successfully fools \(f^{\prime}\) and then assume that it will also work on \(f\) because \(f\in F\). Another way is to evaluate the confidence of \(f^{\prime}\) (\(\sigma_{i}\)) on the clean sample \(x\) to identify an \(x\) which will be easy to attack [32, 33]. Although this strategy is efficient, it does not generalize well to \(F\). Even in a blackbox setting, where the attacker knows the victim’s architecture and training set, a sample \(x^{\prime}\) made on \(f^{\prime}\) will not necessarily work on \(f\). It was shown that even when the only difference is the model’s random initialization, predicting a specific sample’s transferability is still a challenging problem [14].
With HET.
In this strategy, the attacker utilizes multiple surrogate models (\(F_{0}\)) to approximate the ET of \(x^{\prime}\) to \(f\), as expressed in Equation (4). Here, the performance depends more on the attacker’s knowledge of \(f\) (the variability of \(F\)) but less so on the random artifacts caused by initialization of weights and the training data used.3 This is because the averaging mitigates cases where there are only a few outlier models in \(F_{0}\) which are vulnerable to \(x^{\prime}\). As a result, the final transferability score captures how well \(x^{\prime}\) transfers to vulnerabilities which are common among the models in \(F_{0}\). The concept of models having shared vulnerabilities has been shown in works such as [19, 26].

4 Experiment Setup

In this section, we present the experiments which we have performed to evaluate the proposed blackbox transferability ranking strategy.

4.1 Evaluation Measures

To evaluate our ranking methods, we use transferability at \(k\) as defined in Equation (2). Note that transferability at \(k\) can also be viewed as the attack success rate on \(f\) for the top-\(k\) recommended samples. We remind the reader that ranking is performed without access to \(f\) or knowledge of \(f\). Therefore, it can be said that the transferability at \(k\) measures the adversary’s attack success rate in a black box setting when only \(k\) attempts (attacks) are allowed.

4.2 Datasets

For our experiments, we used four datasets:
CIFAR10.
An image classification dataset which contains 60K images from 10 categories having a resolution of 32 \(\times\) 32.
ImageNet.
A popular image classification benchmark dataset containing about 1.2M images from 1,000 classes, downsampled to a resolution of 224 \(\times\) 224.
RSNA-X-Ray.
A medical dataset, containing approximately 30K pneumonia chest X-ray images resized to a resolution of 224 \(\times\) 224. The dataset contains three classes and was originally published on Kaggle4 by the Radiological Society of North America.
Road Sign.
A dataset for traffic sign classification containing 3K images from 58 different classes, up-sampled to a resolution of 224 \(\times\) 224. The dataset was originally published on Kaggle.5
For the ImageNet and CIFAR10 datasets, we used the original data splits, whereas for the RSNA-X-Ray and Road\(\_\)Sign datasets, the images were split into train, test, and validation sets with respective sizes of 70%, 20%, and 10%. The training sets were used to train \(f\) and \(f^{\prime}\) and the test sets were used to create the adversarial examples (\(\mathscr{D}^{*}\)). Since \(f(x)\neq y\) is counted as a successful attack, we must remove all samples from the test set where the clean sample is misclassified. This is done to avoid bias and to focus our results strictly on samples that transfer as a result of the attack.
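The filtering step can be sketched as follows; it assumes a single given classifier is used to drop clean test samples that are already misclassified before any adversarial examples are crafted.

```python
import torch

@torch.no_grad()
def keep_correctly_classified(model, x, y):
    """Remove test samples the model already misclassifies on clean inputs,
    so that a later 'successful transfer' is attributable to the attack."""
    correct = model(x).argmax(dim=1) == y
    return x[correct], y[correct]
```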

4.3 Architectures

In our experiments, we used five different architectures:
DenseNet-121.
Employs a dense connectivity pattern that connects each layer to every other layer in a feed-forward fashion. Its 121 layers are divided into dense blocks that ensure maximum information flow between layers, making it robust to perturbations.
Efficientnet.
Utilizes a compound scaling method that uniformly scales all dimensions of depth, width, and resolution with a fixed set of scaling coefficients. This architecture provides a balance between speed and accuracy, optimized to perform well even with limited computational resources. In CIFAR10 evaluations we use Efficientnet-b0 and for the rest we use Efficientnet-b2.
Resnet18.
Introduces skip connections to allow the flow of gradients through the network without attenuation. With 18 layers, it is relatively shallow, ensuring quick computations while still capturing complex features.
Vision Transformer (ViT).
The ViT applies the principles of transformer models, primarily used in natural language processing, to image classification tasks. It treats image patches as sequences, allowing for global receptive fields from the outset of the model. For evaluations conducted on CIFAR10, we use a ViT model with 7 layers, 12 heads, and a multi-layer perceptron dimension of 1152. For evaluations performed on other datasets, we used the ViT\(\_\)b\(\_\)16 architecture.
Swin Transformer (Swin\(\_\) s).
Implements a hierarchical transformer whose representations are computed with shifted windows, enabling efficient modeling of image data with varying scales and sizes.
In experiments conducted on X-Ray and Road\(\_\)Sign datasets, we fine-tuned pre-trained ImageNet models obtained from torchvision.6 Conversely, for CIFAR10, we trained models from scratch. In the context of the ImageNet experiments, we employed the same pre-trained models as in the X-Ray and Road\(\_\)Sign experiments, with the sole exception being experiments necessitating the evaluation of identical model architectures. In such cases, we utilized a larger variation of the same model also obtained from torchvision.

4.4 Threat Model

For all of our experiments, we consider a blackbox adversary that has no knowledge of the victim’s architecture. To simulate this setting, we ensured that the architectures used for \(f\), \(f^{\prime}\), and those in \(F_{0}\) were all unique. This simulates a black box setting because the architectures used by the adversary (surrogates \(f^{\prime}\) and \(F_{0}\)) will be different from the architecture used in the victim model \(f\).7

4.5 Attack Algorithms

For the attacks, we use Fast Gradient Sign Method (FGSM) [9], PGD [16] and PGD\(+\)Momentum (denoted as Momentum) [6], which should have increased transferability according to [30]. All of these algorithms are considered accepted baselines when evaluating adversarial attacks [3, 8, 24]. The FGSM attack performs a single optimization step on \(x\) to generate \(\delta\). The PGD and Momentum algorithms perform multiple iterations where each iteration normalizes \(\delta\) according to a given p-norm.
In our experiments, we only perform untargeted attacks (\(f(x^{\prime})\neq y\)), where the algorithm is executed on \(f^{\prime}\) alone (bounded by \(\epsilon=\frac{1}{255}\) for CIFAR10 and \(\epsilon=\frac{4}{255}\) for the ImageNet, X-Ray, and Road Sign datasets). Unless explicitly stated otherwise, we used only PGD.
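For reference, a minimal untargeted \(L_\infty\) PGD sketch on the surrogate \(f^{\prime}\) is shown below; the step size, iteration count, and the assumption that images lie in \([0,1]\) are illustrative defaults, not the exact settings used in every experiment.

```python
import torch
import torch.nn.functional as F

def pgd_untargeted(model, x, y, eps=4/255, alpha=1/255, steps=10):
    """Untargeted L-inf PGD: maximize the cross-entropy loss on the surrogate
    while keeping the perturbation inside the epsilon ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)       # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                      # keep a valid image
    return x_adv.detach()
```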

4.6 Ranking Algorithms

We evaluate our three ranking strategies:
SoftMax.
In this implementation of the strategy, we score a sample’s transferability by taking \(1-\sigma_{y}(f^{\prime}(x))\) where \(x\) is the clean sample. The use of SoftMax here is inspired by the works of [33] where SoftMax is used to capture a model’s instability in \(f\) (not \(f^{\prime}\)).
SoftMax\(+\) Noise.
For this version, we follow the work of [32]. In their work, the authors found that samples that are sensitive to noise on the victim model \(f\) happen to transfer better from \(f^{\prime}\) to \(f\). We extend their work to the task of ranking: each clean sample in the test set is scored according to how much random noise impacts the confidence of the surrogate \(f^{\prime}\). Samples which are more sensitive are ranked higher. Similar to [32], we also use Gaussian noise and set std=\(\frac{16}{255}\).
HET (ours).
To implement HET, we use one surrogate model \(f^{\prime}\) to produce the adversarial examples and a set of three other unique surrogate models as \(F_{0}\) to rank them.
Ranking with these strategies is achieved by (1) computing the respective score on each adversarial example \(x^{\prime}\in\mathscr{D}^{*}\) and then (2) sorting the samples by their score (descending order).
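A sketch of the first two scoring functions and the final sort is given below (HET reuses the het_score helper from Section 3.2). The number of noise trials and the use of a confidence drop as the sensitivity measure are our own illustrative assumptions for the SoftMax\(+\)Noise baseline.

```python
import torch

@torch.no_grad()
def softmax_score(f_prime, x_clean, y):
    """SoftMax baseline: 1 - confidence of the single surrogate on the clean sample."""
    probs = torch.softmax(f_prime(x_clean), dim=1)
    return 1.0 - probs.gather(1, y.unsqueeze(1)).squeeze(1)

@torch.no_grad()
def noise_score(f_prime, x_clean, y, std=16/255, trials=5):
    """SoftMax+Noise baseline: how much Gaussian noise lowers the surrogate's
    ground-truth confidence (more sensitive samples are ranked higher)."""
    conf = lambda inp: torch.softmax(f_prime(inp), dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
    base = conf(x_clean)
    noisy = torch.stack([conf((x_clean + std * torch.randn_like(x_clean)).clamp(0, 1))
                         for _ in range(trials)]).mean(dim=0)
    return base - noisy

def rank_descending(scores):
    """Indices of samples sorted from the highest to the lowest score."""
    return torch.argsort(scores, descending=True)
```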
As a baseline evaluation, we evaluate the average transferability rate across all the dataset (no ranking). This baseline can also be viewed as a kind of lower bound on performance. Note that this is essentially the same as the common transferability evaluation measure used in the literature.
Finally, we contrast the above ranking methods with the performance of the optimal solution (upper bound). In the task of transferability ranking (blackbox), the optimal solution is achieved by ordering the samples according to their performance on \(f\) (as opposed to using surrogates).

4.7 Environment and Reproducibility

Our code was written using Pytorch and all models were trained and executed on Nvidia 6000RTX GPUs. To reproduce our results, the reader can access our code online.8

4.8 Experiments

We investigate the following ranking tasks: (sample ranking) where the attacker must select the top \(k\) samples from \(\mathscr{D}\) to use in an attack on \(f\), and (perturbation ranking) where the attacker must select the best perturbation for a specific sample \(x\) in an attack on \(f\). In the sample ranking scenario, we evaluate all values of \(k\) from \(k=1\) to \(k=|\mathscr{D}^{*}|\). For the perturbation ranking scenario, we set \(k\) to be 5%, 10%, and 20% of the respective dataset size.
We performed the following experiments to evaluate how well our ranking strategies perform and generalize to different settings:
E1 - Sample Ranking.
In these experiments, we explore the performance of the ranking strategies in the context of ranking samples. In other words, given a set of \(|\mathscr{D}|\) different images, if only \(k\) can be used in an attack, which \(k\) images should be selected to maximize likelihood of success (transferability).
E1.1 - Architectures.
The purpose of this experiment is to evaluate the generalization of the ranking strategies to different blackbox settings. In this experiment, we explore the transferability of the ranked samples for every combination of surrogate and victim model architecture and generate adversarial examples on \(f^{\prime}\) using PGD. For each dataset and combination of architectures, we evaluated the transferability at \(k\) of the strategies for every possible value of \(k\) (from \(k=1\) to \(k=|\mathscr{D}^{*}|\)).
E1.2 - Attacks.
The objective of this experiment is to see if the ranking strategies generalize to different attacks and whether there are some attacks that transfer better than others. For each dataset and combination of architectures, we evaluated the transferability at \(k\) of the strategies where \(k\) was set to 5%, 10%, and 20% of the respective dataset size.
E2 - Perturbation Ranking.
In this experiment, we evaluate how well the strategies can be used to rank perturbations instead of samples. This experiment captures the setting where the adversary must use a specific sample (i.e., image) in the attack but can improve the likelihood of transferability by selecting the best perturbation for that sample.
E2.1 - Performance.
The experiment was conducted as follows: for each sample \(x\), we generated 27 different perturbations using PGD with random starts and random values for alpha (between \(\frac{0.1}{255}\text{ and }\frac{0.3}{255}\)) and the number of iterations (between \(10\text{ and }20\)). We then ranked \(x\) with each of these perturbations and took the highest ranked sample (\(k=1\)). This process was repeated 100 times per dataset. In a follow-up experiment, we then ranked these images setting \(k\) to 5%, 10%, and 20% of the respective dataset size.
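A sketch of this procedure is shown below, reusing the pgd_untargeted and het_score helpers from the earlier sketches; the random start of PGD is omitted for brevity, and the parameter ranges mirror those listed above.

```python
import random
import torch

def best_perturbation(f_prime, surrogates, x, y, n_candidates=27):
    """Perturbation ranking: craft several PGD perturbations on f' with random
    step sizes and iteration counts, then keep the one with the highest HET score."""
    candidates = []
    for _ in range(n_candidates):
        alpha = random.uniform(0.1, 0.3) / 255                  # random step size
        steps = random.randint(10, 20)                          # random iteration count
        candidates.append(pgd_untargeted(f_prime, x, y, eps=4/255, alpha=alpha, steps=steps))
    scores = torch.stack([het_score(c, y, surrogates) for c in candidates])  # (n, batch)
    best = scores.argmax(dim=0)                                 # best candidate per sample
    stacked = torch.stack(candidates)                           # (n, batch, C, H, W)
    return stacked[best, torch.arange(x.size(0))]               # per-sample winner
```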

5 Experiment Results

5.1 E1—Sample Ranking

5.1.1 E1.1—Architectures.

We direct the reader’s attention to the results presented in Figures 4 and 5, which present the performance of our proposed ranking strategy, HET, across various datasets and model architecture pairings. Figure 4 details the outcomes for CIFAR10 and ImageNet, while Figure 5 covers the X-Ray and Road Sign datasets. These figures plot the transferability at \(k\) for a spectrum of \(k\) values, with the columns representing the victim architecture and the rows indicating the surrogate architecture utilized.
Fig. 4. E1.1 Results—The performance of ranking strategies for the CIFAR10 (top) and ImageNet (bottom) datasets. Each cell plots the transferability at \(k\) success rate for adversarial examples when ranked using different strategies across varied surrogate and victim model architectures. Columns represent the victim model architecture, and rows correspond to the surrogate model architecture.
Fig. 5. E1.1 Results—The performance of ranking strategies for the X-Ray (top) and Road Sign (bottom) datasets. Each cell plots the transferability at \(k\) success rate for adversarial examples when ranked using different strategies across varied surrogate and victim model architectures. Columns represent the victim model architecture, and rows correspond to the surrogate model architecture.
Our observations reveal a consistent trend across all datasets and architecture combinations: HET closely tracks the upper-bound line, which represents the theoretical maximum transferability. Particularly for small values of \(k\), HET demonstrates a high likelihood of successful transferability, often achieving near-certain effectiveness. As \(k\) increases, HET maintains commendable performance, deviating from the upper bound by at most approximately 20%.
Conversely, the baseline methods, namely SoftMax and SoftMax\(+\)Noise, exhibit subpar performance. Out of 100 architecture combinations, only 45 yield a ranking that can be considered beneficial, and this is limited to instances where \(k\) is minimal (\(k=1\)). Moreover, the inconsistent performance between the two baseline methods presents an additional challenge, as it is unpredictable which method will be successful in any given scenario.
We note that the attacker only achieves the ideal results when he or she happens to select the same architecture for \(f^{\prime}\) as the victim in \(f\) (captured by the diagonal of the figures). Under these circumstances, transferability is notably higher for all methods across all \(k\) values, although not perfect. The discrepancy, even when architectures match, can be attributed to different training seeds affecting the models’ decision boundaries, as discussed in [14]. Despite this, HET still significantly enhances transferability for nearly every combination of architectures (where the attacker guesses wrong). This is especially apparent in the setting where the attacker only needs to send one or just a few adversarial examples (low \(k\)) from the set of all potential images.
When dissecting performance relative to the datasets, HET exhibits robust results for ImageNet, CIFAR10, and X-Ray. Nonetheless, there are instances within the Road Sign dataset where HET does not perform optimally at lower \(k\) values but recovers effectiveness at higher \(k\)s. This could be attributed to the varying image sizes in the dataset, which, when resized to fit the model input, may adversely impact the feature representations.
In summary, our comprehensive evaluation underscores the efficacy of the HET ranking strategy in diverse blackbox settings, confirming its potential to improve adversarial example transferability in real-world attack scenarios.

5.1.2 E1.2—Attacks.

In this part of the evaluation, we present the findings of our comparative analysis of the transferability of three adversarial attacks: FGSM, PGD, and Momentum, conducted across four diverse datasets. These results offer insights into the effectiveness of these attacks, the vulnerability of different victim models, and the choice of surrogate model for each attack method. The comparison is presented in Tables 1 and 2, partitioned such that results for the CIFAR10 and ImageNet datasets appear in Table 1, and results for the X-Ray and Road Sign datasets appear in Table 2. The lower and upper bounds presented in the tables are from the PGD attack.
Table 1.
Dataset | Surrogate | Victim | Lower \(B\) | PGD (5% / 10% / 20%) | Momentum (5% / 10% / 20%) | FGSM (5% / 10% / 20%) | Upper \(B\) (5% / 10% / 20%)
ImageNetViT_b_16ViT_l_160.4510.9970.9870.9540.9950.9860.9461.0000.9970.9821.0001.0001.000
DenseNet1210.3940.9900.9790.9340.9900.9780.9260.9940.9900.9681.0001.0001.000
Efficientnet-b20.3440.9940.9710.9040.9940.9670.8930.9970.9890.9481.0001.0001.000
Resnet180.4420.9950.9830.9430.9950.9840.9370.9990.9920.9731.0001.0001.000
Swin_s0.2900.9340.8710.7800.9280.8610.7670.9540.9120.8441.0001.0001.000
DenseNet121ViT_b_160.2150.9710.9250.7920.9750.9280.7970.9750.9330.8181.0001.0001.000
DenseNet1610.3500.9930.9810.9320.9960.9840.9400.9960.9860.9471.0001.0001.000
Efficientnet-b20.2790.9890.9620.8730.9910.9670.8810.9890.9700.8931.0001.0001.000
Resnet180.4040.9920.9760.9260.9930.9790.9340.9950.9830.9421.0001.0001.000
Swin_s0.2020.9480.8790.7490.9490.8880.7560.9530.8980.7781.0001.0001.000
Efficientnet-b2ViT_b_160.2340.9770.9250.8060.9730.9240.8020.9850.9450.8581.0001.0001.000
DenseNet1210.3480.9910.9730.9080.9920.9720.9060.9950.9860.9441.0001.0001.000
Efficientnet_b10.3490.9960.9820.9330.9960.9840.9330.9990.9940.9681.0001.0001.000
Resnet180.3990.9930.9780.9260.9930.9780.9230.9970.9870.9561.0001.0001.000
Swin_s0.2260.9510.8890.7710.9520.8930.7680.9680.9190.8321.0001.0001.000
Resnet18ViT_b_160.2140.9740.9250.7910.9760.9260.7920.9790.9350.8111.0001.0001.000
DenseNet1210.3550.9900.9730.9140.9920.9770.9200.9940.9800.9271.0001.0001.000
Efficientnet-b20.2740.9880.9620.8690.9910.9650.8750.9870.9690.8881.0001.0001.000
Resnet340.3910.9960.9880.9460.9960.9900.9500.9970.9910.9571.0001.0001.000
Swin_s0.1960.9510.8800.7480.9510.8870.7490.9560.8890.7631.0001.0000.981
Swin_sViT_b_160.2510.9400.8820.7690.9370.8840.7690.9660.9320.8631.0001.0001.000
DenseNet1210.3490.9930.9750.9140.9920.9740.9130.9970.9900.9641.0001.0001.000
Efficientnet-b20.3080.9830.9580.8810.9850.9600.8830.9970.9840.9511.0001.0001.000
Resnet180.3970.9950.9810.9340.9960.9820.9340.9980.9920.9711.0001.0001.000
Swin_t0.6021.0000.9960.9841.0000.9980.9870.9991.0000.9971.0001.0001.000
CIFAR10ViTViT0.7690.9790.9680.9570.7200.5430.3370.7280.5620.3681.0001.0001.000
DenseNet1210.0770.7230.5360.3320.9810.9680.9560.9790.9620.9421.0000.7660.383
Efficientnet-b00.1360.8080.6920.4830.7950.6890.4940.8170.7170.5411.0001.0000.680
Resnet180.0650.6140.4490.2770.5760.4410.2770.6220.4700.2991.0000.6470.324
Swin_s0.2190.8310.7460.6240.8340.7510.6270.8540.7800.6721.0001.0001.000
DenseNet121ViT0.2900.7810.7210.6110.7870.7230.6090.8230.7750.7131.0001.0001.000
DenseNet1210.5100.9840.9620.9030.9880.9700.9190.9860.9640.9151.0001.0001.000
Efficientnet-b00.2830.9100.8130.6970.9100.8410.7240.9550.8960.8281.0001.0001.000
Resnet180.2860.8270.7100.6350.8490.7450.6690.8990.8160.7191.0001.0001.000
Swin_s0.1770.7650.6620.5370.7890.6710.5480.8680.8180.7031.0001.0000.883
Efficientnet-b0ViT0.2910.8080.7150.6030.8030.7250.6110.7960.7320.6291.0001.0001.000
DenseNet1210.3550.9540.8920.7990.9960.9920.9690.9870.9680.9391.0001.0001.000
Efficientnet-b00.6410.9940.9900.9710.9520.8900.8030.9190.8430.7091.0001.0001.000
Resnet180.3920.9360.8650.7880.9400.8590.7860.9120.8270.7151.0001.0001.000
Swin_s0.2080.8260.7280.6000.8270.7260.5970.8730.7680.6341.0001.0001.000
Resnet18ViT0.2710.7860.7290.6020.7880.7400.6000.8010.7780.6951.0001.0001.000
DenseNet1210.2040.7880.6850.5700.9100.8470.7170.9790.9090.8281.0001.0001.000
Efficientnet-b00.2390.8540.7730.6440.8180.7100.5930.9000.8150.7151.0001.0001.000
Resnet180.2510.8900.8120.6970.8520.7890.6660.9400.8830.8011.0001.0001.000
Swin_s0.1460.7450.6450.5050.7370.6530.5070.8150.7570.6431.0001.0000.732
Swin_sViT0.3900.8090.7340.6870.8150.7430.6800.8530.7840.7201.0001.0001.000
DenseNet1210.1550.8500.7330.5480.9880.9750.9270.9930.9830.9521.0001.0000.778
Efficientnet-b00.2530.9090.8370.7100.8740.7400.5500.9040.7770.6071.0001.0001.000
Resnet180.1210.7650.6080.4470.9160.8490.7060.9420.8820.7561.0001.0000.604
Swin_s0.5260.9880.9730.9270.7530.6080.4450.7950.6590.4931.0001.0001.000
Table 1. The Comparative Performance of HET Across Different Attack Algorithms and Architecture Combinations for the ImageNet and CIFAR10 Datasets
Columns categorize the various attack algorithms employed, while rows detail the architecture pairings, with surrogate models (\(F_{0}\)) distinct from the victim’s architecture. The color shading indicates better (green) and worse (red) results. Shading is done per surrogate.
Table 2.
Dataset | Surrogate | Victim | Lower \(B\) | PGD (5% / 10% / 20%) | Momentum (5% / 10% / 20%) | FGSM (5% / 10% / 20%) | Upper \(B\) (5% / 10% / 20%)
X-RayViTViT0.4690.9760.9430.9000.9730.9440.8990.9720.9580.9181.0001.0001.000
DenseNet1210.2800.9310.8630.7460.9190.8680.7450.9410.8870.7701.0001.0001.000
Efficientnet-b20.2740.9550.8980.7740.9550.8890.7700.9470.9070.7971.0001.0001.000
Resnet180.2780.9430.8830.7610.9460.8780.7510.9570.9010.7811.0001.0001.000
Swin_s0.2840.9460.9010.7910.9430.8810.7790.9630.9180.8121.0001.0001.000
DenseNet121ViT0.2470.8850.8340.7070.8830.7920.6780.8950.8200.7171.0001.0001.000
DenseNet1210.8500.9890.9910.9900.9960.9950.9950.9920.9940.9941.0001.0001.000
Efficientnet-b20.4360.9330.8980.8520.9050.8900.8260.8910.8700.8031.0001.0001.000
Resnet180.4490.9810.9320.8780.9850.9560.9190.9710.9480.9131.0001.0001.000
Swin_s0.2870.9370.9020.8020.9520.9200.8160.9460.9100.8351.0001.0001.000
Efficientnet-b2ViT0.2220.8930.8000.6910.8980.8060.6800.9040.8560.7171.0001.0001.000
DenseNet1210.3100.9430.8970.8070.9520.9290.8500.9760.9440.8591.0001.0001.000
Efficientnet-b20.9921.0001.0001.0000.9900.9950.9850.9960.9920.9871.0001.0001.000
Resnet180.3240.9400.8970.8200.9520.9200.8490.9760.9340.8881.0001.0001.000
Swin_s0.2490.9330.8830.7740.9250.9010.8020.9400.9140.8191.0001.0001.000
Resnet18ViT0.2380.8920.8090.6770.8790.8000.6730.8960.8120.6961.0001.0001.000
DenseNet1210.4450.9560.9270.8670.9660.9580.9120.9890.9720.9181.0001.0001.000
Efficientnet-b20.4380.9590.9240.8750.9560.9030.8510.9590.9240.8631.0001.0001.000
Resnet180.9531.0001.0001.0001.0001.0001.0001.0000.9980.9971.0001.0001.000
Swin_s0.2820.9160.8900.7940.9290.9040.8030.9330.9110.8351.0001.0001.000
Swin_sViT0.2250.8980.7890.6450.8860.7960.6300.9090.8280.6681.0001.0001.000
DenseNet1210.3500.9610.9070.7950.9610.9150.8110.9730.9340.8551.0001.0001.000
Efficientnet-b20.3670.9760.9160.8360.9730.9360.8620.9850.9650.9091.0001.0001.000
Resnet180.3430.9640.9090.8010.9640.9180.8350.9730.9440.8821.0001.0001.000
Swin_s0.9671.0001.0000.9991.0001.0001.0001.0000.9980.9981.0001.0001.000
Road SignViTViT0.2070.5000.8000.5450.6670.8570.5000.0000.6670.8571.0001.0001.000
DenseNet1210.1210.5000.6000.4550.6670.5710.3570.0000.3330.7141.0001.0000.636
Efficientnet-b20.2241.0000.8000.8180.6670.8570.7141.0000.6670.8571.0001.0001.000
Resnet180.1211.0000.6000.6360.6670.5710.5001.0001.0000.5711.0001.0000.636
Swin_s0.2071.0001.0000.9090.6670.8570.6430.0000.6670.8571.0001.0001.000
DenseNet121ViT0.0170.2000.1000.0670.1880.0940.0470.3330.5000.2310.3330.1670.083
DenseNet1210.3630.7330.6330.4830.7500.6560.6090.6670.6670.8461.0001.0001.000
Efficientnet-b20.1100.6670.5000.3670.6880.5000.4220.6670.8330.6921.0001.0000.550
Resnet180.0370.6000.3000.1670.5630.3130.1721.0000.8330.6920.7330.3670.183
Swin_s0.0330.6000.3330.1670.5630.3130.1560.6670.6670.5380.6670.3330.167
Efficientnet-b2ViT0.0100.2000.0970.0480.2000.0970.0480.6000.2730.1300.2000.0970.048
DenseNet1210.0260.5330.2580.1290.4670.3230.1590.6000.6360.4350.5330.2580.129
Efficientnet-b20.7531.0000.9680.9351.0001.0000.9680.8000.9090.8701.0001.0001.000
Resnet180.0320.6000.3230.1610.6000.3550.1751.0000.7270.3910.6670.3230.161
Swin_s0.0320.5330.3230.1610.4670.2900.1430.6000.4550.3040.6670.3230.161
Resnet18ViT0.0130.1580.1050.0650.1500.0750.0620.1670.2500.1200.2630.1320.065
DenseNet1210.0260.4210.2370.1300.4000.2250.1230.5000.5830.3200.5260.2630.130
Efficientnet-b20.0900.6840.5260.3770.6500.4750.3701.0000.8330.6001.0000.9210.455
Resnet180.4040.8420.7890.6881.0000.9500.7530.8330.8330.6801.0001.0001.000
Swin_s0.0260.3680.2630.1300.4500.2500.1230.5000.3330.4000.5260.2630.130
Swin_sViT0.0140.1500.1220.0600.1300.1090.0540.1430.2000.1610.3000.1460.072
DenseNet1210.0220.4500.2200.1080.4350.2170.1090.8570.4670.3230.4500.2200.108
Efficientnet-b20.0720.6500.4390.3250.7390.5430.3590.7140.6670.5161.0000.7320.361
Resnet180.0270.4500.2680.1330.4350.2390.1200.7140.4670.3550.5500.2680.133
Swin_s0.2750.8000.6830.5900.8700.7170.5980.8570.7330.5481.0001.0001.000
Table 2. The Comparative Performance of HET Across Different Attack Algorithms and Architecture Combinations for the X-Ray and Road Sign Datasets
Columns categorize the various attack algorithms employed, while rows detail the architecture pairings, with surrogate models (\(F_{0}\)) distinct from the victim’s architecture. The color shading indicates better (green) and worse (red) results. Shading is done per surrogate.
Overall, we found that the HET ranking method is highly effective regardless of the attack algorithm used. In two of the datasets (ImageNet and CIFAR10), all three attacks achieved near upper-bound performance with HET.
Interestingly, in our analysis, we observed a high degree of transferability exhibited by the FGSM attack. Across these datasets, FGSM consistently demonstrated its effectiveness in crafting adversarial examples capable of successfully deceiving a range of diverse victim models. FGSM works by performing a single large perturbation on the image, unlike the PGD and Momentum attacks, which perform many small steps. Using many attack steps may be preferable in a whitebox scenario, since it allows targeting less prominent features in the victim model and thus creating less noticeable perturbations. Nevertheless, in the blackbox scenario, this might hinder the attack’s transferability, since these features might only be present in the surrogate model and not the victim.
Our experiments also shed light on the vulnerabilities of various victim models; among them, EfficientNet, a popular deep learning architecture, was found to be the most susceptible victim model across all datasets. This discovery emphasizes the need for robustness enhancements in EfficientNet and similar models to mitigate the risks posed by adversarial attacks. Conversely, ViT and DenseNet121 proved to be reliable surrogate models, providing better transferability than the others. In the case of ViT, this may be attributed to the transformer architecture, which allows learning generic features that do not necessarily reflect a particular architectural choice such as convolutions.
Furthermore, our study highlighted the critical role played by the dataset characteristics in determining the success of adversarial attacks. Notably, ImageNet, with its large images and extensive class diversity comprising 1,000 classes, emerged as the most vulnerable dataset for adversarial attacks. The complexity and diversity inherent in ImageNet make it an enticing target for attackers, as it offers more opportunities to craft adversarial examples that can effectively deceive a wide range of victim models. These findings underscore the necessity for heightened security measures, particularly in complex and diverse datasets like ImageNet, to safeguard against adversarial threats.

5.2 E2—Perturbation Ranking

5.2.1 E2.1—Performance.

In Figure 6, we present the average transferability success rate of a sample when the highest ranked perturbation (out of 25) is used. Although perturbation ranking is less effective than image ranking, the figure shows that transferability can indeed be improved modestly in many situations. We also note that the attacker receives the largest benefit from ranking if the surrogate happens to be the same architecture as the victim. However, these cases can be considered rare in a strict blackbox setting.
Fig. 6. The average attack success rate when using the best perturbation (\(k=1\)). The rows are the datasets and the columns are the architecture used for the surrogate \(f^{\prime}\). Bars indicate baseline results (no ranking).
We also observe that similar to the results in E1.1, the Road Sign dataset is challenging to perform ranking on. We believe this is because the Road Sign dataset is relatively small resulting in surrogate models with unaligned loss surfaces [4]. However, when the attacker has a large enough dataset (disjoint from the victim), then ranking is effective (e.g., for the case of ImageNet, CIFAR10, and X-Ray).
In Tables 3 and 4, we present the results when ranking images for different \(k\) after applying the best perturbation to each image. Table 3 presents the findings for the CIFAR10 and ImageNet datasets, while Table 4 provides the results for the X-Ray and Road Sign datasets. Each cell within these tables indicates the average transferability from 100 random images selected from the dataset, with the columns denoting the ranking methods alongside the established upper and lower bounds. The rows detail the combinations of architectures for the victim and surrogate models.
Table 3.
Dataset | Surrogate | Victim | Lower \(B\) | SoftMax (5% / 10% / 20%) | SoftMax\(+\)Noise (5% / 10% / 20%) | HET (5% / 10% / 20%) | Upper \(B\) (5% / 10% / 20%)
ImageNetViT_b_16ViT_l_160.400.400.600.450.200.300.301.001.000.951.001.001.00
DenseNet1210.310.400.500.400.400.400.401.001.000.851.001.001.00
Efficientnet-b20.270.200.100.250.200.300.301.000.900.801.001.001.00
Resnet180.370.400.400.450.400.400.351.001.000.901.001.001.00
Swin_s0.220.400.500.350.200.300.250.800.800.651.001.000.99
DenseNet121ViT_b_160.230.200.300.300.200.200.301.000.800.651.001.000.99
DenseNet1610.380.600.400.450.400.500.501.001.001.001.001.001.00
Efficientnet-b20.280.200.200.350.200.300.351.001.000.801.001.001.00
Resnet180.470.400.400.450.200.300.401.000.900.901.001.001.00
Swin_s0.190.200.200.200.200.200.200.800.800.701.001.000.93
Efficientnet-b2ViT_b_160.260.200.400.350.000.200.201.001.000.801.001.001.00
DenseNet1210.390.000.400.400.000.200.351.001.000.951.001.001.00
Efficientnet-b10.450.400.500.500.400.400.401.001.000.951.001.001.00
Resnet180.450.000.400.450.200.400.500.800.900.901.001.001.00
Swin_s0.230.200.300.300.000.000.151.000.900.751.001.000.99
Resnet18ViT_b_160.250.200.300.350.200.400.301.001.000.751.001.000.99
DenseNet1210.390.400.600.500.200.400.301.000.900.901.001.001.00
Efficientnet-b20.280.400.400.300.400.500.351.000.900.751.001.001.00
Resnet340.380.200.400.350.200.400.351.001.000.901.001.001.00
Swin_s0.190.000.200.150.200.300.201.000.800.701.001.000.93
Swin_sViT_b_160.250.400.300.350.400.400.251.000.800.751.001.001.00
DenseNet1210.300.400.300.400.400.400.301.001.000.901.001.001.00
Efficientnet-b20.260.400.300.300.200.300.251.000.900.751.001.001.00
Resnet180.360.400.500.500.600.500.401.001.000.801.001.001.00
Swin_t0.600.800.900.800.800.800.751.001.001.001.001.001.00
CIFAR10ViTViT0.060.000.000.050.000.100.100.600.400.251.000.670.34
DenseNet1210.710.800.700.750.800.800.800.600.700.851.001.001.00
Efficientnet-b00.160.200.200.200.200.200.200.800.800.601.001.000.81
Resnet180.060.000.000.000.000.000.050.600.500.301.000.640.32
Swin_s0.180.200.200.250.200.400.250.400.700.601.001.000.90
DenseNet121ViT0.320.400.300.200.600.600.350.800.900.701.001.001.00
DenseNet1210.721.000.800.800.800.700.751.001.001.001.001.001.00
Efficientnet-b00.360.600.400.400.400.400.301.001.000.851.001.001.00
Resnet180.340.400.600.500.600.500.401.000.900.851.001.001.00
Swin_s0.180.400.300.200.400.300.251.000.700.601.001.000.84
Efficientnet-b0ViT0.340.200.200.250.200.200.301.000.900.701.001.001.00
DenseNet1210.651.000.900.801.000.800.751.001.000.901.001.001.00
Efficientnet-b00.320.200.100.150.400.300.301.001.000.751.001.001.00
Resnet180.400.400.400.450.800.600.401.000.900.851.001.001.00
Swin_s0.210.000.000.100.200.200.201.000.700.601.001.000.98
Resnet18ViT0.290.400.400.300.600.600.401.000.700.601.001.001.00
DenseNet1210.240.400.500.450.000.200.351.001.000.751.001.000.95
Efficientnet-b00.160.600.500.350.000.300.350.400.600.401.001.000.84
Resnet180.220.400.500.400.000.200.350.800.700.551.001.000.94
Swin_s0.140.400.300.200.200.100.200.600.600.451.001.000.75
Swin_sViT0.400.200.200.200.400.300.350.600.700.701.001.001.00
DenseNet1210.470.200.200.250.600.700.451.000.900.901.001.001.00
Efficientnet-b00.140.000.000.000.200.300.150.600.700.551.001.000.74
Resnet180.310.400.200.200.400.200.201.000.900.901.001.001.00
Swin_s0.110.000.000.050.200.100.051.000.500.551.000.990.57
Table 3. The Average Transferability of a Sample When Ranking its Potential Perturbations for the CIFAR10 and ImageNet Datasets Over 100 Trials
Columns represent the various ranking methods, and rows indicate the combination of victim and surrogate model architectures, ensuring that (\(F_{0}\)) is chosen from architectures different from the victim’s. The color shading indicates better (green) and worse (red) results. Shading is done per surrogate.
Table 4.
Dataset | Surrogate | Victim | Lower \(B\) | SoftMax (5% / 10% / 20%) | SoftMax\(+\)Noise (5% / 10% / 20%) | HET (5% / 10% / 20%) | Upper \(B\) (5% / 10% / 20%)
X-RayViTViT0.610.800.700.650.800.700.701.001.001.001.001.001.00
DenseNet1210.340.400.300.350.200.200.251.000.800.751.001.001.00
Efficientnet-b20.530.600.700.550.600.600.550.800.700.751.001.001.00
Resnet180.360.600.500.450.400.400.401.001.000.901.001.001.00
Swin_s0.320.600.400.300.200.200.250.600.700.751.001.001.00
DenseNet121ViT0.210.200.200.250.400.400.450.800.600.451.001.000.99
DenseNet1210.830.800.800.800.600.800.701.001.001.001.001.001.00
Efficientnet-b20.550.400.400.400.600.800.551.000.900.651.001.001.00
Resnet180.390.200.400.450.600.500.451.000.800.651.001.001.00
Swin_s0.240.000.100.250.400.400.350.800.800.551.001.000.95
Efficientnet-b2ViT0.210.200.300.300.400.200.151.000.700.601.001.000.97
DenseNet1210.260.400.300.300.200.200.101.001.000.751.001.000.99
Efficientnet-b21.001.001.001.001.001.001.001.001.001.001.001.001.00
Resnet180.230.600.300.300.400.400.200.800.700.551.001.000.94
Swin_s0.180.400.300.350.200.200.150.800.600.501.001.000.88
Resnet18ViT0.210.000.100.200.200.200.301.000.700.401.001.000.99
DenseNet1210.550.800.400.450.400.500.601.001.000.901.001.001.00
Efficientnet-b20.590.600.700.600.400.500.451.000.800.651.001.001.00
Resnet180.991.001.001.001.001.001.001.001.001.001.001.001.00
Swin_s0.250.200.400.300.200.100.100.800.600.401.001.000.99
Swin_sViT0.240.400.300.250.600.400.400.800.600.501.001.001.00
DenseNet1210.460.400.300.500.800.600.550.800.900.801.001.001.00
Efficientnet-b20.530.200.600.600.800.600.600.800.700.651.001.001.00
Resnet180.360.800.600.600.600.400.300.800.700.551.001.001.00
Swin_s1.001.001.001.001.001.001.001.001.001.001.001.001.00
Road SignViTViT0.050.000.000.000.000.000.000.200.200.150.720.470.24
DenseNet1210.010.000.000.000.000.000.000.200.100.050.200.100.05
Efficientnet-b20.060.000.000.000.000.000.000.600.500.300.910.560.28
Resnet180.030.000.000.000.000.000.000.400.200.150.590.290.15
Swin_s0.030.000.000.000.000.000.000.600.300.150.610.300.15
DenseNet121ViT0.010.000.000.000.000.000.000.200.100.050.200.100.05
DenseNet1210.310.200.100.200.000.200.150.800.800.551.001.000.97
Efficientnet-b20.060.000.000.000.000.000.000.600.500.301.000.610.30
Resnet180.030.000.000.000.000.000.000.400.200.150.670.340.17
Swin_s0.020.000.000.000.000.000.000.400.200.100.440.220.11
Efficientnet-b2ViT0.010.000.000.000.000.000.000.200.100.050.200.100.05
DenseNet1210.010.000.000.000.000.000.000.200.100.050.200.100.05
Efficientnet-b20.730.600.800.750.600.800.851.000.900.901.001.001.00
Resnet180.020.000.000.000.000.000.000.200.100.100.590.300.15
Swin_s0.020.000.000.000.200.100.050.400.200.100.400.200.10
Resnet18ViT0.010.000.000.000.000.000.000.200.100.050.200.100.05
DenseNet1210.010.000.000.000.000.000.000.200.100.050.270.130.07
Efficientnet-b20.050.000.000.000.000.000.000.600.300.250.980.600.30
Resnet180.540.400.400.350.200.200.201.000.900.901.001.001.00
Swin_s0.020.000.000.000.000.000.000.400.200.100.400.200.10
Swin_sViT0.010.000.000.000.000.000.000.200.100.050.200.100.05
DenseNet1210.010.000.000.000.000.000.000.200.100.050.200.100.05
Efficientnet-b20.070.000.000.000.000.000.000.800.600.351.000.710.36
Resnet180.040.000.000.000.000.000.000.400.300.200.640.320.16
Swin_s0.530.000.100.250.200.200.201.000.800.801.001.000.99
Table 4. The Average Transferability of a Sample When Ranking Potential Perturbations for the X-Ray and Road Sign Datasets Over 100 Trials
Columns represent the various ranking methods, and rows indicate the combination of victim and surrogate model architectures, ensuring that (\(F_{0}\)) is chosen from architectures different from the victim’s. The color shading indicates better (green) and worse (red) results. Shading is done per surrogate.
The results underscore the proficiency of HET in consistently pinpointing the most transferable perturbation for a given sample. The significance of this capability is highlighted by the comparison to the lower bound of transferability—averaging at or below 30% across the datasets—which HET substantially elevates to an average of 70% or greater. Particularly striking is HET’s performance for the ImageNet, X-Ray, and Road Sign datasets, where it nearly mirrors the upper bound. In the X-Ray and ImageNet datasets, HET achieves perfect transferability across almost all blackbox scenarios (selection of architecture combinations).
Even as the value of \(k\) increases, representing a broader selection of top-ranking perturbations, HET maintains its superior performance. It exhibits an enhancement of up to 60% in transferability over the lower bound for larger values of \(k\). This improvement is noteworthy, demonstrating the robustness of HET in a variety of conditions.
Conversely, the baseline methods of SoftMax and SoftMax\(+\)Noise generally hover around the lower bound, occasionally achieving up to a 40% increase in transferability, yet still falling short of the performance attained by HET. The disparity between these methods and HET is particularly evident within the Road Sign dataset. For lower values of \(k\), the baselines are analogous to the lower bound, whereas HET’s results are akin to the upper bound, accentuating the substantial advantage provided by the HET ranking strategy in these scenarios.

6 Related Work

There has been a great amount of research on adversarial transferability, discussing attacks [20, 22, 27, 32], defenses [11, 17], and general analyses of the phenomenon [4, 14, 25]. However, these works do not rank transferability from the attacker’s side, but rather evaluate the transferability of entire datasets directly on the victim model. In contrast, our work defines the task of transferability ranking for blackbox attackers and proposes methods for performing the task.
Some works have proposed techniques and measures for identifying samples that transfer better than others. For example, [33] show that some subsets of samples transfer better than others and suggest ways to curate evaluation datasets for transferability. They found that there is a connection between the transferability success of \(x^{\prime}\) to \(f\) and the SoftMax outputs of \(f(x)\). In another work, the sensitivity of \(x\) to Gaussian noise on \(f\) was found to be correlated with the transferability of \(x^{\prime}\) to \(f\) [32]. However, in contrast to our work, the processes described in these works only apply to whitebox scenarios with access to \(f\). In many cases, attackers do not have access to the victim’s model, cannot send many adversarial examples to the model without raising suspicion, or cannot receive feedback from the model (e.g., classifiers used in airport X-ray machines [1]). Therefore, they cannot evaluate their attack on the victim’s model beforehand. Moreover, these works do not define the task of transferability ranking or suggest methods for measuring transferability such as transferability at \(k\).
Finally, in contrast to prior works, we suggest a more grounded approach to evaluating model security against transfer attacks. We recommend that the community evaluate their models against the top \(k\) most transferable samples from a blackbox perspective, rather than reporting the average success of all samples from a whitebox perspective. This is because a real attacker will likely select the best samples to use, which ultimately increases the performance (threat) of transfer attacks; a sketch of such an evaluation follows.
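For concreteness, a transferability-at-\(k\) style evaluation could look like the following sketch, under the assumption that the metric reports the victim-side success rate among the \(k\) highest-ranked adversarial examples; with \(k=1\) it asks whether the single best-ranked example transfers.

def transferability_at_k(scores, transferred, k):
    # scores: ranking score per adversarial example (higher = expected to transfer better)
    # transferred: 1 if that example actually fooled the victim model, else 0
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(transferred[i] for i in ranked[:k]) / k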

7 Conclusion

The results garnered from our extensive experimentation with the HET ranking strategy offer compelling evidence of its effectiveness in enhancing the transferability of adversarial examples. Notably, the strategy is especially effective at improving the transferability of a single specific sample, a scenario of particular relevance to blackbox attackers. The findings reveal that attackers are not “stuck” with a resulting adversarial example and its likelihood of transferring; rather, by applying HET to select the best perturbation from a set of trials, attackers can significantly boost the likelihood of a successful transfer. Ideal ranking results are achieved only when the attacker happens to guess the victim's architecture and uses it to generate the adversarial examples. However, our results show that if the attacker needs to send only a small fraction (less than 10%) of the candidate samples, ranking significantly increases the chance that those samples transfer, and nearly guarantees it when only one is selected (\(k=1\)).
This efficiency in attack methodology is a critical advantage in practical adversarial settings. It reduces the attacker's risk, since there is no longer a need to send the victim a vast number of perturbations in the hope of stumbling upon a successful one. Instead, HET provides a systematic and predictive approach to identifying the victim's vulnerabilities without first probing the victim with any samples.
Moreover, the consistent performance of HET across various datasets and model architectures offers a deeper understanding of the intrinsic characteristics of adversarial attacks. As suggested in previous works, it supports the claim that different models, despite their distinct architectural designs, often share common weaknesses [20, 22, 27, 32].
The ability to predict and exploit these shared vulnerabilities is significant. It implies that there is an underlying structure to the adversarial space that HET can effectively navigate. By selecting perturbations that are universally potent across models, HET enables attackers to reliably anticipate the success of their attacks in blackbox settings. This observation not only affirms the utility of HET but also provides a valuable insight into the nature of transferability, suggesting that the successful perturbation is not a fluke but a systematic exploitation of a model’s fundamental susceptibilities. The implications of this for both attackers and defenders are profound, as it calls for a deeper exploration into the robustness of models and the development of more sophisticated defense mechanisms.
As \(k\) increases, the SoftMax (SM) method without HET becomes the better choice, and at very high values of \(k\) the noise-sensitivity method performs best. This reflects the ability of SM without HET to identify the best and worst samples in a set, but not to order the samples in the middle as reliably. It also implies that the methods are not always consistent in their performance across different values of \(k\), so an attacker may choose different methods for different situations. On ImageNet, the noise-sensitivity ranking performed worse than SM without HET at every value of \(k\).

Footnotes

1
In this work, we concentrate on untargeted attacks, though our methodology could be adapted for targeted attacks with the aim of achieving \(f_{j}(x^{\prime}_{i})=y_{t}\) where \(y_{t}\) represents the class that the attacker intends to mimic.
2
\(f\) is never included in \(F_{0}\) for all of our experiments.
3
This assumes that the training data for each model in \(F\) are drawn from the same distribution.
7
The only exception is where the victim uses the ViT architecture. In this case alone, we let the adversary use a different sized ViT in \(F_{0}\) due to time limitations in training.

References

[1]
Samet Akcay and Toby Breckon. 2022. Towards automatic threat detection: A survey of advances of deep learning within X-ray security imaging. Pattern Recognition 122 (2022), 108245.
[2]
Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy (SP’17). IEEE, 39–57.
[3]
Francesco Croce and Matthias Hein. 2020. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In Proceedings of the International Conference on Machine Learning (ICML’20). PMLR, 2206–2216.
[4]
Ambra Demontis, Marco Melis, Maura Pintor, Matthew Jagielski, Battista Biggio, Alina Oprea, Cristina Nita-Rotaru, and Fabio Roli. 2019. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In Proceedings of the 28th USENIX Security Symposium (USENIX Security’19). 321–338.
[5]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
[6]
Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. 2018. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9185–9193.
[7]
Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. 2019. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4312–4321.
[8]
Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations. Retrieved from http://arxiv.org/abs/1412.6572
[9]
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572.
[10]
Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. 2017. Adversarial examples for malware detection. In Proceedings of the European Symposium on Research in Computer Security. Springer, 62–79.
[11]
Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens van der Maaten. 2018. Countering adversarial images using input transformations. In Proceedings of the International Conference on Learning Representations. arXiv preprint arXiv:1711.00117
[12]
Hokuto Hirano, Akinori Minagi, and Kazuhiro Takemoto. 2021. Universal adversarial attacks on deep neural networks for medical image classification. BMC Medical Imaging 21, 1 (2021), 1–13.
[13]
Cheng Ju, Aurélien Bibaut, and Mark van der Laan. 2018. The relative performance of ensemble methods with deep convolutional neural networks for image classification. Journal of Applied Statistics 45, 15 (2018), 2800–2818.
[14]
Ziv Katzir and Yuval Elovici. 2021. Who’s afraid of adversarial transferability? arXiv:2105.00433.
[15]
Moshe Levy, Guy Amit, Yuval Elovici, and Yisroel Mirsky. 2022. The security of deep learning defences for medical imaging. arXiv:2201.08661.
[16]
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083.
[17]
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks. In Proceedings of the International Conference on Learning Representations. arXiv preprint arXiv:1706.06083
[18]
Samaneh Mahdavifar and Ali A. Ghorbani. 2019. Application of deep learning to cybersecurity: A survey. Neurocomputing 347 (2019), 149–176.
[19]
Preetum Nakkiran. 2019. A discussion of ’adversarial examples are not bugs, they are features’: Adversarial examples are just bugs, too. In Distill. Retrieved from https://distill.pub/2019/advex-bugs-discussion/response-5
[20]
Muhammad Muzammal Naseer, Salman H. Khan, Muhammad Haris Khan, Fahad Shahbaz Khan, and Fatih Porikli. 2019. Cross-domain transferability of adversarial perturbations. Advances in Neural Information Processing Systems 32 (2019), 12905–12915.
[21]
Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv:1605.07277.
[22]
Jacob M. Springer, Melanie Mitchell, and Garrett T. Kenyon. 2021. A little robustness goes a long way: Leveraging robust features for targeted transfer attacks. In Advances in Neural Information Processing Systems 34 (NeurIPS’21). 9759–9773.
[23]
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In Proceedings of the 2nd International Conference on Learning Representations (ICLR’14). arXiv preprint arXiv:1312.6199
[24]
Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. 2020. On adaptive attacks to adversarial example defenses. Advances in Neural Information Processing Systems 33 (2020), 1633–1645.
[25]
Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. 2017. The space of transferable adversarial examples. arXiv:1704.03453.
[26]
Renzhi Wang, Tianwei Zhang, Xiaofei Xie, Lei Ma, Cong Tian, Felix Juefei-Xu, and Yang Liu. 2020. Generating adversarial examples with controllable non-transferability. arXiv:2007.01299.
[27]
Xiaosen Wang, Xuanran He, Jingdong Wang, and Kun He. 2021. Admix: Enhancing the transferability of adversarial attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 16158–16167.
[28]
Ross Wightman, Hugo Touvron, and Herve Jegou. 2021. ResNet strikes back: An improved training procedure in timm. In NeurIPS 2021 Workshop on ImageNet: Past, Present, and Future. Retrieved from https://openreview.net/forum?id=NG6MJnVl6M5
[29]
Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. 2017. Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE International Conference on Computer Vision. 1369–1378.
[30]
Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L. Yuille. 2019. Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2730–2739.
[31]
Mingyi Zhou, Jing Wu, Yipeng Liu, Shuaicheng Liu, and Ce Zhu. 2020. Dast: Data-free substitute training for adversarial attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 234–243.
[32]
Yao Zhu, Jiacheng Sun, and Zhenguo Li. 2021. Rethinking adversarial transferability from a data distribution perspective. In Proceedings of the International Conference on Learning Representations.
[33]
Utku Özbulak, Esla Timothy Anzaku, Wesley De Neve, and Arnout Van Messem. 2021. Selection of source images heavily influences the effectiveness of adversarial attacks. In Proceedings of the 32nd British Machine Vision Conference (BMVC’21), Online. Retrieved from https://www.bmvc2021-virtualconference.com/programme/accepted-papers/
