
Ranking the Transferability of Adversarial Examples

Published: 12 October 2024

Abstract

Adversarial transferability in blackbox scenarios presents a unique challenge: while attackers can employ surrogate models to craft adversarial examples, they lack assurance on whether these examples will successfully compromise the target model. Until now, the prevalent method to ascertain success has been trial and error—testing crafted samples directly on the victim model. This approach, however, risks detection with every attempt, forcing attackers to either perfect their first try or face exposure.
Our article introduces a ranking strategy that refines the transfer attack process, enabling the attacker to estimate the likelihood of success without repeated trials on the victim’s system. By leveraging a set of diverse surrogate models, our method can predict transferability of adversarial examples. This strategy can be used to either select the best sample to use in an attack or the best perturbation to apply to a specific sample.
Using our strategy, we were able to raise the transferability of adversarial examples from a mere 20%—akin to random selection—up to near upper-bound levels, with some scenarios even witnessing a 100% success rate. This substantial improvement not only sheds light on the shared susceptibilities across diverse architectures but also demonstrates that attackers can forgo the detectable trial-and-error tactics, thereby increasing the threat of surrogate-based attacks.

1 Introduction

Neural networks are vulnerable to adversarial examples in which adversaries aim to change the prediction of a model \(f\) on an input \(x\) in a covert manner [23]. The common form of this attack is where an adversarial example \(x^{\prime}=x+\delta\) is created such that \(f(x+\delta)\neq f(x)\) where \(||\delta||{\lt}\epsilon\). In other words, the adversarial example changes the model’s prediction, yet \(x^{\prime}\) appears the same as \(x\).
Many popular and powerful adversarial attacks (such as Projected Gradient Descent (PGD) [17] and Carlini-Wagner Attack [2]) are whitebox attacks. This means that to use these algorithms to generate \(x^{\prime}\), the attacker must have access to the learnt model parameters in \(f\) (i.e., the neural network’s weights). Although this may seem like a strong limitation for attackers, it has been shown that different neural networks can share the same vulnerabilities to an adversarial example [23, 25]. As such, an attacker can simply train a surrogate model \(f^{\prime}\) on a similar dataset [31], attack \(f^{\prime}\) to generate \(x^{\prime}\), and then deploy \(x^{\prime}\) on the blackbox model \(f\) knowing that there will be a decent probability of success. This attack is called a transfer attack [21]. This type of attack has been found to be effective across models trained on different subsets of the data [23], across domains [20], and even across tasks [29].
In the field of adversarial machine learning, understanding transferability is essential, especially when an attacker has only one or a few chances to succeed. In real-world situations, like trying to trick a facial recognition system at an airport or fooling a fraud detection algorithm at a bank, the attacker doesn’t have the luxury to keep trying until they succeed. If they fail, the system might lock them out or increase security, making it even harder to try again. Therefore, it’s important for an attacker to choose the adversarial example that is most likely to work on the first try. This is why ranking adversarial examples based on how well they are expected to transfer and fool the target model is crucial—it increases the odds of success in situations where there’s no room for error (see Figure 1).
Fig. 1. An illustration showing how ranking is beneficial to an adversary who is targeting model \(f\) in a black box setting. Here \(\mathscr{D}\) is the dataset available to the attacker and \(\mathscr{D}^{*}\) are adversarial examples. \(f\) is not available to the adversary. Adv, adversary.
In some attacks, the adversary may search for the best sample(s) and use ranking to select them (\(x_{i}\in D\) for \(x^{\prime}_{i}=x_{i}+\delta\)). However, in other attacks, the adversary is limited to using one or more specific samples (such as an image of an individual). In these attacks, an adversary could generate several adversarial examples for each given sample and then use ranking to select the best perturbation (\(\delta_{j}\) for \(x^{\prime}_{j}=x_{j}+\delta_{j}\)).
To understand these cases, let’s consider two example scenarios: In the first scenario, an attacker may be trying to evade detection of some anti-virus model \(f\) and can select one malware from a set for this purpose (represented as \(x_{i}\)) such that some modification to it (\(\delta\)) will make it perceived by \(f\) as benign software [10, 18]. Here, the ranking is done on \(x\in\mathscr{D}\), since the attacker can convert any of the malware samples into an adversarial example, so the attacker will choose the malware that is best for transferability. In the second scenario, an attacker may want to tamper with a specific patient’s medical image \(x\) using some perturbation \(\delta_{j}\) such that \(x^{\prime}_{j}\) will be falsely classified as containing some medical condition [12, 15]. Here, ranking is done on potential perturbations because there is only one \(x\). We note that in both cases, the attacker (1) has only one attempt to avoid being caught or (2) cannot get feedback from \(f\), but must select the sample \(x_{i}\) or perturbation \(\delta_{j}\) which will most likely transfer to the victim’s model \(f\).
To find the best \(x_{i}\) or \(\delta_{j}\) using surrogate \(f^{\prime}\), the attacker must rank potential adversarial examples according to their expected attack success on \(f\). We call this measure the Expected Transferability (ET). To the best of our knowledge, there are no works which propose a means for ranking adversarial examples according to their ET. Current works, such as [7, 25, 32], determine if \(x^{\prime}\) transfers by directly evaluating it on the victim’s model \(f\). However, in a blackbox setting, an attacker cannot use \(f\) to measure success. Therefore, this approach can only be used as an upper bound, but cannot be used to (1) help the attacker select the best adversarial example(s) or (2) measure a model’s robustness to transferability attacks given the attacker’s limitations.
In this article, we explore the topic of ranking adversarial examples according to their ET. Our work offers several contributions: (1) we propose the concept of ET and define the ranking problem for adversarial examples, (2) we suggest a way to approximate the ET of an adversarial example and a heuristical way to increase the accuracy and practicality of the method, (3) we introduce a new metric (“transferability at \(k\)”) to measure attack performance considering an attacker’s best efforts, and (4) we frame the problem of transferability realistically in the perspective of a blackbox attacker: we propose the use of additional surrogates to evaluate transferability.

2 Definitions

In this section, we introduce the concept of ET, define the task of ranking an adversarial example’s transferability, and propose the metric “transferability at k.”

2.1 ET

In the domain of transferability, the attacker operates under uncertainty: a deterministic answer as to whether an adversarial example will succeed (fool the victim) is impossible. The attacker faces an unknown victim model about which, by definition, they have incomplete information. An appropriate way to view the victim model is as if it were sampled from the pool of all possible victim models. Therefore, rather than a guarantee, the attacker is interested in the expectation that each adversarial example will be successful.
To formally define the ET that the attacker is interested in, we first need to define the pool of possible victim models. Let \(F\) be the set of all possible models scoped and based on the attacker’s knowledge of \(f\) (i.e., \(F\) is the set of all surrogate models that reflect \(f\)). In our setting, the attacker uses the surrogate model \(f^{\prime}\in F\) to create adversarial examples (denoted as the set \(\mathscr{D}^{*}\)).
We define the ET of an adversarial example \(x^{\prime}_{i}\in\mathscr{D}^{*}\) as the probability that \(x^{\prime}_{i}\) will successfully transfer to a random model in \(F\). A successful transfer of \(x^{\prime}_{i}\) to model \(f_{j}\in F\) can be defined as the case where \(f_{j}(x^{\prime}_{i})\neq y_{i}\) for an untargeted attack where \(y_{i}\) is the ground truth label of \(x_{i}\).1 Following the notation convention used for adversarial examples (e.g., [24]), the symbol \(\neq\) should be interpreted in the context of classification outcomes rather than as a strict boolean operation. Specifically, it indicates that the probability distribution output by \(f(x^{\prime})\) does not assign the highest probability to the ground truth class \(y\), meaning that the class with the highest probability in \(f(x^{\prime})\) is different from \(y\).
It can be said that the attacker’s goal is to select a sample \(x^{\prime}_{i}\) which has the highest probability to transfer to a random model drawn from the population \(F\). For untargeted attacks, we can measure \(x^{\prime}_{i}\)’s transferability with
\begin{align}S(x^{\prime}_{i})=\mathop{\mathbb{E}}_{f\sim F}\left[f(x^{\prime}_{i})\neq y_{i}\right]. \tag{1}\end{align}
\(S\) can be used to rank adversarial examples because if \(S(x^{\prime}_{i}){\gt}S(x^{\prime}_{j})\), then \(x^{\prime}_{i}\) is more likely to transfer to a random model in \(F\) than \(x^{\prime}_{j}\). Note that (1) can similarly be defined for targeted attacks as well.

2.2 Transferability Ranking

Given \(S\), the attacker can sort the potential adversarial examples according to their ET. Therefore, we define the task of transferability ranking as the problem of obtaining an ordered set of adversarial examples \(\{x^{\prime}_{1},x^{\prime}_{2},...\}\) such that \(x^{\prime}_{i}\in\mathscr{D}^{*}\) and \(x^{\prime}_{i}\) is more likely to transfer than \(x^{\prime}_{j}\) if \(S(x^{\prime}_{i}){\gt}S(x^{\prime}_{j})\).
Note that when applying \(S(x^{\prime}_{i})\), it is possible to measure the ET rank of different samples from a dataset \(x^{\prime}_{i}=x_{i}+\delta,x_{i}\in\mathscr{D}\) or of different perturbations applied to the same sample from the dataset \(x^{\prime}_{i}=x_{i}+\delta_{j},x\in\mathscr{D},\delta_{j}\in\delta\). Ranking adversarial examples by perturbation is relevant for attacks where multiple runs of the attack algorithm produce different perturbations [17].

2.3 Transferability at k

In a real-world attack, an attacker will curate a finite set of \(k\) adversarial examples on \(f^{\prime}\) to use against \(f\). To ensure success, it is critical that the attacker select the \(k\) samples that have the highest ET scores.
The top \(k\) samples of \(\mathscr{D^{*}}\) are denoted as the set \(S_{k}(\mathscr{D^{*}})\) where setting \(k=1\) is equivalent to selecting the sample that is the most likely to transfer.
Identifying the top \(k\) samples is not only useful for the attacker but also for the defender. This is because a defender can evaluate his or her model’s robustness to attacks given the attacker’s best efforts (attacks using the top \(k\) samples). We call this performance measure the transferability at \(k\) defined as
\begin{align}T_{k}(\mathscr{D^{*}})=\frac{1}{k}\sum_{x^{\prime}\in S_{k}(\mathscr{D^{*}})}\left(f(x^{\prime})\neq y\right), \tag{2}\end{align}
which is the fraction of the top \(k\) samples selected by \(S\) that successfully transferred to the victim \(f\).
This evaluation covers a wide variety of use cases: some cases call for only a small number of adversarial samples, while others require large amounts, in which case the attacker is only interested in identifying the few least transferable samples so they can be omitted. Note that the score for a specific \(k\) is bounded by the success of the \(k\) most transferable samples in the dataset. As such, when \(k=|\mathscr{D^{*}}|\), the score equals the average attack success rate of the adversarial samples in the dataset for any ranking of the samples.
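The following sketch illustrates how transferability at \(k\) could be computed in PyTorch, assuming the evaluator already has an ET score per adversarial example and knows (as a defender would) whether each example fools \(f\); the function and variable names are illustrative and not taken from our released code.

```python
import torch

def transferability_at_k(scores, victim_success, k):
    """Transferability at k (Equation (2)): the fraction of the k highest-scoring
    adversarial examples that actually fool the victim model f."""
    top_k = torch.argsort(scores, descending=True)[:k]   # indices of the k highest ET scores
    return victim_success[top_k].float().mean()

# Tiny illustration: 5 adversarial examples, k = 2
scores = torch.tensor([0.9, 0.2, 0.7, 0.4, 0.8])
victim_success = torch.tensor([True, False, True, False, False])
print(transferability_at_k(scores, victim_success, k=2))  # tensor(0.5000)
```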

3 Implementation

In this section, we propose methods for implementing \(S\) and estimating the transferability at \(k\) without access to \(f\).

3.1 Approximate ET (AET)

Although the set \(F\) is potentially infinite, we can approximate it by sampling a subset of models \(F_{0}\subset F\) from the population. With \(F_{0}\) we can approximate \(S\) by computing
\begin{align}S(x^{\prime}_{i})=\frac{1}{|F_{0}|}\sum_{j=1}^{|F_{0}|}\left(f_{j}(x^{\prime}_{i})\neq y_{i}\right) \tag{3}\end{align}
for \(f_{j}\in F_{0}\).
In summary, we propose the use of multiple surrogate models to estimate ET: one surrogate model is used to generate the adversarial example (\(f^{\prime}\in F\)) and one or more surrogate models (\(F_{0}\subset F\)) are used to estimate the transferability of the adversarial example to \(f\).
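As a minimal sketch of Equation (3), the AET of a batch of adversarial examples can be estimated in PyTorch by counting how many auxiliary surrogates each example fools; the helper below is illustrative and assumes the surrogates are classifiers already set to evaluation mode.

```python
import torch

@torch.no_grad()
def aet_score(x_adv, y, surrogates):
    """Approximate ET (Equation (3)): the fraction of surrogates in F_0 whose
    prediction on x_adv differs from the ground-truth label y."""
    hits = []
    for f_j in surrogates:                       # f_j in F_0
        pred = f_j(x_adv).argmax(dim=1)          # predicted class per sample
        hits.append((pred != y).float())         # 1 if the attack fools f_j, else 0
    return torch.stack(hits).mean(dim=0)         # average over |F_0| surrogates
```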

3.2 Heuristical ET (HET)

Although we can use (3) to compute ET, the approach raises a technical challenge: it is impractical to train a significantly large set of surrogate models \(F_{0}\). For example, training a single Resnet-50 on ImageNet can take up to 4 days using common hardware [28]. However, if \(|F_{0}|\) is too small, then \(S\) will suffer from a lack of granularity. This is because, according to (3), each model reports a 0 or 1 if the attack fails or succeeds. To exemplify the issue of granularity, consider a case where \(|F_{0}|=10\) and we set \(k=100\). If \(\mathscr{D}^{*}\) contains 1,000 adversarial examples which fool all 10 models, then all 1,000 samples will receive a score of \(1.0\). However, the true \(S\) of these samples varies with respect to \(F\). As a result, we will be selecting \(k=100\) samples at random from these 1,000, which is not ideal.
To mitigate this issue, we propose using continuous values to capture attack success for \(x^{\prime}\) on each model. Specifically, for each model, we use the model’s confidence for the input sample’s ground-truth class. This value implicitly captures how successful \(x^{\prime}\) is at changing the model’s prediction since lower values indicate a higher likelihood that \(x^{\prime}\) will not be classified correctly [8, 17]. When averaged across \(|F_{0}|\) models, we can obtain a smoother probability which generalizes better to the population \(F\). Averaging model confidences is a popular ensemble technique used to join the predictions of multiple classifiers together [13]. However, here we use it to identify the degree to which a sample \(x^{\prime}\) exploits a set of models together.
To implement this heuristic approach, we modify (3) to
\begin{align}S(x^{\prime}_{i})=\frac{1}{|F_{0}|}\sum_{j=1}^{|F_{0}|}\left(1-\sigma_{y}\left(f_{j}(x^{\prime}_{i})\right)\right),\quad f_{j}\in F_{0}, \tag{4}\end{align}
where \(\sigma_{y}(\cdot)\) returns the SoftMax value of the logit corresponding to the ground-truth label \(y\).
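A corresponding sketch of Equation (4) replaces the binary indicator with the surrogates’ SoftMax confidence in the ground-truth class; again, the helper name and batch layout are illustrative assumptions rather than our exact implementation.

```python
import torch

@torch.no_grad()
def het_score(x_adv, y, surrogates):
    """Heuristical ET (Equation (4)): average over F_0 of one minus the SoftMax
    confidence assigned to the ground-truth class y."""
    scores = []
    for f_j in surrogates:                                      # f_j in F_0
        probs = torch.softmax(f_j(x_adv), dim=1)                # class probabilities
        conf_true = probs.gather(1, y.unsqueeze(1)).squeeze(1)  # sigma_y(f_j(x'))
        scores.append(1.0 - conf_true)                          # lower confidence -> higher score
    return torch.stack(scores).mean(dim=0)                      # continuous ET estimate
```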
We demonstrate the benefit of using HET (4) over AET (3) with a simple experiment: We take a Resnet-50 architecture for both \(f\) and \(f^{\prime}\), trained on the same ImageNet train set [5]. Then, we create \(\mathscr{D}^{*}\) by attacking the ImageNet test set with PGD (\(\epsilon=\frac{1}{255}\)). Finally, we compute the AET and HET on each sample in \(\mathscr{D}^{*}\) with \(|F_{0}|=3\) surrogates.2 In Figure 2, we plot the attack success rate of \(\mathscr{D}^{*}\) on \(f\) for different \(k\) when sorting the samples according to AET and HET respectively. We observe that (1) although \(\mathscr{D}^{*}\) has a 98% attack success rate on \(f^{\prime}\), it only has a success rate of 20% on \(f\) even though both \(f\) and \(f^{\prime}\) are identical in design, and (2) HET performs better than AET, especially for lower \(k\) (i.e., when we select the top ranked samples).
Fig. 2. The ranking performance when using the HET score compared to using the AET score. Here, both methods use \(|F_{0}|=3\) .

3.3 Blackbox Ranking Strategies

As discussed earlier, it is more likely that an attacker will measure a sample’s transferability using surrogates and not the victim model \(f\) (as done in previous works). Below, we propose two strategies for ranking the transferability of a sample \(x^{\prime}\) without using \(f\) (illustrated in Figure 3):
Fig. 3. The proposed ranking strategy (HET) compared to the naive approach (without HET). Here, x is the input sample to be assigned a transferability score (the ET). The auxiliary surrogates in \(F_{0}\) reflect the victim’s model \(f\) depending on the attacker’s knowledge of \(f\) (architecture, training set, etc.)
Without ET.
This is the naive approach where the attacker uses one surrogate model (\(f^{\prime}\)) to select the adversarial examples. There are a few ways of doing this. For example, the attacker can check if \(x^{\prime}\) successfully fools \(f^{\prime}\) and then assume that it will also work on \(f\) because \(f\in F\). Another way is to evaluate the confidence of \(f^{\prime}\) (\(\sigma_{i}\)) on the clean sample \(x\) to identify an \(x\) which will be easy to attack [32, 33]. Although this strategy is efficient, it does not generalize well to \(F\). Even in a blackbox setting, where the attacker knows the victim’s architecture and training set, a sample \(x^{\prime}\) made on \(f^{\prime}\) will not necessarily work on \(f\). It was shown that even when the only difference is the model’s random initialization, predicting a specific sample’s transferability is still a challenging problem [14].
With HET.
In this strategy, the attacker utilizes multiple surrogate models (\(F_{0}\)) to approximate the ET of \(x^{\prime}\) to \(f\), as expressed in Equation (4). Here, the performance depends more on the attacker’s knowledge of \(f\) (the variability of \(F\)) but less so on the random artifacts caused by initialization of weights and the training data used.3 This is because the averaging mitigates cases where there are only a few outlier models in \(F_{0}\) which are vulnerable to \(x^{\prime}\). As a result, the final transferability score captures how well \(x^{\prime}\) transfers to vulnerabilities which are common among the models in \(F_{0}\). The concept of models having shared vulnerabilities has been shown in works such as [19, 26].

4 Experiment Setup

In this section, we present the experiments which we have performed to evaluate the proposed blackbox transferability ranking strategy.

4.1 Evaluation Measures

To evaluate our ranking methods, we use transferability at \(k\) as defined in Equation (2). Note that transferability at \(k\) can also be viewed as the attack success rate on \(f\) for the top-\(k\) recommended samples. We remind the reader that ranking is performed without access to \(f\) or knowledge of \(f\). Therefore, it can be said that the transferability at \(k\) measures the adversary’s attack success rate in a black box setting when only \(k\) attempts (attacks) are allowed.

4.2 Datasets

For our experiments, we used four datasets:
CIFAR10.
An image classification dataset which contains 60K images from 10 categories having a resolution of 32 \(\times\) 32.
ImageNet.
A popular image classification benchmark dataset containing about 1.2M images from 1,000 classes, downsampled to a resolution of 224 \(\times\) 224.
RSNA-X-Ray.
A medical dataset, containing approximately 30K pneumonia chest X-ray images resized to a resolution of 224 \(\times\) 224. The dataset contains three classes and was originally published on Kaggle4 by the Radiological Society of North America.
Road Sign.
A dataset for traffic sign classification containing 3K images from 58 different classes, up-sampled to a resolution of 224 \(\times\) 224. The dataset was originally published on Kaggle.5
For the ImageNet and CIFAR10 datasets, we used the original data splits, whereas for the RSNA-X-Ray and Road\(\_\)Sign datasets, the images were split into train, test, and validation sets with respective sizes of 70%, 20%, and 10%. The training sets were used to train \(f\) and \(f^{\prime}\) and the test sets were used to create the adversarial examples (\(\mathscr{D}^{*}\)). Since \(f(x)\neq y\) is counted as a successful attack, we must remove all samples from the test set where the clean sample is misclassified. This is done to avoid bias and to focus our results strictly on samples that transfer as a result of the attack.
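The filtering step can be sketched as follows; it assumes a single given classifier is used to drop clean test samples that are already misclassified before any adversarial examples are crafted.

```python
import torch

@torch.no_grad()
def keep_correctly_classified(model, x, y):
    """Remove test samples the model already misclassifies on clean inputs,
    so that a later 'successful transfer' is attributable to the attack."""
    correct = model(x).argmax(dim=1) == y
    return x[correct], y[correct]
```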

4.3 Architectures

In our experiments, we used five different architectures:
DenseNet-121.
Employs a dense connectivity pattern that connects each layer to every other layer in a feed-forward fashion. Its 121 layers are divided into dense blocks that ensure maximum information flow between layers, making it robust to perturbations.
Efficientnet.
Utilizes a compound scaling method that uniformly scales all dimensions of depth, width, and resolution with a fixed set of scaling coefficients. This architecture provides a balance between speed and accuracy, optimized to perform well even with limited computational resources. In CIFAR10 evaluations we use Efficientnet-b0 and for the rest we use Efficientnet-b2.
Resnet18.
Introduces skip connections to allow the flow of gradients through the network without attenuation. With 18 layers, it is relatively shallow, ensuring quick computations while still capturing complex features.
Vision Transformer (ViT).
The ViT applies the principles of transformer models, primarily used in natural language processing, to image classification tasks. It treats image patches as sequences, allowing for global receptive fields from the outset of the model. For evaluations conducted on CIFAR10, we use a ViT model with 7 layers, 12 heads, and a multi-layer perceptron dimension of 1152. For evaluations performed on other datasets, we used the ViT\(\_\)b\(\_\)16 architecture.
Swin Transformer (Swin\(\_\) s).
Implements a hierarchical transformer whose representations are computed with shifted windows, enabling efficient modeling of image data with varying scales and sizes.
In experiments conducted on X-Ray and Road\(\_\)Sign datasets, we fine-tuned pre-trained ImageNet models obtained from torchvision.6 Conversely, for CIFAR10, we trained models from scratch. In the context of the ImageNet experiments, we employed the same pre-trained models as in the X-Ray and Road\(\_\)Sign experiments, with the sole exception being experiments necessitating the evaluation of identical model architectures. In such cases, we utilized a larger variation of the same model also obtained from torchvision.

4.4 Threat Model

For all of our experiments, we consider a blackbox adversary that has no knowledge of the victim’s architecture. To simulate this setting, we ensured that the architectures used for \(f\), \(f^{\prime}\), and those in \(F_{0}\) were all unique. This simulates a black box setting because the architectures used by the adversary (surrogates \(f^{\prime}\) and \(F_{0}\)) will be different from the architecture used in the victim model \(f\).7

4.5 Attack Algorithms

For the attacks, we use Fast Gradient Sign Method (FGSM) [9], PGD [16] and PGD\(+\)Momentum (denoted as Momentum) [6], which should have increased transferability according to [30]. All of these algorithms are considered accepted baselines when evaluating adversarial attacks [3, 8, 24]. The FGSM attack performs a single optimization step on \(x\) to generate \(\delta\). The PGD and Momentum algorithms perform multiple iterations where each iteration normalizes \(\delta\) according to a given p-norm.
In our experiments, we only perform untargeted attacks (\(f(x^{\prime})\neq y\)), where the algorithm is executed on \(f^{\prime}\) alone (bounded by \(\epsilon=\frac{1}{255}\) for CIFAR10 and \(\epsilon=\frac{4}{255}\) for the ImageNet, X-Ray, and Road Sign datasets). Unless explicitly stated otherwise, we used only PGD.
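For reference, a minimal untargeted \(L_\infty\) PGD sketch on the surrogate \(f^{\prime}\) is shown below; the step size, iteration count, and the assumption that images lie in \([0,1]\) are illustrative defaults, not the exact settings used in every experiment.

```python
import torch
import torch.nn.functional as F

def pgd_untargeted(model, x, y, eps=4/255, alpha=1/255, steps=10):
    """Untargeted L-inf PGD: maximize the cross-entropy loss on the surrogate
    while keeping the perturbation inside the epsilon ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)       # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                      # keep a valid image
    return x_adv.detach()
```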

4.6 Ranking Algorithms

We evaluate our three ranking strategies:
SoftMax.
In this implementation of the strategy, we score a sample’s transferability by taking \(1-\sigma_{y}(f^{\prime}(x))\) where \(x\) is the clean sample. The use of SoftMax here is inspired by the works of [33] where SoftMax is used to capture a model’s instability in \(f\) (not \(f^{\prime}\)).
SoftMax\(+\) Noise.
For this version, we follow the work of [32]. In their work, the authors found that samples that are sensitive to noise on the victim model \(f\) happen to transfer better from \(f^{\prime}\) to \(f\). We extend their work to the task of ranking: each clean sample in the test set is scored according to how much random noise impacts the confidence of the surrogate \(f^{\prime}\). Samples which are more sensitive are ranked higher. Similar to [32], we also use Gaussian noise and set std=\(\frac{16}{255}\).
HET (ours).
To implement HET, we use one surrogate model \(f^{\prime}\) to produce the adversarial examples and a set of three other unique surrogate models as \(F_{0}\) to rank them.
Ranking with these strategies is achieved by (1) computing the respective score on each adversarial example \(x^{\prime}\in\mathscr{D}^{*}\) and then (2) sorting the samples by their score (descending order).
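A sketch of the first two scoring functions and the final sort is given below (HET reuses the het_score helper from Section 3.2). The number of noise trials and the use of a confidence drop as the sensitivity measure are our own illustrative assumptions for the SoftMax\(+\)Noise baseline.

```python
import torch

@torch.no_grad()
def softmax_score(f_prime, x_clean, y):
    """SoftMax baseline: 1 - confidence of the single surrogate on the clean sample."""
    probs = torch.softmax(f_prime(x_clean), dim=1)
    return 1.0 - probs.gather(1, y.unsqueeze(1)).squeeze(1)

@torch.no_grad()
def noise_score(f_prime, x_clean, y, std=16/255, trials=5):
    """SoftMax+Noise baseline: how much Gaussian noise lowers the surrogate's
    ground-truth confidence (more sensitive samples are ranked higher)."""
    conf = lambda inp: torch.softmax(f_prime(inp), dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
    base = conf(x_clean)
    noisy = torch.stack([conf((x_clean + std * torch.randn_like(x_clean)).clamp(0, 1))
                         for _ in range(trials)]).mean(dim=0)
    return base - noisy

def rank_descending(scores):
    """Indices of samples sorted from the highest to the lowest score."""
    return torch.argsort(scores, descending=True)
```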
As a baseline evaluation, we evaluate the average transferability rate across all the dataset (no ranking). This baseline can also be viewed as a kind of lower bound on performance. Note that this is essentially the same as the common transferability evaluation measure used in the literature.
Finally, we contrast the above ranking methods with the performance of the optimal solution (upper bound). In the task of transferability ranking (blackbox), the optimal solution is achieved by ordering the samples according to their performance on \(f\) (as opposed to using surrogates).

4.7 Environment and Reproducibility

Our code was written using Pytorch and all models were trained and executed on Nvidia 6000RTX GPUs. To reproduce our results, the reader can access our code online.8

4.8 Experiments

We investigate the following ranking tasks: (sample ranking) where the attacker must select the top \(k\) samples from \(\mathscr{D}\) to use in an attack on \(f\), and (perturbation ranking) where the attacker must select the best perturbation for a specific sample \(x\) in an attack on \(f\). In the sample ranking scenario, we evaluate all values of \(k\) from \(k=1\) to \(k=|\mathscr{D}^{*}|\). For the perturbation ranking scenario, we set \(k\) to be 5%, 10%, and 20% of the respective dataset size.
We performed the following experiments to evaluate how well our ranking strategies perform and generalize to different settings:
E1 - Sample Ranking.
In these experiments, we explore the performance of the ranking strategies in the context of ranking samples. In other words, given a set of \(|\mathscr{D}|\) different images, if only \(k\) can be used in an attack, which \(k\) images should be selected to maximize likelihood of success (transferability).
E1.1 - Architectures.
The purpose of this experiment is to evaluate the generalization of the ranking strategies to different blackbox settings. In this experiment, we explore the transferability of the ranked samples for every combination of surrogate and victim model architecture and generate adversarial examples on \(f^{\prime}\) using PGD. For each dataset and combination of architectures, we evaluated the transferability at \(k\) of the strategies for every possible value of \(k\) (from \(k=1\) to \(k=|\mathscr{D}^{*}|\)).
E1.2 - Attacks.
The objective of this experiment is to see if the ranking strategies generalize to different attacks and whether there are some attacks that transfer better than others. For each dataset and combination of architectures, we evaluated the transferability at \(k\) of the strategies where \(k\) was set to 5%, 10%, and 20% of the respective dataset size.
E2 - Perturbation Ranking.
In this experiment, we evaluate how well the strategies can be used to rank perturbations instead of samples. This experiment captures the setting where the adversary must use a specific sample (i.e., image) in the attack but can improve the likelihood of transferability by selecting the best perturbation for that sample.
E2.1 - Performance.
The experiment was conducted as follows: for each sample \(x\), we generated 27 different perturbations using PGD with random starts and random values for alpha (between \(\frac{0.1}{255}\text{ and }\frac{0.3}{255}\)) and the number of iterations (between \(10\text{ and }20\)). We then ranked \(x\) with each of these perturbations and took the highest ranked sample (\(k=1\)). This process was repeated 100 times per dataset. In a follow-up experiment, we then ranked these images setting \(k\) to 5%, 10%, and 20% of the respective dataset size.
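A sketch of this procedure is shown below, reusing the pgd_untargeted and het_score helpers from the earlier sketches; the random start of PGD is omitted for brevity, and the parameter ranges mirror those listed above.

```python
import random
import torch

def best_perturbation(f_prime, surrogates, x, y, n_candidates=27):
    """Perturbation ranking: craft several PGD perturbations on f' with random
    step sizes and iteration counts, then keep the one with the highest HET score."""
    candidates = []
    for _ in range(n_candidates):
        alpha = random.uniform(0.1, 0.3) / 255                  # random step size
        steps = random.randint(10, 20)                          # random iteration count
        candidates.append(pgd_untargeted(f_prime, x, y, eps=4/255, alpha=alpha, steps=steps))
    scores = torch.stack([het_score(c, y, surrogates) for c in candidates])  # (n, batch)
    best = scores.argmax(dim=0)                                 # best candidate per sample
    stacked = torch.stack(candidates)                           # (n, batch, C, H, W)
    return stacked[best, torch.arange(x.size(0))]               # per-sample winner
```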

5 Experiment Results

5.1 E1—Sample Ranking

5.1.1 E1.1—Architectures.

We direct the reader’s attention to the results presented in Figures 4 and 5, which present the performance of our proposed ranking strategy, HET, across various datasets and model architecture pairings. Figure 4 details the outcomes for CIFAR10 and ImageNet, while Figure 5 covers the X-Ray and Road Sign datasets. These figures plot the transferability at \(k\) for a spectrum of \(k\) values, with the columns representing the victim architecture and the rows indicating the surrogate architecture utilized.
Fig. 4. E1.1 Results—The performance of ranking strategies for the CIFAR10 (top) and ImageNet (bottom) datasets. Each cell plots the transferability at \(k\) success rate for adversarial examples when ranked using different strategies across varied surrogate and victim model architectures. Columns represent the victim model architecture, and rows correspond to the surrogate model architecture.
Fig. 5. E1.1 Results—The performance of ranking strategies for the X-Ray (top) and Road Sign (bottom) datasets. Each cell plots the transferability at \(k\) success rate for adversarial examples when ranked using different strategies across varied surrogate and victim model architectures. Columns represent the victim model architecture, and rows correspond to the surrogate model architecture.
Our observations reveal a consistent trend across all datasets and architecture combinations: HET closely tracks the upper-bound line, which represents the theoretical maximum transferability. Particularly for small values of \(k\), HET demonstrates a high likelihood of successful transferability, often achieving near-certain effectiveness. As \(k\) increases, HET maintains commendable performance, deviating from the upper bound by at most approximately 20%.
Conversely, the baseline methods, namely SoftMax and SoftMax\(+\)Noise, exhibit subpar performance. Out of 100 architecture combinations, only 45 yield a ranking that can be considered beneficial, and this is limited to instances where \(k\) is minimal (\(k=1\)). Moreover, the inconsistent performance between the two baseline methods presents an additional challenge, as it is unpredictable which method will be successful in any given scenario.
We note that the attacker only achieves the ideal results when he or she happens to select the same architecture for \(f^{\prime}\) as the victim in \(f\) (captured by the diagonal of the figures). Under these circumstances, transferability is notably higher for all methods across all \(k\) values, although not perfect. The discrepancy, even when architectures match, can be attributed to different training seeds affecting the models’ decision boundaries, as discussed in [14]. Despite this, HET still significantly enhances transferability for nearly every combination of architectures (where the attacker guesses wrong). This is especially apparent in the setting where the attacker only needs to send one or just a few adversarial examples (low \(k\)) from the set of all potential images.
When dissecting performance relative to the datasets, HET exhibits robust results for ImageNet, CIFAR10, and X-Ray. Nonetheless, there are instances within the Road Sign dataset where HET does not perform optimally at lower \(k\) values but recovers effectiveness at higher \(k\)s. This could be attributed to the varying image sizes in the dataset, which, when resized to fit the model input, may adversely impact the feature representations.
In summary, our comprehensive evaluation underscores the efficacy of the HET ranking strategy in diverse blackbox settings, confirming its potential to improve adversarial example transferability in real-world attack scenarios.

5.1.2 E1.2—Attacks.

In this part of the evaluation, we present the findings of our comparative analysis of the transferability of three adversarial attacks: FGSM, PGD, and Momentum, conducted across four diverse datasets. These results offer insights into the effectiveness of these attacks, the vulnerability of different victim models, and the choice of surrogate model for each attack method. The comparison is presented in Tables 1 and 2, partitioned such that results for the CIFAR10 and ImageNet datasets appear in Table 1, and results for the X-Ray and Road Sign datasets appear in Table 2. The lower and upper bounds presented in the tables are from the PGD attack.
Table 1.
Dataset | Surrogate | Victim | Lower \(B\) | PGD (5% / 10% / 20%) | Momentum (5% / 10% / 20%) | FGSM (5% / 10% / 20%) | Upper \(B\) (5% / 10% / 20%)
ImageNetViT_b_16ViT_l_160.4510.9970.9870.9540.9950.9860.9461.0000.9970.9821.0001.0001.000
DenseNet1210.3940.9900.9790.9340.9900.9780.9260.9940.9900.9681.0001.0001.000
Efficientnet-b20.3440.9940.9710.9040.9940.9670.8930.9970.9890.9481.0001.0001.000
Resnet180.4420.9950.9830.9430.9950.9840.9370.9990.9920.9731.0001.0001.000
Swin_s0.2900.9340.8710.7800.9280.8610.7670.9540.9120.8441.0001.0001.000
DenseNet121ViT_b_160.2150.9710.9250.7920.9750.9280.7970.9750.9330.8181.0001.0001.000
DenseNet1610.3500.9930.9810.9320.9960.9840.9400.9960.9860.9471.0001.0001.000
Efficientnet-b20.2790.9890.9620.8730.9910.9670.8810.9890.9700.8931.0001.0001.000
Resnet180.4040.9920.9760.9260.9930.9790.9340.9950.9830.9421.0001.0001.000
Swin_s0.2020.9480.8790.7490.9490.8880.7560.9530.8980.7781.0001.0001.000
Efficientnet-b2ViT_b_160.2340.9770.9250.8060.9730.9240.8020.9850.9450.8581.0001.0001.000
DenseNet1210.3480.9910.9730.9080.9920.9720.9060.9950.9860.9441.0001.0001.000
Efficientnet_b10.3490.9960.9820.9330.9960.9840.9330.9990.9940.9681.0001.0001.000
Resnet180.3990.9930.9780.9260.9930.9780.9230.9970.9870.9561.0001.0001.000
Swin_s0.2260.9510.8890.7710.9520.8930.7680.9680.9190.8321.0001.0001.000
Resnet18ViT_b_160.2140.9740.9250.7910.9760.9260.7920.9790.9350.8111.0001.0001.000
DenseNet1210.3550.9900.9730.9140.9920.9770.9200.9940.9800.9271.0001.0001.000
Efficientnet-b20.2740.9880.9620.8690.9910.9650.8750.9870.9690.8881.0001.0001.000
Resnet340.3910.9960.9880.9460.9960.9900.9500.9970.9910.9571.0001.0001.000
Swin_s0.1960.9510.8800.7480.9510.8870.7490.9560.8890.7631.0001.0000.981
Swin_sViT_b_160.2510.9400.8820.7690.9370.8840.7690.9660.9320.8631.0001.0001.000
DenseNet1210.3490.9930.9750.9140.9920.9740.9130.9970.9900.9641.0001.0001.000
Efficientnet-b20.3080.9830.9580.8810.9850.9600.8830.9970.9840.9511.0001.0001.000
Resnet180.3970.9950.9810.9340.9960.9820.9340.9980.9920.9711.0001.0001.000
Swin_t0.6021.0000.9960.9841.0000.9980.9870.9991.0000.9971.0001.0001.000
CIFAR10ViTViT0.7690.9790.9680.9570.7200.5430.3370.7280.5620.3681.0001.0001.000
DenseNet1210.0770.7230.5360.3320.9810.9680.9560.9790.9620.9421.0000.7660.383
Efficientnet-b00.1360.8080.6920.4830.7950.6890.4940.8170.7170.5411.0001.0000.680
Resnet180.0650.6140.4490.2770.5760.4410.2770.6220.4700.2991.0000.6470.324
Swin_s0.2190.8310.7460.6240.8340.7510.6270.8540.7800.6721.0001.0001.000
DenseNet121ViT0.2900.7810.7210.6110.7870.7230.6090.8230.7750.7131.0001.0001.000
DenseNet1210.5100.9840.9620.9030.9880.9700.9190.9860.9640.9151.0001.0001.000
Efficientnet-b00.2830.9100.8130.6970.9100.8410.7240.9550.8960.8281.0001.0001.000
Resnet180.2860.8270.7100.6350.8490.7450.6690.8990.8160.7191.0001.0001.000
Swin_s0.1770.7650.6620.5370.7890.6710.5480.8680.8180.7031.0001.0000.883
Efficientnet-b0ViT0.2910.8080.7150.6030.8030.7250.6110.7960.7320.6291.0001.0001.000
DenseNet1210.3550.9540.8920.7990.9960.9920.9690.9870.9680.9391.0001.0001.000
Efficientnet-b00.6410.9940.9900.9710.9520.8900.8030.9190.8430.7091.0001.0001.000
Resnet180.3920.9360.8650.7880.9400.8590.7860.9120.8270.7151.0001.0001.000
Swin_s0.2080.8260.7280.6000.8270.7260.5970.8730.7680.6341.0001.0001.000
Resnet18ViT0.2710.7860.7290.6020.7880.7400.6000.8010.7780.6951.0001.0001.000
DenseNet1210.2040.7880.6850.5700.9100.8470.7170.9790.9090.8281.0001.0001.000
Efficientnet-b00.2390.8540.7730.6440.8180.7100.5930.9000.8150.7151.0001.0001.000
Resnet180.2510.8900.8120.6970.8520.7890.6660.9400.8830.8011.0001.0001.000
Swin_s0.1460.7450.6450.5050.7370.6530.5070.8150.7570.6431.0001.0000.732
Swin_sViT0.3900.8090.7340.6870.8150.7430.6800.8530.7840.7201.0001.0001.000
DenseNet1210.1550.8500.7330.5480.9880.9750.9270.9930.9830.9521.0001.0000.778
Efficientnet-b00.2530.9090.8370.7100.8740.7400.5500.9040.7770.6071.0001.0001.000
Resnet180.1210.7650.6080.4470.9160.8490.7060.9420.8820.7561.0001.0000.604
Swin_s0.5260.9880.9730.9270.7530.6080.4450.7950.6590.4931.0001.0001.000
Table 1. The Comparative Performance of HET Across Different Attack Algorithms and Architecture Combinations for the ImageNet and CIFAR10 Datasets
Columns categorize the various attack algorithms employed, while rows detail the architecture pairings, with surrogate models (\(F_{0}\)) distinct from the victim’s architecture. The color shading indicates better (green) and worse (red) results. Shading is done per surrogate.
Table 2.
Dataset | Surrogate | Victim | Lower \(B\) | PGD (5% / 10% / 20%) | Momentum (5% / 10% / 20%) | FGSM (5% / 10% / 20%) | Upper \(B\) (5% / 10% / 20%)
X-RayViTViT0.4690.9760.9430.9000.9730.9440.8990.9720.9580.9181.0001.0001.000
DenseNet1210.2800.9310.8630.7460.9190.8680.7450.9410.8870.7701.0001.0001.000
Efficientnet-b20.2740.9550.8980.7740.9550.8890.7700.9470.9070.7971.0001.0001.000
Resnet180.2780.9430.8830.7610.9460.8780.7510.9570.9010.7811.0001.0001.000
Swin_s0.2840.9460.9010.7910.9430.8810.7790.9630.9180.8121.0001.0001.000
DenseNet121ViT0.2470.8850.8340.7070.8830.7920.6780.8950.8200.7171.0001.0001.000
DenseNet1210.8500.9890.9910.9900.9960.9950.9950.9920.9940.9941.0001.0001.000
Efficientnet-b20.4360.9330.8980.8520.9050.8900.8260.8910.8700.8031.0001.0001.000
Resnet180.4490.9810.9320.8780.9850.9560.9190.9710.9480.9131.0001.0001.000
Swin_s0.2870.9370.9020.8020.9520.9200.8160.9460.9100.8351.0001.0001.000
Efficientnet-b2ViT0.2220.8930.8000.6910.8980.8060.6800.9040.8560.7171.0001.0001.000
DenseNet1210.3100.9430.8970.8070.9520.9290.8500.9760.9440.8591.0001.0001.000
Efficientnet-b20.9921.0001.0001.0000.9900.9950.9850.9960.9920.9871.0001.0001.000
Resnet180.3240.9400.8970.8200.9520.9200.8490.9760.9340.8881.0001.0001.000
Swin_s0.2490.9330.8830.7740.9250.9010.8020.9400.9140.8191.0001.0001.000
Resnet18ViT0.2380.8920.8090.6770.8790.8000.6730.8960.8120.6961.0001.0001.000
DenseNet1210.4450.9560.9270.8670.9660.9580.9120.9890.9720.9181.0001.0001.000
Efficientnet-b20.4380.9590.9240.8750.9560.9030.8510.9590.9240.8631.0001.0001.000
Resnet180.9531.0001.0001.0001.0001.0001.0001.0000.9980.9971.0001.0001.000
Swin_s0.2820.9160.8900.7940.9290.9040.8030.9330.9110.8351.0001.0001.000
Swin_sViT0.2250.8980.7890.6450.8860.7960.6300.9090.8280.6681.0001.0001.000
DenseNet1210.3500.9610.9070.7950.9610.9150.8110.9730.9340.8551.0001.0001.000
Efficientnet-b20.3670.9760.9160.8360.9730.9360.8620.9850.9650.9091.0001.0001.000
Resnet180.3430.9640.9090.8010.9640.9180.8350.9730.9440.8821.0001.0001.000
Swin_s0.9671.0001.0000.9991.0001.0001.0001.0000.9980.9981.0001.0001.000
Road SignViTViT0.2070.5000.8000.5450.6670.8570.5000.0000.6670.8571.0001.0001.000
DenseNet1210.1210.5000.6000.4550.6670.5710.3570.0000.3330.7141.0001.0000.636
Efficientnet-b20.2241.0000.8000.8180.6670.8570.7141.0000.6670.8571.0001.0001.000
Resnet180.1211.0000.6000.6360.6670.5710.5001.0001.0000.5711.0001.0000.636
Swin_s0.2071.0001.0000.9090.6670.8570.6430.0000.6670.8571.0001.0001.000
DenseNet121ViT0.0170.2000.1000.0670.1880.0940.0470.3330.5000.2310.3330.1670.083
DenseNet1210.3630.7330.6330.4830.7500.6560.6090.6670.6670.8461.0001.0001.000
Efficientnet-b20.1100.6670.5000.3670.6880.5000.4220.6670.8330.6921.0001.0000.550
Resnet180.0370.6000.3000.1670.5630.3130.1721.0000.8330.6920.7330.3670.183
Swin_s0.0330.6000.3330.1670.5630.3130.1560.6670.6670.5380.6670.3330.167
Efficientnet-b2ViT0.0100.2000.0970.0480.2000.0970.0480.6000.2730.1300.2000.0970.048
DenseNet1210.0260.5330.2580.1290.4670.3230.1590.6000.6360.4350.5330.2580.129
Efficientnet-b20.7531.0000.9680.9351.0001.0000.9680.8000.9090.8701.0001.0001.000
Resnet180.0320.6000.3230.1610.6000.3550.1751.0000.7270.3910.6670.3230.161
Swin_s0.0320.5330.3230.1610.4670.2900.1430.6000.4550.3040.6670.3230.161
Resnet18ViT0.0130.1580.1050.0650.1500.0750.0620.1670.2500.1200.2630.1320.065
DenseNet1210.0260.4210.2370.1300.4000.2250.1230.5000.5830.3200.5260.2630.130
Efficientnet-b20.0900.6840.5260.3770.6500.4750.3701.0000.8330.6001.0000.9210.455
Resnet180.4040.8420.7890.6881.0000.9500.7530.8330.8330.6801.0001.0001.000
Swin_s0.0260.3680.2630.1300.4500.2500.1230.5000.3330.4000.5260.2630.130
Swin_sViT0.0140.1500.1220.0600.1300.1090.0540.1430.2000.1610.3000.1460.072
DenseNet1210.0220.4500.2200.1080.4350.2170.1090.8570.4670.3230.4500.2200.108
Efficientnet-b20.0720.6500.4390.3250.7390.5430.3590.7140.6670.5161.0000.7320.361
Resnet180.0270.4500.2680.1330.4350.2390.1200.7140.4670.3550.5500.2680.133
Swin_s0.2750.8000.6830.5900.8700.7170.5980.8570.7330.5481.0001.0001.000
Table 2. The Comparative Performance of HET Across Different Attack Algorithms and Architecture Combinations for the X-Ray and Road Sign Datasets
Columns categorize the various attack algorithms employed, while rows detail the architecture pairings, with surrogate models (\(F_{0}\)) distinct from the victim’s architecture. The color shading indicates better (green) and worse (red) results. Shading is done per surrogate.
Overall, we found that the HET ranking method is highly effective regardless of the attack algorithm used. In two of the datasets (ImageNet and CIFAR10), all three attacks achieved near upper-bound performance with HET.
Interestingly, in our analysis, we observed a high degree of transferability exhibited by the FGSM attack. Across these datasets, FGSM consistently demonstrated its effectiveness in crafting adversarial examples capable of successfully deceiving a range of diverse victim models. FGSM works by performing a single large perturbation on the image, unlike the PGD and Momentum attacks, which perform many small steps. Using many attack steps may be preferable in a whitebox scenario, since it allows targeting less prominent features in the victim model and thus creating less noticeable perturbations. Nevertheless, in the blackbox scenario, this might hinder the attack’s transferability, since these features might only be present in the surrogate model and not the victim.
Our experiments also shed light on the vulnerabilities of various victim models; among them, EfficientNet, a popular deep learning architecture, was found to be the most susceptible victim model across all datasets. This discovery emphasizes the need for robustness enhancements in EfficientNet and similar models to mitigate the risks posed by adversarial attacks. Conversely, ViT and DenseNet121 proved to be reliable surrogate models, providing better transferability than the others. In the case of ViT, this may be attributed to the transformer architecture, which allows learning generic features that do not necessarily reflect a particular architectural choice such as convolutions.
Furthermore, our study highlighted the critical role played by the dataset characteristics in determining the success of adversarial attacks. Notably, ImageNet, with its large images and extensive class diversity comprising 1,000 classes, emerged as the most vulnerable dataset for adversarial attacks. The complexity and diversity inherent in ImageNet make it an enticing target for attackers, as it offers more opportunities to craft adversarial examples that can effectively deceive a wide range of victim models. These findings underscore the necessity for heightened security measures, particularly in complex and diverse datasets like ImageNet, to safeguard against adversarial threats.

5.2 E2—Perturbation Ranking

5.2.1 E2.1—Performance.

In Figure 6, we present the average transferability success rate of a sample when the highest ranked perturbation (out of 25) is used. Although perturbation ranking is less effective than image ranking, the figure shows that transferability can indeed be improved modestly in many situations. We also note that the attacker receives the largest benefit from ranking if the surrogate happens to be the same architecture as the victim. However, these cases can be considered rare in a strict blackbox setting.
Fig. 6. The average attack success rate when using the best perturbation (\(k=1\)). The rows are the datasets and the columns are the architecture used for the surrogate \(f^{\prime}\). Bars indicate baseline results (no ranking).
We also observe that similar to the results in E1.1, the Road Sign dataset is challenging to perform ranking on. We believe this is because the Road Sign dataset is relatively small resulting in surrogate models with unaligned loss surfaces [4]. However, when the attacker has a large enough dataset (disjoint from the victim), then ranking is effective (e.g., for the case of ImageNet, CIFAR10, and X-Ray).
In Tables 3 and 4, we present the results when ranking images for different \(k\) after applying the best perturbation to each image. Table 3 presents the findings for the CIFAR10 and ImageNet datasets, while Table 4 provides the results for the X-Ray and Road Sign datasets. Each cell within these tables indicates the average transferability from 100 random images selected from the dataset, with the columns denoting the ranking methods alongside the established upper and lower bounds. The rows detail the combinations of architectures for the victim and surrogate models.
Table 3.
Dataset | Surrogate | Victim | Lower \(B\) | SoftMax (5% / 10% / 20%) | SoftMax\(+\)Noise (5% / 10% / 20%) | HET (5% / 10% / 20%) | Upper \(B\) (5% / 10% / 20%)
ImageNetViT_b_16ViT_l_160.400.400.600.450.200.300.301.001.000.951.001.001.00
DenseNet1210.310.400.500.400.400.400.401.001.000.851.001.001.00
Efficientnet-b20.270.200.100.250.200.300.301.000.900.801.001.001.00
Resnet180.370.400.400.450.400.400.351.001.000.901.001.001.00
Swin_s0.220.400.500.350.200.300.250.800.800.651.001.000.99
DenseNet121ViT_b_160.230.200.300.300.200.200.301.000.800.651.001.000.99
DenseNet1610.380.600.400.450.400.500.501.001.001.001.001.001.00
Efficientnet-b20.280.200.200.350.200.300.351.001.000.801.001.001.00
Resnet180.470.400.400.450.200.300.401.000.900.901.001.001.00
Swin_s0.190.200.200.200.200.200.200.800.800.701.001.000.93
Efficientnet-b2ViT_b_160.260.200.400.350.000.200.201.001.000.801.001.001.00
DenseNet1210.390.000.400.400.000.200.351.001.000.951.001.001.00
Efficientnet-b10.450.400.500.500.400.400.401.001.000.951.001.001.00
Resnet180.450.000.400.450.200.400.500.800.900.901.001.001.00
Swin_s0.230.200.300.300.000.000.151.000.900.751.001.000.99
Resnet18ViT_b_160.250.200.300.350.200.400.301.001.000.751.001.000.99
DenseNet1210.390.400.600.500.200.400.301.000.900.901.001.001.00
Efficientnet-b20.280.400.400.300.400.500.351.000.900.751.001.001.00
Resnet340.380.200.400.350.200.400.351.001.000.901.001.001.00
Swin_s0.190.000.200.150.200.300.201.000.800.701.001.000.93
Swin_sViT_b_160.250.400.300.350.400.400.251.000.800.751.001.001.00
DenseNet1210.300.400.300.400.400.400.301.001.000.901.001.001.00
Efficientnet-b20.260.400.300.300.200.300.251.000.900.751.001.001.00
Resnet180.360.400.500.500.600.500.401.001.000.801.001.001.00
Swin_t0.600.800.900.800.800.800.751.001.001.001.001.001.00
CIFAR10ViTViT0.060.000.000.050.000.100.100.600.400.251.000.670.34
DenseNet1210.710.800.700.750.800.800.800.600.700.851.001.001.00
Efficientnet-b00.160.200.200.200.200.200.200.800.800.601.001.000.81
Resnet180.060.000.000.000.000.000.050.600.500.301.000.640.32
Swin_s0.180.200.200.250.200.400.250.400.700.601.001.000.90
DenseNet121ViT0.320.400.300.200.600.600.350.800.900.701.001.001.00
DenseNet1210.721.000.800.800.800.700.751.001.001.001.001.001.00
Efficientnet-b00.360.600.400.400.400.400.301.001.000.851.001.001.00
Resnet180.340.400.600.500.600.500.401.000.900.851.001.001.00
Swin_s0.180.400.300.200.400.300.251.000.700.601.001.000.84
Efficientnet-b0ViT0.340.200.200.250.200.200.301.000.900.701.001.001.00
DenseNet1210.651.000.900.801.000.800.751.001.000.901.001.001.00
Efficientnet-b00.320.200.100.150.400.300.301.001.000.751.001.001.00
Resnet180.400.400.400.450.800.600.401.000.900.851.001.001.00
Swin_s0.210.000.000.100.200.200.201.000.700.601.001.000.98
Resnet18ViT0.290.400.400.300.600.600.401.000.700.601.001.001.00
DenseNet1210.240.400.500.450.000.200.351.001.000.751.001.000.95
Efficientnet-b00.160.600.500.350.000.300.350.400.600.401.001.000.84
Resnet180.220.400.500.400.000.200.350.800.700.551.001.000.94
Swin_s0.140.400.300.200.200.100.200.600.600.451.001.000.75
Swin_sViT0.400.200.200.200.400.300.350.600.700.701.001.001.00
DenseNet1210.470.200.200.250.600.700.451.000.900.901.001.001.00
Efficientnet-b00.140.000.000.000.200.300.150.600.700.551.001.000.74
Resnet180.310.400.200.200.400.200.201.000.900.901.001.001.00
Swin_s0.110.000.000.050.200.100.051.000.500.551.000.990.57
Table 3. The Average Transferability of a Sample When Ranking its Potential Perturbations for the CIFAR10 and ImageNet Datasets Over 100 Trials
Columns represent the various ranking methods, and rows indicate the combination of victim and surrogate model architectures, ensuring that (\(F_{0}\)) is chosen from architectures different from the victim’s. The color shading indicates better (green) and worse (red) results. Shading is done per surrogate.
Table 4.
Dataset | Surrogate | Victim | Lower \(B\) | SoftMax (5% / 10% / 20%) | SoftMax\(+\)Noise (5% / 10% / 20%) | HET (5% / 10% / 20%) | Upper \(B\) (5% / 10% / 20%)
X-RayViTViT0.610.800.700.650.800.700.701.001.001.001.001.001.00
DenseNet1210.340.400.300.350.200.200.251.000.800.751.001.001.00
Efficientnet-b20.530.600.700.550.600.600.550.800.700.751.001.001.00
Resnet180.360.600.500.450.400.400.401.001.000.901.001.001.00
Swin_s0.320.600.400.300.200.200.250.600.700.751.001.001.00
DenseNet121ViT0.210.200.200.250.400.400.450.800.600.451.001.000.99
DenseNet1210.830.800.800.800.600.800.701.001.001.001.001.001.00
Efficientnet-b20.550.400.400.400.600.800.551.000.900.651.001.001.00
Resnet180.390.200.400.450.600.500.451.000.800.651.001.001.00
Swin_s0.240.000.100.250.400.400.350.800.800.551.001.000.95
Efficientnet-b2ViT0.210.200.300.300.400.200.151.000.700.601.001.000.97
DenseNet1210.260.400.300.300.200.200.101.001.000.751.001.000.99
Efficientnet-b21.001.001.001.001.001.001.001.001.001.001.001.001.00
Resnet180.230.600.300.300.400.400.200.800.700.551.001.000.94
Swin_s0.180.400.300.350.200.200.150.800.600.501.001.000.88
Resnet18ViT0.210.000.100.200.200.200.301.000.700.401.001.000.99
DenseNet1210.550.800.400.450.400.500.601.001.000.901.001.001.00
Efficientnet-b20.590.600.700.600.400.500.451.000.800.651.001.001.00
Resnet180.991.001.001.001.001.001.001.001.001.001.001.001.00
Swin_s0.250.200.400.300.200.100.100.800.600.401.001.000.99
Swin_sViT0.240.400.300.250.600.400.400.800.600.501.001.001.00
DenseNet1210.460.400.300.500.800.600.550.800.900.801.001.001.00
Efficientnet-b20.530.200.600.600.800.600.600.800.700.651.001.001.00
Resnet180.360.800.600.600.600.400.300.800.700.551.001.001.00
Swin_s1.001.001.001.001.001.001.001.001.001.001.001.001.00
Road SignViTViT0.050.000.000.000.000.000.000.200.200.150.720.470.24
DenseNet1210.010.000.000.000.000.000.000.200.100.050.200.100.05
Efficientnet-b20.060.000.000.000.000.000.000.600.500.300.910.560.28
Resnet180.030.000.000.000.000.000.000.400.200.150.590.290.15
Swin_s0.030.000.000.000.000.000.000.600.300.150.610.300.15
DenseNet121ViT0.010.000.000.000.000.000.000.200.100.050.200.100.05
DenseNet1210.310.200.100.200.000.200.150.800.800.551.001.000.97
Efficientnet-b20.060.000.000.000.000.000.000.600.500.301.000.610.30
Resnet180.030.000.000.000.000.000.000.400.200.150.670.340.17
Swin_s0.020.000.000.000.000.000.000.400.200.100.440.220.11
Efficientnet-b2ViT0.010.000.000.000.000.000.000.200.100.050.200.100.05
DenseNet1210.010.000.000.000.000.000.000.200.100.050.200.100.05
Efficientnet-b20.730.600.800.750.600.800.851.000.900.901.001.001.00
Resnet180.020.000.000.000.000.000.000.200.100.100.590.300.15
Swin_s0.020.000.000.000.200.100.050.400.200.100.400.200.10
Resnet18ViT0.010.000.000.000.000.000.000.200.100.050.200.100.05
DenseNet1210.010.000.000.000.000.000.000.200.100.050.270.130.07
Efficientnet-b20.050.000.000.000.000.000.000.600.300.250.980.600.30
Resnet180.540.400.400.350.200.200.201.000.900.901.001.001.00
Swin_s0.020.000.000.000.000.000.000.400.200.100.400.200.10
Swin_sViT0.010.000.000.000.000.000.000.200.100.050.200.100.05
DenseNet1210.010.000.000.000.000.000.000.200.100.050.200.100.05
Efficientnet-b20.070.000.000.000.000.000.000.800.600.351.000.710.36
Resnet180.040.000.000.000.000.000.000.400.300.200.640.320.16
Swin_s0.530.000.100.250.200.200.201.000.800.801.001.000.99
Table 4. The Average Transferability of a Sample When Ranking Potential Perturbations for the X-Ray and Road Sign Datasets Over 100 Trials
Columns represent the various ranking methods, and rows indicate the combination of victim and surrogate model architectures, ensuring that (\(F_{0}\)) is chosen from architectures different from the victim’s. The color shading indicates better (green) and worse (red) results. Shading is done per surrogate.
The results underscore the proficiency of HET in consistently pinpointing the most transferable perturbation for a given sample. The significance of this capability is highlighted by the comparison to the lower bound of transferability—averaging at or below 30% across the datasets—which HET substantially elevates to an average of 70% or greater. Particularly striking is HET’s performance for the ImageNet, X-Ray, and Road Sign datasets, where it nearly mirrors the upper bound. In the X-Ray and ImageNet datasets, HET achieves perfect transferability across almost all blackbox scenarios (selection of architecture combinations).
Even as the value of \(k\) increases, representing a broader selection of top-ranking perturbations, HET maintains its superior performance. It exhibits an enhancement of up to 60% in transferability over the lower bound for larger values of \(k\). This improvement is noteworthy, demonstrating the robustness of HET in a variety of conditions.
Conversely, the baseline methods of SoftMax and SoftMax\(+\)Noise generally hover around the lower bound, occasionally achieving up to a 40% increase in transferability, yet still falling short of the performance attained by HET. The disparity between these methods and HET is particularly evident within the Road Sign dataset. For lower values of \(k\), the baselines are analogous to the lower bound, whereas HET’s results are akin to the upper bound, accentuating the substantial advantage provided by the HET ranking strategy in these scenarios.

6 Related Work

There has been a great amount of research on adversarial transferability, discussing attacks [20, 22, 27, 32], defenses [11, 17], and general analyses of the phenomenon [4, 14, 25]. However, these works do not rank transferability from the attacker’s side, but rather evaluate the transferability of entire datasets directly on the victim model. In contrast, our work defines the task of transferability ranking for blackbox attackers and proposes methods for performing the task.
Some works have proposed techniques and measures for identifying samples that transfer better than others. For example, [33] show that some subsets of samples transfer better than others and suggest ways to curate evaluation datasets for transferability. They found that there is a connection between the transferability success of \(x^{\prime}\) to \(f\) and the SoftMax outputs of \(f(x)\). In another work, the sensitivity of \(x\) to Gaussian noise on \(f\) was found to be correlated with the transferability of \(x^{\prime}\) to \(f\) [32]. However, in contrast to our work, the processes described in these works only apply to whitebox scenarios with access to \(f\). In many cases, attackers do not have access to the victim’s model, cannot send many adversarial examples to the model without raising suspicion, or cannot receive feedback from the model (e.g., classifiers used in airport X-ray machines [1]). Therefore, they cannot evaluate their attack on the victim’s model beforehand. Moreover, these works do not define the task of transferability ranking or suggest methods for measuring transferability such as transferability at \(k\).
Finally, in contrast to prior works, we suggest a more grounded approach to evaluating model security against transfer attacks. We recommend that the community evaluate their models against the top \(k\) most transferable samples from a blackbox perspective, rather than reporting the average success of all samples from a whitebox perspective. This is because a real attacker will likely select the best samples to use, which ultimately increases the performance (threat) of transfer attacks; a sketch of such an evaluation follows.
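For concreteness, a transferability-at-\(k\) style evaluation could look like the following sketch, under the assumption that the metric reports the victim-side success rate among the \(k\) highest-ranked adversarial examples; with \(k=1\) it asks whether the single best-ranked example transfers.

def transferability_at_k(scores, transferred, k):
    # scores: ranking score per adversarial example (higher = expected to transfer better)
    # transferred: 1 if that example actually fooled the victim model, else 0
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(transferred[i] for i in ranked[:k]) / k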

7 Conclusion

The results garnered from our extensive experimentation with the HET ranking strategy offer compelling evidence of its effectiveness in enhancing the transferability of adversarial examples. Notably, the strategy is especially effective at improving the transferability of a single specific sample, a scenario of particular relevance to blackbox attackers. The findings reveal that attackers are not “stuck” with a resulting adversarial example and its likelihood of transferring; rather, by applying HET to select the best perturbation from a set of trials, attackers can significantly boost the likelihood of a successful transfer. Ideal ranking results are achieved only when the attacker happens to guess the victim's architecture and uses it to generate the adversarial examples. However, our results show that if the attacker needs to send only a small fraction (less than 10%) of the candidate samples, ranking significantly increases the chance that those samples transfer, and nearly guarantees it when only one is selected (\(k=1\)).
This efficiency in attack methodology is a critical advantage in practical adversarial settings. It reduces the attacker's risk, since there is no longer a need to send the victim a vast number of perturbations in the hope of stumbling upon a successful one. Instead, HET provides a systematic and predictive approach to identifying the victim's vulnerabilities without first probing the victim with any samples.
Moreover, the consistent performance of HET across various datasets and model architectures offers a deeper understanding of the intrinsic characteristics of adversarial attacks. As suggested in previous works, it supports the claim that different models, despite their distinct architectural designs, often share common weaknesses [20, 22, 27, 32].
The ability to predict and exploit these shared vulnerabilities is significant. It implies that there is an underlying structure to the adversarial space that HET can effectively navigate. By selecting perturbations that are universally potent across models, HET enables attackers to reliably anticipate the success of their attacks in blackbox settings. This observation not only affirms the utility of HET but also provides a valuable insight into the nature of transferability, suggesting that the successful perturbation is not a fluke but a systematic exploitation of a model’s fundamental susceptibilities. The implications of this for both attackers and defenders are profound, as it calls for a deeper exploration into the robustness of models and the development of more sophisticated defense mechanisms.
As \(k\) increases, the SoftMax (SM) method without HET becomes the better choice, and at very high values of \(k\) the noise-sensitivity method performs best. This reflects the ability of SM without HET to identify the best and worst samples in a set, but not to order the samples in the middle as reliably. It also implies that the methods are not always consistent in their performance across different values of \(k\), so an attacker may choose different methods for different situations. On ImageNet, the noise-sensitivity ranking performed worse than SM without HET at every value of \(k\).

Footnotes

1
In this work, we concentrate on untargeted attacks, though our methodology could be adapted for targeted attacks with the aim of achieving \(f_{j}(x^{\prime}_{i})=y_{t}\) where \(y_{t}\) represents the class that the attacker intends to mimic.
2
\(f\) is never included in \(F_{0}\) for all of our experiments.
3
This assumes that the training data for each model in \(F\) are drawn from the same distribution.
7
The only exception is where the victim uses the ViT architecture. In this case alone, we let the adversary use a different sized ViT in \(F_{0}\) due to time limitations in training.

References

[1]
Samet Akcay and Toby Breckon. 2022. Towards automatic threat detection: A survey of advances of deep learning within X-ray security imaging. Pattern Recognition 122 (2022), 108245.
[2]
Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy (SP’17). IEEE, 39–57.
[3]
Francesco Croce and Matthias Hein. 2020. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In Proceedings of the International Conference on Machine Learning (ICML’20). PMLR, 2206–2216.
[4]
Ambra Demontis, Marco Melis, Maura Pintor, Matthew Jagielski, Battista Biggio, Alina Oprea, Cristina Nita-Rotaru, and Fabio Roli. 2019. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In Proceedings of the 28th USENIX Security Symposium (USENIX Security’19). 321–338.
[5]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
[6]
Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. 2018. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9185–9193.
[7]
Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. 2019. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4312–4321.
[8]
Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations. Retrieved from http://arxiv.org/abs/1412.6572
[9]
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572.
[10]
Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. 2017. Adversarial examples for malware detection. In Proceedings of the European Symposium on Research in Computer Security. Springer, 62–79.
[11]
Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens van der Maaten. 2018. Countering adversarial images using input transformations. In Proceedings of the International Conference on Learning Representations. arXiv preprint arXiv:1711.00117
[12]
Hokuto Hirano, Akinori Minagi, and Kazuhiro Takemoto. 2021. Universal adversarial attacks on deep neural networks for medical image classification. BMC Medical Imaging 21, 1 (2021), 1–13.
[13]
Cheng Ju, Aurélien Bibaut, and Mark van der Laan. 2018. The relative performance of ensemble methods with deep convolutional neural networks for image classification. Journal of Applied Statistics 45, 15 (2018), 2800–2818.
[14]
Ziv Katzir and Yuval Elovici. 2021. Who’s afraid of adversarial transferability? arXiv:2105.00433.
[15]
Moshe Levy, Guy Amit, Yuval Elovici, and Yisroel Mirsky. 2022. The security of deep learning defences for medical imaging. arXiv:2201.08661.
[16]
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083.
[17]
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks. In Proceedings of the International Conference on Learning Representations. arXiv preprint arXiv:1706.06083
[18]
Samaneh Mahdavifar and Ali A. Ghorbani. 2019. Application of deep learning to cybersecurity: A survey. Neurocomputing 347 (2019), 149–176.
[19]
Preetum Nakkiran. 2019. A discussion of ’adversarial examples are not bugs, they are features’: Adversarial examples are just bugs, too. In Distill. Retrieved from https://distill.pub/2019/advex-bugs-discussion/response-5
[20]
Muhammad Muzammal Naseer, Salman H. Khan, Muhammad Haris Khan, Fahad Shahbaz Khan, and Fatih Porikli. 2019. Cross-domain transferability of adversarial perturbations. Advances in Neural Information Processing Systems 32 (2019), 12905–12915.
[21]
Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv:1605.07277.
[22]
Jacob M. Springer, Melanie Mitchell, and Garrett T. Kenyon. 2021. A little robustness goes a long way: Leveraging robust features for targeted transfer attacks. In Advances in Neural Information Processing Systems 34 (NeurIPS’21). 9759–9773.
[23]
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In Proceedings of the 2nd International Conference on Learning Representations (ICLR’14). arXiv preprint arXiv:1312.6199
[24]
Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. 2020. On adaptive attacks to adversarial example defenses. Advances in Neural Information Processing Systems 33 (2020), 1633–1645.
[25]
Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. 2017. The space of transferable adversarial examples. arXiv:1704.03453.
[26]
Renzhi Wang, Tianwei Zhang, Xiaofei Xie, Lei Ma, Cong Tian, Felix Juefei-Xu, and Yang Liu. 2020. Generating adversarial examples with controllable non-transferability. arXiv:2007.01299.
[27]
Xiaosen Wang, Xuanran He, Jingdong Wang, and Kun He. 2021. Admix: Enhancing the transferability of adversarial attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 16158–16167.
[28]
Ross Wightman, Hugo Touvron, and Herve Jegou. 2021. ResNet strikes back: An improved training procedure in timm. In NeurIPS 2021 Workshop on ImageNet: Past, Present, and Future. Retrieved from https://openreview.net/forum?id=NG6MJnVl6M5
[29]
Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. 2017. Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE International Conference on Computer Vision. 1369–1378.
[30]
Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L. Yuille. 2019. Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2730–2739.
[31]
Mingyi Zhou, Jing Wu, Yipeng Liu, Shuaicheng Liu, and Ce Zhu. 2020. Dast: Data-free substitute training for adversarial attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 234–243.
[32]
Yao Zhu, Jiacheng Sun, and Zhenguo Li. 2021. Rethinking adversarial transferability from a data distribution perspective. In Proceedings of the International Conference on Learning Representations.
[33]
Utku Özbulak, Esla Timothy Anzaku, Wesley De Neve, and Arnout Van Messem. 2021. Selection of source images heavily influences the effectiveness of adversarial attacks. In Proceedings of the 32nd British Machine Vision Conference (BMVC’21), Online. Retrieved from https://www.bmvc2021-virtualconference.com/programme/accepted-papers/
