
Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis

Xinhao Deng
INSC & BNRist, Tsinghua University
Beijing, China
dengxh23@mails.tsinghua.edu.cn

Qi Li
INSC, Tsinghua University
Zhongguancun Laboratory
Beijing, China
qli01@tsinghua.edu.cn

Ke Xu
DCST, Tsinghua University
Zhongguancun Laboratory
Beijing, China
xuke@tsinghua.edu.cn

arXiv:2407.00918v1 [cs.CR] 1 Jul 2024

Abstract
Website Fingerprinting (WF) attacks identify the websites visited by users by performing traffic analysis, compromising user privacy. Particularly, DL-based WF attacks demonstrate impressive attack performance. However, the effectiveness of DL-based WF attacks relies on the collected complete and pure traffic during the page loading, which impacts the practicality of these attacks. The WF performance is rather low under dynamic network conditions and various WF defenses, particularly when the analyzed traffic is only a small part of the complete traffic. In this paper, we propose Holmes, a robust and reliable early-stage WF attack. Holmes utilizes temporal and spatial distribution analysis of website traffic to effectively identify websites in the early stages of page loading. Specifically, Holmes develops adaptive data augmentation based on the temporal distribution of website traffic and utilizes a supervised contrastive learning method to extract the correlations between the early-stage traffic and the pre-collected complete traffic. Holmes accurately identifies traffic in the early stages of page loading by computing the correlation of the traffic with the spatial distribution information, which ensures robust and reliable detection according to early-stage traffic. We extensively evaluate Holmes using six datasets. Compared to nine existing DL-based WF attacks, Holmes improves the F1-score of identifying early-stage traffic by an average of 169.18%. Furthermore, we replay the traffic of visiting real-world dark web websites. Holmes successfully identifies dark web websites when the ratio of page loading on average is only 21.71%, with an average precision improvement of 169.36% over the existing WF attacks.

CCS Concepts
• Networks → Network privacy and anonymity.

Keywords
Tor; privacy; website fingerprinting; spatial-temporal analysis; contrastive learning

ACM Reference Format:
Xinhao Deng, Qi Li, and Ke Xu. 2024. Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS ’24), October 14–18, 2024, Salt Lake City, UT, USA. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3658644.3670272

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CCS ’24, October 14–18, 2024, Salt Lake City, UT, USA
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0636-3/24/10
https://doi.org/10.1145/3658644.3670272

1 Introduction
Tor [12] is the most popular anonymous communication system, boasting millions of active daily users [28]. Tor utilizes various mechanisms, including randomly selected relays and multi-layer encryption, to anonymize user browsing behaviors. Unfortunately, Tor is vulnerable to Website Fingerprinting (WF) attacks [2, 10, 21, 35, 36, 39, 40]. WF attacks utilize Machine Learning (ML) or Deep Learning (DL) models to extract unique traffic patterns of websites and effectively identify the websites visited by Tor users. In particular, existing DL-based WF attacks demonstrate outstanding attack performance, achieving over 95% accuracy [10, 37, 39, 40]. WF attacks on Tor traffic are challenging, yet these attacks can also be successfully applied to other privacy-preserving systems [11, 47].

The DL-based WF attacks heavily rely on the collected complete and pure traffic during the page loading for traffic analysis. In practice, adversaries cannot perceive the entire process of website loading traffic due to mixed background traffic. Existing WF attacks apply fixed conditions for traffic collection [10, 36, 37, 39, 40]. These settings do not consider the differences between websites and may compromise the attack performance, e.g., the adversary can only collect partial traffic from slow-loading websites. Particularly, poor network conditions and WF defenses also prevent the adversary from effectively collecting the complete pure traffic of page loading, leading to a significant decrease in attack performance against certain websites [23]. Our study shows that the robust WF attack (i.e., DF) achieves an average precision of over 91% for all websites under the WTF-PAD defense. Notably, the lowest precision of fingerprinting is only less than 55%¹.

¹ The detailed results can be found in Section 6.5.

To address the limitations of existing DL-based WF attacks, we aim to develop an effective WF attack, i.e., the early-stage WF attack, that only utilizes the traffic generated from the early stage of page loading. The early-stage WF attack can identify the visited website during early-stage page loading. As shown in Figure 1, compared with existing WF attacks, the early-stage attack does not require waiting for the complete traffic of page loading. However, there are three critical challenges in constructing the early-stage WF attack. (i) Early-stage traffic under dynamic network conditions is prone to traffic misidentification. Dynamic network conditions refer to
that Tor users may use different paths with different bandwidths and latency across various networks. Under such dynamic network conditions, the patterns of different traffic from the same website vary. Traffic at the early stages of page loading contains less website information, which varies under dynamic network conditions. (ii) Early-stage WF attacks are more susceptible to various defenses. By padding dummy packets [16, 23], delaying packets [5, 44] or splitting traffic [8], defenses can significantly impact the effectiveness of WF attacks. (iii) The page loading speeds vary significantly across different websites, making it difficult to ensure high precision in detecting early-stage traffic of all websites. Since existing WF attacks based on fixed-setting traffic collection cannot perceive the page loading of websites visited by Tor users, their effectiveness is unreliable. In particular, as discussed above, they achieve very low identification precision in detecting the traffic visiting some websites.

Figure 1: Comparison of the early-stage WF attack with existing WF attacks. The early-stage WF attack can identify websites in the early stage of page loading. [Plot: CDF of attack success probability against page loading ratio (%), for the early-stage WF attack and existing WF attacks.]

In this paper, we propose Holmes², a robust and reliable early-stage WF attack that can accurately fingerprint traffic visiting different websites according to a small amount of traffic. Holmes is capable of effectively identifying the early-stage traffic of websites under dynamic network conditions and deployed defenses by correlating the early-stage traffic with the pre-collected complete traffic. We find that both the early-stage traffic and the complete traffic of the same website exhibit a strong connection of temporal-spatial distribution because they contain the same website information, e.g., the same parts of the website content and elements. As illustrated in Figure 1, Holmes achieves early-stage WF attacks by capturing the correlation between the unknown early-stage traffic and the pre-collected complete traffic.

² Holmes is a fictional British detective in novels, known for his skill in analyzing the correlations of clues to solve problems earlier than others.

To efficiently capture the correlation between the traffic of different stages of page loading, we design a three-step approach based on temporal-spatial distribution analysis. First, Holmes utilizes an adaptive data augmentation method built on the temporal distribution of traffic features, which augments the traffic at different stages of page loading. Second, Holmes utilizes supervised contrastive learning to transform traffic features into the low-dimensional embedding space so that traffic at different loading stages is clustered closely in the same embedding space. Notably, supervised contrastive learning makes the traffic of the same website closer in the embedding space by learning the correlations of the traffic. Third, Holmes transforms unknown early-stage traffic into a point in the embedding space and calculates the correlation between the unknown traffic and each website based on the spatial distribution of website traffic. This allows Holmes to perform website identification at each short interval of traffic collection. Identification results with low confidence are rejected because the dynamic network conditions or defenses cause insufficient website information in the early-stage traffic. Holmes automatically continues collecting more packets and analyzing the traffic at the next interval until the website is successfully identified. Therefore, Holmes can ensure adaptive traffic collection and reliable early-stage website identification.

We prototype Holmes and conduct extensive performance evaluations using six different datasets, including the Alexa-top websites dataset, dark web websites dataset, and four defense datasets. Moreover, we implement nine advanced DL-based WF attacks for comparison with Holmes. Compared to the existing WF attacks, Holmes achieves an average improvement of 169.18% in the F1-score to identify the early-stage traffic. Particularly, the experimental results under multiple defenses demonstrate the exceptional robustness of Holmes. Furthermore, we evaluate the performance of Holmes under real-world deployment. We selected 80 popular dark web websites based on Tor onion services [41] and collected real-world dark web traffic in August 2023 and April 2024. Holmes achieves a precision of 85.19% in identifying these real-world dark web websites, with an average page loading ratio of only 21.71%.

The contributions of our work are three-fold:
• We propose Holmes, the first robust and reliable early-stage WF attack against Tor traffic, which can fingerprint websites according to a small amount of traffic visiting the websites.
• Holmes utilizes feature attribution to analyze the temporal distribution of traffic features, enabling website-adaptive data augmentation. Furthermore, Holmes utilizes a supervised contrastive learning method to extract correlations between early-stage traffic and complete traffic and obtain the spatial distribution of websites. By correlating the spatial and temporal distribution, Holmes achieves a robust and reliable website identification, which can accurately fingerprint traffic under different network conditions and various defenses.
• We prototype Holmes and perform extensive experiments in various settings to demonstrate its performance. We release the source code of Holmes³.

³ https://github.com/Xinhao-Deng/Website-Fingerprinting-Library

The rest of this paper is organized as follows: Section 2 presents the background and the problem statement. Section 3 presents the threat model. In Section 4, we present the key observation and overview of Holmes. Section 5 presents the detailed designs. In Section 6, we evaluate the performance of Holmes. In Section 7, we discuss the practicality of Holmes and possible countermeasures against Holmes. Sections 8 and 9 review related works and conclude the paper, respectively.

2 Background & Problem Statement
2.1 Background
Website fingerprinting (WF) attacks identify the websites visited by Tor users by analyzing traffic patterns, such as packet sizes and
timing information. Previous WF attacks extract fingerprinting features from traffic based on expert knowledge and employ Machine Learning (ML) models to classify these features for website identification [18, 32, 43]. However, features extracted based on expert knowledge can be easily compromised by defenses [23]. With the advancement of deep learning (DL), DL-based WF attacks achieve automated feature extraction and significantly enhance performance [3, 36, 39]. DL-based WF attacks can effectively identify websites in various real-world scenarios, such as multi-tab browsing [10, 21, 46], under defenses [35, 37], dynamic network environments [2], and concept drift [40]. However, reliance on the collection of pure traffic throughout the entire page loading hinders the real-world deployment of WF attacks. Holmes achieves early-stage WF attacks by utilizing both the temporal and spatial distributions of website traffic.

Website fingerprinting (WF) defenses aim to undermine the effectiveness of WF attacks. Existing defenses mainly fall into two categories: disturbing traffic and splitting traffic. The defenses for disturbing traffic involve padding dummy packets [16, 23], delaying packets [19, 44], inserting adversarial perturbations [30] and obfuscating traffic [31]. However, the significant overhead of defenses may affect the operation of relay nodes [7]. Only a variant of the lightweight defense WTF-PAD has been deployed in the Tor network [1]. Traffic splitting defenses involve splitting traffic into multiple paths so that the adversary can only collect a portion of the packets, thereby obscuring the traffic patterns [8]. We evaluate the robustness of Holmes against existing defenses in Section 6.4.

2.2 Problem Statement
The goal of this paper is to develop reliable WF attacks (i.e., accurately identifying all websites) based on the traffic in the early stage of page loading. Previous WF attacks rely on collecting pure traffic throughout the entire page load process. However, under dynamic network conditions or defenses, existing attacks cannot effectively collect complete traffic from all websites. Meanwhile, increasing the traffic collection time incurs more noise from background traffic or defenses, which further impacts the WF performance. We analyze the SOTA multi-tab attack ARES [10] and the robust attack DF [39]. ARES and DF achieve over 90% average precision in the presence of obfuscated traffic under multi-tab browsing and WTF-PAD defenses, respectively. We find that the minimum precision of fingerprinting achieved by ARES and DF is only 42.86% and 54.11%, respectively.

Note that the fixed traffic collection settings required by the existing attacks further undermine their practicality. For example, the DF attack sets a traffic collection time of 120 seconds and an input length of 5000. The input is the direction sequence of packets, which is either truncated or zero-padded. However, different websites exhibit significant variations in page loading latency and the number of generated packets, and such fixed settings cannot guarantee reliable identification of all websites. Figure 2 illustrates the distribution of page loading latency and the number of packets for the Alexa-top 10k websites. We observe that the page loading latency of 5.04% of websites exceeds 120 seconds or the packet count is over 5000, making it difficult for the existing attacks to collect pure traffic with sufficient website information. Moreover, we find that over 58.17% of websites require less than 60 seconds or fewer than 2500 packets for page loading. When these websites finish loading, existing attacks continue collecting noise packets, which further degrades the performance of attacks.

To address the issues above and achieve effective WF attacks at the early stage of page loading, we develop Holmes to achieve the following goals. (i) Reliability. Holmes utilizes traffic collected from the early stages of page loading to achieve high identification precision across all websites. (ii) Adaptivity. For traffic from various websites, Holmes dynamically performs an attack during each time interval of the traffic collection. Holmes should adaptively stop traffic collection once enough website information is obtained, and accurately identify traffic. (iii) Robustness. Holmes should maintain robust performance under various WF defenses.

In a nutshell, Holmes aims to achieve robust and reliable early-stage WF attacks, effectively identifying each website during the early stages of page loading. Compared to previous attacks, Holmes may be more practical in the real world, with applications such as early detection and prevention of dark web crimes.

3 Threat Model
This paper aims to develop an early-stage website fingerprinting attack that can identify websites visited by Tor users based on the traffic in the early stages of page loading. In particular, early-stage WF attacks can identify websites while the Tor user is still waiting for the page to fully load. In Figure 3, we show the threat model of our early-stage WF attack. Similar to previous works [2, 10, 18, 21, 32, 35, 36, 39, 40], we consider a local and passive adversary for Tor, such as network administrators, Internet Service Providers (ISPs), and Autonomous Systems (AS). The adversary can only collect packets without the capability to decrypt packets. Specifically, a passive adversary is unable to detect the end of a webpage loading, and can only configure fixed conditions for traffic collection [10, 36,

Figure 3: The threat model of Holmes. [Diagram labels: Tor users, defense mechanisms, adversary, Tor network, page loading, websites.]

Figure 2: Distribution of page load times and number of packets for Alexa-top 10k websites. [Panels: (a) Loading latency (seconds); (b) Number of packets. y-axis: Probability (%).]
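The fixed input format that Section 2.2 attributes to the DF attack (a packet direction sequence truncated or zero-padded to 5000 entries) can be illustrated with a minimal sketch. The function names and the +1/-1 direction encoding are assumptions made for illustration; they are not taken from the paper or from the DF implementation.

```python
from typing import List, Sequence

DF_INPUT_LENGTH = 5000  # fixed input length quoted for the DF attack


def to_fixed_length(directions: Sequence[int], length: int = DF_INPUT_LENGTH) -> List[int]:
    """Truncate or zero-pad a packet direction sequence to `length` entries.

    Assumed encoding (not stated in the text): +1 outgoing, -1 incoming, 0 padding.
    """
    clipped = list(directions[:length])
    return clipped + [0] * (length - len(clipped))


def kept_fraction(num_packets: int, length: int = DF_INPUT_LENGTH) -> float:
    """Fraction of a trace's packets that survive truncation."""
    if num_packets == 0:
        return 1.0
    return min(num_packets, length) / num_packets


if __name__ == "__main__":
    short_trace = [1, -1, -1]        # page finishes early: the input is mostly padding
    long_trace = [1, -1] * 10_000    # 20,000 packets: the tail is cut off
    print(to_fixed_length(short_trace, length=5))
    print(len(to_fixed_length(long_trace)), kept_fraction(len(long_trace)))
```

The two helper outputs correspond to the two failure modes named in the text: a fast-loading site fills most of the input with padding (or, in practice, noise packets), while a slow or heavy site loses everything beyond the fixed length.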
39]. Furthermore, we consider real-world scenarios with defenses. On-path Tor relay nodes can be deployed with defenses, such as padding dummy packets and delaying packets.

Similar to existing attacks [36, 39, 40], we consider closed-world and open-world scenarios. The closed-world scenario assumes that Tor users only visit a limited number of websites. Therefore, the adversary can collect the traffic from all websites in advance in the closed-world scenario. In the open-world scenario, clients can browse arbitrary websites, and the adversary can only collect traffic from a small subset of websites. Therefore, Tor users might browse unmonitored websites unknown to the adversary in the open-world scenario.

4 Design of Holmes
In this section, we present the key observation for our design and propose a robust and reliable early-stage WF attack.

4.1 Key Observation
As discussed in Section 2.2, identifying websites by analyzing single early-stage traffic is challenging due to dynamic network conditions and deployed defenses. Particularly, the loaded content during the same loading interval varies under different network conditions. However, we observe a strong correlation between the early-stage traffic and the pre-collected complete traffic of the same website, both of which invariably contain the same website information, including parts of the website content and elements.

Figure 4 illustrates the distribution of website information across different stages of page loading, i.e., the temporal distribution of the website features. For simplicity without losing generality, we randomly select 20 websites from the Alexa-top 95 websites. We cannot directly analyze the website information corresponding to the encrypted packets. Thus, we measure the importance of traffic features for each page loading stage based on the feature attribution method, i.e., SHAP [27]. The importance of traffic features refers to their contribution to website identification. The more website information contained in the page loading stage, the more important the corresponding traffic features. We observe that the early-stage traffic of all websites shares similar sufficient website information with the complete traffic. Therefore, it is possible for us to achieve accurate early-stage traffic fingerprinting by analyzing the correlation between the early-stage traffic and the pre-collected complete traffic.

Figure 4: Visualization of temporal distribution based on feature attribution method SHAP [27]. [x-axis: Page loading ratio (%).]

4.2 Overview of Holmes
In this paper, we propose Holmes that exploits the correlations between the early-stage traffic and the pre-collected complete traffic to achieve early-stage WF attacks. Particularly, Holmes captures the spatial and temporal distribution of different websites so that it can accurately fingerprint the traffic according to a small amount of the traffic visiting the websites, even under varied network conditions and WF defenses. Holmes first performs data augmentation based on the unique temporal distribution of traffic features for each website, which generates early-stage traffic that contains sufficient website information. Second, Holmes utilizes Supervised Contrastive Learning (SCL) [24] to transform traffic features into a low-dimensional embedding space, where each flow of traffic corresponds to a point in the space. SCL extracts the correlation between early-stage and complete traffic of the same website by clustering the points of early-stage and complete traffic in the embedding space. Finally, Holmes projects unknown early-stage traffic into the embedding space and calculates its correlation with each website based on the spatial distribution of website traffic in the embedding space. Note that, to avoid misidentification of early-stage traffic containing only connection information, Holmes rejects results of identifying early-stage traffic with low correlations to all websites. Therefore, Holmes performs attacks at each short time interval of traffic collection until the corresponding website is identified with high confidence, thus enabling adaptive traffic collection and reliable identification for each website.

Figure 5 illustrates the overview of Holmes. Holmes consists of three modules designed to construct robust and reliable early-stage WF attacks, including adaptive data augmentation, spatial distribution analysis, and early-stage website identification.

Adaptive Data Augmentation. The adaptive data augmentation module generates early-stage traffic by masking the tail of complete traffic during the training phase, which ensures that early-stage traffic contains sufficient website information based on the unique temporal distribution of the website. Holmes employs the feature attribution method, i.e., SHAP [27], to analyze the temporal distribution of the website traffic. It aggregates the feature attribution results of multiple traffic associated with the same websites to obtain the feature importance distribution of the website. Holmes leverages the temporal distribution of websites to apply tail masking of various lengths for the traffic of different websites so that it can adaptively generate early-stage traffic containing sufficient website information for each website. The details of this module will be described in Section 5.1.

Spatial Distribution Analysis. The spatial distribution analysis module utilizes supervised contrastive learning to transform traffic features and computes the spatial distribution of websites according to the new feature space. To effectively extract the correlation between early-stage traffic and complete traffic, Holmes utilizes an encoder built on supervised contrastive learning to transform traffic features into low-dimensional embedding features, ensuring that the embedding features corresponding to the early-stage and
Figure 5: The overview of Holmes. [Diagram of three modules. (1) Adaptive Data Augmentation: feature attribution on training traffic, analysis of the temporal distribution, and generation of early-stage traffic by website-adaptive masking (training only). (2) Spatial Distribution Analysis: an encoder trained with supervised contrastive learning maps traffic features to embeddings, and the centroid and radius of each website are computed in the embedding space. (3) Early-stage Website Identification: along the traffic collection timeline (t1, t2, t3), unknown early-stage traffic is projected into the embedding space at every time interval and either identified from its spatial correlation or held until the next interval.]
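The third module in Figure 5 accepts an identification only when the embedded traffic falls within a website's radius around its centroid, and otherwise waits for the next collection interval. The sketch below illustrates that decision rule on toy 2-D points. It is not the released implementation: the names are hypothetical, the embeddings are assumed to come from an already trained encoder, and both the radius (taken here as the largest training distance to the centroid) and the tie-break between overlapping websites are assumptions, since the paper defines these details in Sections 5.2 and 5.3.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Point = Sequence[float]
SiteModel = Tuple[List[float], float]  # (centroid, radius)


def fit_site(embeddings: Sequence[Point]) -> SiteModel:
    """Summarise one website's training embeddings as a centroid and a radius.

    Assumption: the radius is the largest training distance to the centroid.
    """
    dim = len(embeddings[0])
    centroid = [sum(p[d] for p in embeddings) / len(embeddings) for d in range(dim)]
    radius = max(math.dist(p, centroid) for p in embeddings)
    return centroid, radius


def identify(embedding: Point, sites: Dict[str, SiteModel]) -> Optional[str]:
    """Return the website whose region contains the point, or None to reject."""
    best_site, best_ratio = None, 1.0
    for name, (centroid, radius) in sites.items():
        if radius == 0:
            continue  # degenerate cluster: no region to fall into
        ratio = math.dist(embedding, centroid) / radius
        if ratio < best_ratio:  # inside the radius and relatively closer than earlier hits
            best_site, best_ratio = name, ratio
    return best_site


def attack_over_intervals(embeddings: Iterable[Point],
                          sites: Dict[str, SiteModel]) -> Tuple[Optional[str], int]:
    """Try identification once per collection interval and stop at the first hit."""
    intervals_used = 0
    for embedding in embeddings:
        intervals_used += 1
        site = identify(embedding, sites)
        if site is not None:
            return site, intervals_used
    return None, intervals_used


if __name__ == "__main__":
    sites = {
        "site_a": fit_site([(0, 0), (2, 0), (0, 2), (2, 2)]),
        "site_b": fit_site([(10, 10), (12, 10), (10, 12), (12, 12)]),
    }
    # Hypothetical embeddings of the same visit after 1, 2 and 3 intervals.
    stream = [(5.0, 5.0), (3.0, 3.0), (1.2, 1.1)]
    print(attack_over_intervals(stream, sites))
```

In this toy run the first two intervals are rejected because the point lies outside every website's radius, which mirrors the rejection of low-confidence results; only the third interval yields an identification and ends collection.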

complete traffic of the same website are similar. The embedding features of traffic are viewed as points in the embedding space, where points corresponding to early-stage and complete traffic with similar embedding features will be clustered together in this space. Holmes analyzes the spatial distribution of each website’s traffic in the embedding space, calculating the centroid and radius of each website to support early-stage website identification. We will describe this module in Section 5.2.

Early-Stage Website Identification. The early-stage website identification module adaptively collects traffic according to the spatial distribution of websites and achieves reliable website identification. Since the adversary cannot perceive the page loading progress associated with unknown website traffic, Holmes conducts a WF attack during each traffic collection interval. During each interval, Holmes projects the unknown early-stage traffic into the embedding space and then calculates the distance between the point corresponding to the unknown traffic and the centroid of each website. Since different websites have unique distribution densities in the embedding space, i.e., radii, we can obtain the correlation between unknown traffic and each website by comparing the distances and radii of websites. If the distance between the centroid of a website and the unknown traffic is less than the radius of the website, the traffic is successfully identified and traffic collection ends. Otherwise, Holmes will continue collecting traffic and analyze the traffic at the next time interval. We will present the details of early-stage website identification in Section 5.3.

5 Design Details
In this section, we present the design details of Holmes, including the adaptive data augmentation module, the spatial distribution analysis module, and the early-stage website identification module.

5.1 Adaptive Data Augmentation
The Adaptive Data Augmentation module generates traffic at different stages of page loading based on masked tail traffic, thereby facilitating the analysis of the correlation between early-stage traffic and complete traffic. However, randomly generated early-stage traffic may not contain sufficient website information. The reason is that, due to dynamic network conditions and defenses, randomly generated early-stage traffic may only contain connection information and dummy packets. Furthermore, differences in website loading speed can also affect the correlation between the generated early-stage traffic and the complete traffic. To achieve website-adaptive data augmentation, Holmes utilizes the feature attribution method to analyze the temporal distribution of traffic features, ensuring that the generated early-stage traffic is correlated with the complete traffic of the same website.

Temporal Distribution Analysis. Holmes analyzes the temporal distribution by profiling the feature importance, which is challenging for two reasons: (i) Packets are encrypted in multiple layers by Tor, making it difficult to analyze their importance. (ii) In dynamic network environments or under traffic obfuscation by defenses, the positions of important packets may change.

To address these challenges, we extend the feature attribution method SHapley Additive exPlanations (SHAP) [27] to analyze the feature importance distribution at different stages of page loading. SHAP calculates the marginal contribution of each feature by generating combinations of all features. It is based on Shapley values, a concept from cooperative game theory, which ensures a fair distribution of the contribution among the features. SHAP provides local explanations showing how much each feature in a specific instance contributes to the model’s output, as well as global insights about the overall model behavior.

The advantages of SHAP over other feature attribution methods include: (i) Accuracy. SHAP calculates all feature combinations, which enables effective analysis of the relationships among features in encrypted traffic, resulting in more accurate attribution outcomes. (ii) Consistency. SHAP provides consistent feature attribution results for multiple traffic to the same website. Therefore, Holmes can aggregate the feature attribution results of multiple traffic to obtain a website-level distribution of feature importance.

Let $U = \{f_1, f_2, \ldots, f_n\}$ represent the feature set of traffic, where $n$ is the number of features. Holmes divides the page loading time into $n$ equal time intervals and counts the number of incoming and outgoing packets in each interval as traffic features, where $f_i$
[Figures 6 and 7 appear here. Figure 6 has two panels: (a) Calculation of effective loading ranges, a CDF of feature importance against loading ratio (0% to 100%) with the bounds 𝜆 and 𝜇 marked; (b) Generation of early-stage traffic, in which a mask start is randomly sampled and the tail of the complete traffic is masked to produce the generated traffic.]

Figure 7: The Encoder of Holmes. [Diagram: traffic features pass through a 2D convolution block (2D convolution, 2D BN, 2D pooling) and 1D convolution blocks (1D convolution, 1D BN, 1D pooling) with residual connections (×2 and ×4), followed by adaptive pooling to produce embeddings.]
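Section 5.1 scores each interval feature by its Shapley value, Equation (1). As a hedged illustration, the sketch below evaluates that sum by brute force for a handful of features. The value function `toy_output` is a hypothetical stand-in for the expected model output O(S); it is not the classifier used in the paper, and the function names are invented for this example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(n: int, expected_output: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley value of each of n features, following Equation (1)."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # |S| ranges over 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset S.
                phi += weight * (expected_output(s | {i}) - expected_output(s))
        values.append(phi)
    return values


def toy_output(present: FrozenSet[int]) -> float:
    """Hypothetical stand-in for O(S): per-interval scores plus one interaction."""
    base = {0: 3.0, 1: 1.0, 2: 0.0}
    bonus = 2.0 if {0, 1} <= present else 0.0
    return sum(base[j] for j in present) + bonus


if __name__ == "__main__":
    phi = shapley_values(3, toy_output)
    print(phi)
    # Efficiency property: the attributions sum to O(U) - O(empty set).
    print(sum(phi), toy_output(frozenset({0, 1, 2})) - toy_output(frozenset()))
```

The interaction bonus is split evenly between the two features that create it, and the third feature, which never changes the output, receives zero. The loop visits every subset, so the cost grows exponentially in the number of features; practical SHAP implementations approximate this sum rather than enumerate it.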
represents the feature of the i-th interval. Holmes calculates the importance of the i-th feature 𝑓𝑖 based on the difference in the expected model output when conditioning on the feature 𝑓𝑖. To mine the dependencies among traffic features, Holmes generates all feature combinations excluding the feature 𝑓𝑖 to calculate the marginal contribution of the feature 𝑓𝑖. Specifically, the importance 𝜙𝑖 of the i-th feature can be computed as follows:

\[ \phi_i = \sum_{S \subseteq U \setminus \{f_i\}} \frac{|S|! \cdot (n - |S| - 1)!}{n!} \cdot \bigl( O(S \cup \{f_i\}) - O(S) \bigr), \tag{1} \]

where 𝑆 is a feature subset excluding 𝑓𝑖. O(𝑆 ∪ {𝑓𝑖}) and O(𝑆) represent the expected outputs of the model when feature 𝑓𝑖 is present and absent, respectively. The weight of the set 𝑆 is the frequency of occurrence among all possible feature combinations. Subsets of varying sizes are balanced in terms of weight to ensure that the contributions of each feature can be fairly assessed. Due to the high computational cost of Equation 1, we employ the DeepLIFT algorithm [38] for approximation to expedite the calculation.
We select the SOTA WF attack RF [37] as the target model for feature profiling. For each website, we randomly select 10 traffic. We calculate the importance of features corresponding to different loading stages of the website and represent the temporal distribution of each website using the average temporal distribution of the traffic.
Mask-based Data Augmentation. Data augmentation is a machine learning technique that enhances the diversity of training data by artificially modifying samples to improve model performance [2]. Holmes achieves the data augmentation by masking the tail of the traffic. However, the setting of mask proportion is challenging. A prolonged mask results in early-stage traffic lacking information related to the website, whereas an overly brief mask requires the adversary to spend a lot of time collecting enough packets. To address the above challenges, Holmes employs website-adaptive data augmentation based on the temporal distribution of websites.

Figure 6: Adaptive data augmentation of Holmes. (a) Holmes calculates the effective loading ranges of websites based on the temporal distribution of websites. (b) Holmes randomly samples the start of the mask based on the effective loading ranges of websites and generates early-stage traffic by masking traffic tails.

In Figure 6, we show the details of the data augmentation. Holmes initially calculates the effective loading ranges of websites. When the page loading ratio of a website reaches the effective loading range, the early-stage traffic contains enough website information to be correlated with the complete traffic. As shown in Figure 6(a), Holmes generates the cumulative distribution of feature importance for all websites. Holmes sets two parameters, 𝜆 and 𝜇, representing the upper and lower bounds of the cumulative feature importance corresponding to the effective loading proportions of websites. Based on the parameters 𝜆 and 𝜇, Holmes can calculate the effective loading range for each website.
To ensure the correlation between the generated early-stage traffic and the complete traffic, Holmes adaptively enhances the traffic for each website, making the generated traffic originate from the effective loading range of the corresponding website. Let 𝑅 = {(𝑠1, 𝑡1), (𝑠2, 𝑡2), . . . , (𝑠𝑚, 𝑡𝑚)} represent the effective loading ranges for 𝑚 monitored websites, where the effective loading range for the i-th website is from 𝑠𝑖 to 𝑡𝑖. In Figure 6(b), we show the details of early-stage traffic generation. For the traffic of the i-th website, Holmes randomly samples an integer 𝒍 from 𝑠𝑖 to 𝑡𝑖, then masks the tail of traffic from the loading ratio 𝒍 to the entire page loading. We select the starting point of the mask randomly within an effective range, ensuring that the generated traffic belongs to the early stages of page loading and contains adequate website information.

\[ \boldsymbol{l} \sim \mathrm{Uniform}[s_i, t_i]. \tag{2} \]

Holmes performs data augmentation on each traffic 𝛼 times. The higher the value of 𝛼, the more early-stage traffic is generated. However, excessive generation of early-stage traffic can lead to significant time overhead of model training.

5.2 Spatial Distribution Analysis
Utilizing the early-stage traffic generated by the temporal distribution analysis module, the spatial distribution analysis module extracts the correlation between early-stage and complete traffic. Specifically, Holmes builds an Encoder based on Supervised Contrastive Learning (SCL) [24] to extract common features of early-stage and complete traffic, generating low-dimensional embeddings that are spatially proximate. Then Holmes analyzes the spatial distribution of websites using the Median Absolute Deviation (MAD) [25].
Traffic Embedding Based on SCL. To address the challenges posed by network jitter and defenses in the real world on the analysis of early-stage traffic, Holmes employs Supervised Contrastive Learning (SCL) for traffic embedding. The generated embeddings
encompass robust features of the traffic, enabling traffic from different loading stages of the same website to aggregate in the embedding space. Note that, Holmes addresses the limitation of the clustering methods, i.e., they cannot effectively aggregate original high-dimensional features due to the “curse of dimensionality” [48].
Holmes initially extracts raw features from traffic, serving as the input for generating embeddings. We use the Traffic Aggregation Features (TAF) as the raw features. TAF is an extension of the Traffic Aggregation Matrix (TAM) [37] that effectively represents aggregated traffic information. We set up 𝜌 non-overlapping time windows of equal length. The length of the time window is 𝜃. For the i-th time window, we calculate three types of aggregated features: (i) the number of incoming and outgoing packets. (ii) the number of incoming and outgoing bursts. (iii) the average size of incoming and outgoing bursts. Therefore, we can aggregate statistical information from multiple time windows as the initial feature of the traffic.
We use the Convolutional Neural Network (CNN) as the Encoder network for traffic embedding. CNN is applied by previous attacks [3, 35–37, 39, 40] and proved to be effective in extracting key patterns of traffic associated with the website. Let Enc(·) denote the encoder network, and we can obtain the embedding 𝒛 of the traffic with the raw feature 𝒙 based on the Encoder.

\[ \boldsymbol{z} = \mathrm{Enc}(\boldsymbol{x}). \tag{3} \]

Algorithm 1: Website Profiling
Input:
  W: all websites.
  z: the embeddings of all websites.
Output:
  c: the centroids of all websites.
  r: the radii of all websites.
 1 for w ∈ W do
 2   c^w = Mean(z^w)   ⊲ Calculate the centroid of website w
 3   for z_i^w ∈ z^w do
 4     d_i^w = 1 − cosine_similarity(c^w, z_i^w)
 5   end
 6   M^w = Median(d^w)   ⊲ Calculate the median
 7   r^w = Median{|d_i^w − M^w|}   ⊲ Calculate the radius
 8 end
 9 for w_i, w_j ∈ W do
10   d = 1 − cosine_similarity(c_i, c_j)
11   if r_i + r_j ≥ d then
12     ⊲ Tuning the radius
13     r_i = r_i − r_i / (r_i + r_j) · (r_i + r_j − d)
14     r_j = r_j − r_j / (r_i + r_j) · (r_i + r_j − d)
15   end
16 end
17 return c, r
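The website profiling of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `cosine_distance` and `profile_websites` are hypothetical, embeddings are given as lists of floats per website, each unordered pair of websites is visited once, and the overlap is computed before either radius is updated so that both updates use the original radii.

```python
import math
from statistics import median


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def profile_websites(embeddings):
    """embeddings: dict mapping website -> list of embedding vectors.

    Returns (centroids, radii), following lines 1-17 of Algorithm 1.
    """
    centroids, radii = {}, {}
    for site, vectors in embeddings.items():
        dim = len(vectors[0])
        # Line 2: centroid = per-dimension mean of the embeddings.
        centroid = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
        # Lines 3-5: cosine distance of every embedding to the centroid.
        dists = [cosine_distance(centroid, v) for v in vectors]
        # Lines 6-7: radius = median absolute deviation of the distances.
        med = median(dists)
        centroids[site] = centroid
        radii[site] = median([abs(d - med) for d in dists])

    # Lines 9-16: shrink the radii of overlapping spheres proportionally.
    sites = list(embeddings)
    for a in range(len(sites)):
        for b in range(a + 1, len(sites)):
            i, j = sites[a], sites[b]
            d = cosine_distance(centroids[i], centroids[j])
            total = radii[i] + radii[j]
            if total > 0 and total >= d:
                overlap = total - d
                r_i, r_j = radii[i], radii[j]
                radii[i] = r_i - r_i / total * overlap
                radii[j] = r_j - r_j / total * overlap
    return centroids, radii
```

After the tuning step the two radii of an overlapping pair sum to the distance between their centroids, so the spheres touch but no longer overlap. Because radii only ever shrink, pairs that were already separated stay separated.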
We show the details of the Encoder in Figure 7. To effectively extract the correlation of the website traffic at different loading stages, we utilize convolution with a greater number of channels and a deeper network architecture compared to previous attacks [37, 39]. Since the input is two-dimensional features, Holmes uses two 2D convolution blocks to extract high-dimensional information. Then Holmes fuses information of packets with different directions through the 2D pooling layer, transforming the two-dimensional features into one-dimensional features. Subsequently, four 1D convolution blocks are utilized to extract traffic patterns related to the website. Finally, Holmes employs an adaptive pooling layer to generate embeddings of traffic.
Furthermore, we employ two complementary methods. First, residual connections are utilized, which involve transmitting intermediate outputs from lower to higher layers via skip connections, thereby reducing the issue of gradient vanishing. Second, multiple dropout layers are used, where a subset of units, including their associated connections, are randomly omitted from the network during the training, thus mitigating overfitting.
The performance of the Encoder depends on effective model training. The Encoder aims to extract various correlations in traffic, including (i) the correlation between the traffic at different loading stages of the same website, and (ii) the correlation between the traffic of the same website where the traffic patterns change due to network dynamics or defenses. Contrastive learning and metric learning can learn the correlation between samples. However, both contrastive learning and metric learning consider only one type of correlation that exists in the samples. To effectively extract multiple types of correlations existing in the samples, Holmes applies SCL to train the Encoder. SCL combines the advantages of supervised and contrastive learning. Specifically, Holmes randomly selects one traffic as the anchor. Then, Holmes selects all traffic of the anchor’s corresponding website as positive samples and traffic of other websites as negative samples. Holmes repeats this process multiple times to ensure that the selected anchors include multiple traffic for all websites.
SCL can learn various correlations between the anchor and the positive samples, ensuring that in the generated embedding space, the distance between the anchor and positive samples is close, while the distance between the anchor and negative samples is far. Formally, for the i-th traffic 𝒙𝑖 with embedding 𝒛𝑖, we can calculate its loss by SCL:

\[ \mathcal{L}_i = -\frac{1}{|\mathcal{P}(i)|} \sum_{p \in \mathcal{P}(i)} \log \frac{\exp(\boldsymbol{z}_i \cdot \boldsymbol{z}_p / \gamma)}{\sum_{n \in \mathcal{N}(i)} \exp(\boldsymbol{z}_i \cdot \boldsymbol{z}_n / \gamma)}, \tag{4} \]

where P(𝑖) and N(𝑖) are the sets of indices of all positive samples and negative samples of the i-th traffic, respectively. For the embedding of anchor 𝒛𝑖, we calculate the similarity with each positive sample embedding 𝒛𝑝 and compare it with similarities between the anchor and all negative samples. In particular, 𝛾 is temperature, a hyperparameter that controls the distance of traffic 𝒙𝑖 from the most similar negative sample. The smaller the temperature 𝛾, the greater the differentiation from the negative samples, but it tends to affect the similarity to the positive samples. Through Equation 4, we can effectively train the Encoder and extract the correlations of website traffic at different loading stages.
Spatial Distribution Based on MAD. Holmes aims to achieve reliable early-stage website identification. However, the early-stage traffic contains little website information and is prone to misidentification under the interference of network dynamics and defenses. Holmes addresses the challenge by utilizing the spatial distribution of website traffic. Traffic from different websites has different
positions and levels of tightness in the embedding space. Holmes calculates the centroid and radius for each website, representing the position and level of tightness of the website traffic, respectively. By leveraging the centroid and radius information of websites, Holmes can reject low-confidence identifications of unknown traffic. We will detail how to utilize the centroid and radius of websites for reliable early-stage website identification in Section 5.3.
In Algorithm 1, we show the pseudocode for website profiling. Suppose there are 𝑚 websites. Holmes sequentially calculates the centroid and radius for each website. For the website 𝑤, Holmes generates the embeddings for all traffic of the website 𝑤. Let z_i^w represent the embedding of the i-th traffic of the website 𝑤. Holmes calculates the centroid of website 𝑤 by averaging all embeddings across each dimension (line 2). Then Holmes calculates the distance between each traffic embedding and the centroid of the website 𝑤 using cosine similarity (lines 3-4). We select cosine similarity because matrix operations can accelerate multiple cosine similarity calculations. Finally, we use a distribution estimation algorithm, Median Absolute Deviation (MAD) [25], to generate the radius for the website 𝑤 (lines 6-7). MAD calculates the median of absolute deviations, where absolute deviation refers to the absolute value of the difference between each data point and the median of all data.
Based on the centroid and radius of each website, the spatial distribution of each website in the embedding space forms a sphere. Holmes utilizes supervised contrastive learning to separate the centroids of different websites in the embedding space. However, we observe a 0.01% probability of overlap between the spheres corresponding to the two websites in our study. This occurs because the centroids of websites with similar types or content are closer to each other. Therefore, Holmes further examines the distances between the centroids of different websites and their corresponding radii. For two websites 𝑤𝑖 and 𝑤𝑗, if the distance between c𝑖 and c𝑗 is less than the sum of the radii, we proportionally reduce the radii of website 𝑤𝑖 and website 𝑤𝑗. Finally, the spheres corresponding to each website in the embedding space are non-overlapping, which facilitates the early-stage website identification of Holmes.

Algorithm 2: Early-stage Website Identification
Input:
  τ: the time interval.
  σ: the maximum traffic collection time.
  W: all monitored websites.
  c: the centroids of all monitored websites.
  r: the radii of all monitored websites.
  ŵ: the unmonitored website.
  ε: threshold for concept drift detection.
Output:
  res: the identification result.
 1 res = ŵ
 2 count = 0
 3 while True do
 4   time.sleep(τ)   ⊲ Wait time interval τ
 5   count = count + τ
 6   x = getTraffic()   ⊲ Get the current collected traffic
 7   z = Encoder(x)
 8   for w ∈ W do
 9     d = 1 − cosine_similarity(c^w, z)
10     if d ≤ r^w then
11       res = w   ⊲ Identification success
12       break
13     end
14   end
15   if (res ≠ ŵ) or (count > σ) then
16     break   ⊲ Exit identification
17   end
18 end
19 if res == ŵ then
20   d_min = ε
21   for w ∈ W do
22     d = 1 − cosine_similarity(c^w, z)
23     if d − r^w < d_min then
24       d_min = d − r^w
25       res = w
26     end
27   end
28 end
29 return res
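The control flow of Algorithm 2 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: `identify_early` and `UNMONITORED` are hypothetical names, and the `time.sleep`, `getTraffic`, and `Encoder` calls of the pseudocode are replaced by an iterable `snapshots` that yields one precomputed embedding per time interval. The loop also stops when the iterable is exhausted, which the pseudocode does not model.

```python
import math

UNMONITORED = None  # hypothetical label standing in for the unmonitored website


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def identify_early(snapshots, centroids, radii, tau, sigma, epsilon):
    """Return the identified website, or UNMONITORED if none qualifies.

    snapshots: iterable of embeddings of the traffic collected so far,
               one per time interval tau (assumed to be precomputed).
    """
    result, elapsed, z = UNMONITORED, 0.0, None
    for z in snapshots:
        elapsed += tau  # line 5: accumulate the collection time
        for site, centroid in centroids.items():
            # Lines 9-12: accept the first website whose sphere contains z.
            if cosine_distance(centroid, z) <= radii[site]:
                result = site
                break
        # Lines 15-17: stop on success or when the time budget is spent.
        if result is not UNMONITORED or elapsed > sigma:
            break

    # Lines 19-28: concept drift check on the last embedding.
    if result is UNMONITORED and z is not None:
        d_min = epsilon
        for site, centroid in centroids.items():
            gap = cosine_distance(centroid, z) - radii[site]
            if gap < d_min:
                d_min, result = gap, site
    return result
```

Passing `epsilon=math.inf` corresponds to the closed-world setting, where the traffic is always assigned to the website whose sphere is nearest.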
5.3 Early-Stage Website Identification
The early-stage website identification module leverages the correlations between different loading stages of website traffic to achieve robust and reliable identification of early-stage traffic. To achieve early-stage website identification, Holmes attempts website identification at each fixed time interval. The challenge faced by Holmes is ensuring high confidence in website identification to avoid misidentification of early-stage traffic. To address the above challenge, Holmes calculates the correlation between unknown traffic and monitored websites based on the position of unknown traffic in the feature space and the spatial distribution of monitored websites. Holmes rejects the identification of early-stage traffic with low correlation to all monitored websites and continues to collect more packets.
In Algorithm 2, we show the pseudocode for early-stage website identification. At every time interval, Holmes first projects the unknown early-stage traffic into the embedding space (lines 6-7) and calculates the distance between the unknown traffic and the centroids of all monitored websites (lines 8-9). If the distance between the unknown traffic and a website’s centroid does not exceed the radius of the website, Holmes successfully identifies the traffic (lines 10-12). Otherwise, Holmes continues to collect traffic and waits for the next time interval.
However, not all early-stage traffic can be guaranteed to be identified. Changes in the content of monitored websites can lead to variations in traffic patterns (i.e., concept drift). Furthermore, Holmes is unable to detect early-stage traffic from unmonitored websites. Holmes sets a maximum traffic collection time 𝜎. After collecting traffic for 𝜎 seconds, Holmes will detect whether the unknown traffic is due to concept drift or originates from unmonitored websites (lines 19-28). A key insight is that the distance between a website’s concept drift traffic and its centroid should be slightly greater than the website’s radius, yet much smaller than the distance between unmonitored website traffic and the website’s centroid. Therefore,
we set a predefined threshold 𝜖. If the difference between the distance of the unknown traffic from the website’s centroid 𝑑 and the website’s radius r^w is less than 𝜖, then the unknown traffic is identified as a concept drift sample of the website (lines 21-25). Furthermore, we define a variable 𝑑𝑚𝑖𝑛 to represent the smallest difference between 𝑑 and r^w among all websites, with 𝑑𝑚𝑖𝑛 initially set to the threshold 𝜖 (line 20). If the unknown traffic meets the concept drift detection criteria for multiple monitored websites, we identify the traffic as the website with the highest correlation, which is the website corresponding to 𝑑𝑚𝑖𝑛. In particular, we set the threshold for concept drift detection 𝜖 to infinity in the closed-world scenario. The reason is that in the closed-world scenario, Tor users only visit monitored websites, eliminating the need to identify traffic from unmonitored websites.

6 Performance Evaluation
In this section, we evaluate Holmes with public datasets and real-world datasets. We compare the performance of Holmes with the state-of-the-art WF attacks.

6.1 Experimental Setup
Implementation. We prototype Holmes using PyTorch 2.0.1 and Python 3.8 with more than 1,400 lines of code. In particular, we use a single NVIDIA GeForce RTX 4090 GPU for our experiments. We show the default parameter values in Table 1. Furthermore, we split the dataset into training, validation, and testing, with an 8:1:1 ratio. The parameter tuning and spatial-temporal analysis are performed on the validation dataset to avoid leakage of the testing dataset.

Table 1: Parameter settings in our evaluation

Group                   | Parameter                      | Value
Data Augmentation       | Lower bound of CDF 𝜇           | 0.3
                        | Upper bound of CDF 𝜆           | 0.6
                        | Number of augmentation 𝛼       | 2
Spatial Analysis        | Number of time windows 𝜌       | 2000
                        | Length of time windows 𝜃       | 80 ms
                        | Embedding size 𝜂               | 128
                        | Temperature 𝛾                  | 0.1
Website Identification  | Time interval 𝜏                | 120 ms
                        | Maximum collection time 𝜎      | 80 s
                        | Threshold for concept drift 𝜖  | 0.01

Dataset. Our datasets comprise six categories of data, including a dataset of Alexa-top websites, a dataset of dark web websites, and four types of defended datasets.
• Dataset of Alexa-top websites: This dataset is from [39], which includes data from both closed-world and open-world scenarios. The closed-world data comprises 95 monitored websites, each with over 1000 traces. In the open-world scenario, there are over 40,000 unmonitored websites, each with only one trace. All websites belong to the Alexa-top websites list, which ranks websites based on popularity.
• Dataset of dark web websites: Since Alexa-top does not represent the popularity of visits by Tor users, we select 80 of the most popular dark web websites based on the measurement of Tor v3 onion services [41]. The dataset includes various types of websites, comprising black markets, social networks, and financial services. These websites use onion services to anonymize servers, requiring more relay nodes and resulting in greater loading latency. We utilized 20 servers deployed across three countries to collect traffic in August 2023 and April 2024. Note that our data collection did not negatively impact the real-world Tor network. We only collect traffic from browsing sessions we initiated locally, ensuring our dataset does not include data from other Tor clients.
• Dataset with WTF-PAD defense: The WTF-PAD defense [23] disrupts traffic patterns by adaptively padding dummy packets without delaying any packets. The variation of WTF-PAD defense based on circuit-level padding has been deployed in Tor [1].
• Dataset with Front defense: The Front defense [16] utilizes the Rayleigh distribution to generate the padding times for dummy packets. Similar to the WTF-PAD defense, the time overhead for the Front defense is zero.
• Dataset with Walkie-Talkie defense: Walkie-Talkie [44] employs a half-duplex communication model and merges original traffic with traffic from randomly selected decoy pages to mislead WF attacks. This defense introduces a mild bandwidth and time overhead.
• Dataset with TrafficSliver defense: The TrafficSliver defense [8] employs a traffic-splitting mechanism that restricts the adversary to collecting only partial packets. We generate the dataset by splitting the traffic into three paths based on the script provided by the authors.
WF defenses have been extensively studied [4, 5, 8, 13, 16, 17, 23, 44], yet some defenses are not practically deployable due to the significant overhead [29]. The latency introduced by defenses may cause out-of-memory errors in Tor relay nodes. Therefore, following previous attacks [35, 37, 39], we select four representative defense methods for evaluation: WTF-PAD [23], Front [16], TrafficSliver [8], and Walkie-Talkie [44].
Baselines. We select 9 state-of-the-art WF attacks as our baselines.
• AWF: AWF [36] utilizes CNNs to automatically extract features from packet direction sequences for website identification.
• DF: DF [39] proposes more sophisticated CNNs compared to AWF that can effectively undermine WTF-PAD defense.
• Tik-Tok: Tik-Tok [35] utilizes both direction and timestamp information of packets, which can effectively improve attack performance under defense.
• Var-CNN: Var-CNN [3] designs a more powerful model based on ResNets, which utilizes mechanisms such as dilated convolution to improve attack performance.
• TF: TF [40] extends the DF model using Triplet networks to achieve the best performance in scenarios with fewer training instances.
• RF: RF [37] extracts a two-dimensional matrix feature named TAM, which has better robustness against defenses.
• NetCLR: NetCLR [2] integrates data augmentation and self-supervised learning. It introduces three data augmentation methods for traffic bursts to improve the effectiveness of WF attacks in dynamic network environments.
• ARES: ARES [10] is a robust multi-tab WF attack that integrates multiple Transformer-based classifiers to identify websites within obfuscated traffic. ARES also supports single-tab WF attacks.
• TMWF: TMWF [21] applies DETR [6], a Transformer-based object detection framework, to achieve multi-tab WF attacks. We set the number of tab queries to 1 to apply TMWF to single-tab WF attacks.
To reduce the time overhead of the experiments, the parameters of baselines are all set to their default values. Note that the baselines may achieve better performance with parameter tuning.

[Figure 8: Comparison of WF attacks at different loading stages of websites in the closed-world scenario. x-axis: page loading ratio (%), 20 to 100; y-axis: Accuracy (%); series: Holmes, RF, Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, TF.]

Metrics. We select 4 metrics that are widely used to evaluate the performance of WF attacks, i.e., Accuracy, Precision, Recall, and F1-score. We calculate the macro average of all websites. Specifically, we can calculate the numbers of true positive instances (TP), false positive instances (FP), true negative instances (TN), and false negative instances (FN) for each website, respectively. These four metrics can be calculated as:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{5} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP}. \tag{6} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN}. \tag{7} \]
\[ \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{8} \]
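As a worked illustration of Equations 5 to 8, the macro-averaged metrics can be computed from per-website confusion counts as sketched below. The function name `macro_metrics` and the convention of returning 0 when a denominator is zero are assumptions made for this example; the paper does not specify how such cases are handled.

```python
def macro_metrics(y_true, y_pred):
    """Macro-average Accuracy, Precision, Recall, and F1-score over websites.

    y_true, y_pred: equal-length sequences of website labels.
    """
    labels = sorted(set(y_true) | set(y_pred))
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label in labels:
        # One-vs-rest confusion counts for this website.
        tp = fp = tn = fn = 0
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
            else:
                tn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0      # Equation 6
        recall = tp / (tp + fn) if tp + fn else 0.0         # Equation 7
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)               # Equation 8
        totals["accuracy"] += (tp + tn) / (tp + tn + fp + fn)  # Equation 5
        totals["precision"] += precision
        totals["recall"] += recall
        totals["f1"] += f1
    # Macro average: every website contributes equally.
    return {name: value / len(labels) for name, value in totals.items()}
```

For example, with true labels `["a", "a", "b", "b"]` and predictions `["a", "b", "b", "b"]`, website a has TP = 1, FN = 1, TN = 2 and website b has TP = 2, FP = 1, TN = 1, so the macro Precision is (1 + 2/3) / 2.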
The differences in website types and content lead to variations in traffic patterns, and the average precision may obscure the low identification precision of some websites. Therefore, we use P@min to represent the lowest precision across all websites. We can evaluate the reliability of WF attacks by calculating P@min. Furthermore, the base rate fallacy [22] can lead to an overestimation of the precision in the open-world setting. Following previous attacks [42], we use r-precision for open-world evaluation. Specifically, r-precision assumes that the frequency of visits to unmonitored websites is 𝑟 times that of monitored websites, hence the sample weight of unmonitored websites is 𝑟 times that of monitored websites when calculating precision. We set 𝑟 to 20 in our experiments.

6.2 Closed-World Evaluation
We first evaluate the performance of Holmes in the closed-world scenario using the dataset of Alexa-top 95 websites. To assess the performance of Holmes in identifying early-stage website traffic, we generate traffic for different loading stages of websites based on packet timestamps from the testing dataset. As shown in Figure 8, Holmes achieves optimal attack performance under different page loading ratios. As the loading progress of websites increases from 20% to full completion, the Accuracy of Holmes in identifying the website gradually improves, rising from 50.94% to 98.36%. Compared to existing attacks, Holmes demonstrates a significant advantage in early-stage traffic analysis. For example, when websites are 40% loaded, Holmes achieves an Accuracy of 90.65%, which represents an improvement of 26.84%, 90.68%, 109.50%, 140.32%, 175.36%, 194.22%, 224.68%, 235.12%, and 323.60% over RF, Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, and TF, respectively. Specifically, Holmes exhibits the highest Accuracy for traffic at all loading stages of websites. The primary reason is that Holmes extracts traffic correlations at different loading stages of websites through spatial-temporal analysis. This correlation enhances the ability of Holmes to identify traffic across all loading stages of websites.
We further evaluate the Precision, Recall, and F1-score of Holmes in identifying early-stage traffic. Table 2 presents a comparison of Holmes with existing WF attacks. Holmes significantly outperforms other attacks in all stages of page loading. For instance, when websites are loaded to 20%, 30%, 40%, 50%, and 60%, the F1-score of Holmes shows an average increase of 330.43%, 245.52%, 151.51%, 79.59%, and 38.85% over existing attacks, respectively. For early-stage traffic, we observe that Holmes exhibits higher Precision than Recall. This indicates that Holmes is effective in avoiding the misidentification of traffic with insufficient website information. Benefiting from the temporal distribution analysis of website features and website-adaptive data augmentation, Holmes is capable of effectively identifying early-stage traffic that contains sufficient website information while avoiding misidentification of early-stage traffic without adequate website information.

[Figure 9: Comparison of the r-precision of WF attacks for early-stage traffic in the open-world scenario. x-axis: page loading ratio (%), 30 to 60; y-axis: r-precision (%); series: Holmes, RF, Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, TF.]
Table 2: Comparisons with prior arts with the early-stage traffic in the closed-world scenario, where P, R, F1 represent Precision (%), Recall (%), and F1-score (%).

Attacks  | 20% loaded          | 30% loaded          | 40% loaded          | 50% loaded          | 60% loaded
         | P      R      F1    | P      R      F1    | P      R      F1    | P      R      F1    | P      R      F1
TF       | 24.74  7.48   8.50  | 30.25  12.63  14.15 | 36.55  21.44  23.31 | 48.14  35.79  37.76 | 61.72  55.23  56.20
AWF      | 28.39  9.97   11.75 | 33.40  17.26  19.22 | 41.12  27.06  29.14 | 51.16  42.15  43.61 | 63.63  59.71  59.93
TMWF     | 25.77  7.87   8.27  | 31.70  15.19  16.53 | 41.15  27.95  29.79 | 56.04  46.42  47.78 | 71.84  66.44  67.02
Tik-tok  | 37.91  10.83  11.50 | 40.54  18.56  20.08 | 47.00  30.82  32.89 | 59.03  47.71  49.19 | 71.46  65.84  66.60
DF       | 35.28  11.19  12.62 | 40.43  19.00  21.38 | 50.85  32.96  35.08 | 61.52  50.17  51.68 | 72.76  67.54  68.03
NetCLR   | 32.39  10.32  11.85 | 41.07  20.67  23.40 | 54.19  37.72  39.96 | 65.71  56.17  57.47 | 77.01  73.72  73.69
ARES     | 43.06  13.30  15.66 | 51.31  25.43  28.70 | 60.36  43.28  45.86 | 69.77  61.80  62.71 | 77.96  74.73  74.57
Var-CNN  | 49.66  15.28  18.29 | 57.66  29.12  32.85 | 65.49  47.52  50.39 | 74.21  65.98  67.22 | 81.64  78.51  78.69
RF       | 55.51  27.44  31.27 | 67.32  50.55  53.17 | 78.23  71.43  72.43 | 86.22  83.70  84.06 | 91.34  90.34  90.41
Holmes   | 66.79  50.92  53.45 | 80.22  76.85  76.48 | 91.14  90.64  90.48 | 95.19  95.01  95.00 | 96.40  96.24  96.23

6.3 Open-World Evaluation
We further evaluate the realistic open-world scenario using the dataset of Alexa-top websites, including 95 monitored websites and 40,000 unmonitored websites. The number of unmonitored websites significantly exceeds the number of monitored websites. To effectively assess attack performance in the open-world setting, we follow previous works [42] by utilizing r-precision for evaluation.
Figure 9 shows the comparison of r-precision for WF attacks when the ratio of page loading ranges from 30% to 60%. Holmes consistently achieves high r-precision across different page loading ratios. Compared to existing attacks, Holmes demonstrates a significant advantage in identifying early-stage traffic in the open-world scenario. For example, when the ratio of page loading is 40%, Holmes achieves the r-precision of 94.96%, while the F1-scores for RF, Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, and TF are 83.77%, 71.72%, 56.99%, 51.49%, 49.33%, 51.26%, 38.46%, 31.31%, and 26.06%, respectively. When websites are loaded to 30%, 40%, 50%, and 60%, the r-precision of Holmes shows an average increase of 130.61%, 109.91%, 79.00%, and 57.62% over existing attacks, respectively.
The experimental results demonstrate that Holmes can effectively distinguish between early-stage traffic from monitored and unmonitored websites in the open-world scenario. Particularly, Holmes reduces training overhead compared to baselines by eliminating the requirement for training samples from unmonitored websites. Holmes leverages the spatial distribution of monitored websites in the feature space. By comparing the distance of unknown traffic in the feature space to the centroid of the website and the website’s radius, Holmes achieves early-stage WF attacks with high precision in the open-world scenario.

6.4 Robustness Evaluation
Next, we evaluate the robustness of Holmes using datasets of Alexa-top 95 websites with four defenses. In Figure 10, we demonstrate the accuracy of WF attacks in different loading stages of websites under defenses.
As shown in Figure 10(a), for the WTF-PAD defense, Holmes achieves the best accuracy across all ratios of page loading. For early-stage traffic, Holmes is more robust compared to other attacks. When the page loading ratio is 40%, Holmes achieves an accuracy of 82.03%, while the accuracy of all baselines is below 45%. For early-stage traffic when websites are 50% loaded, Holmes achieves an accuracy of 89.45%, marking significant improvements over RF, Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, and TF by 46.95%, 88.04%, 138.98%, 284.73%, 123.23%, 115.65%, 191.18%, 539.84%, and 436.59%, respectively. Similar to Holmes, NetCLR and TF generate embeddings of traffic features based on contrastive learning and metric learning, respectively. However, the accuracies of NetCLR and TF for early-stage traffic with WTF-PAD defense are both below 30%. The advantage of Holmes is attributed to its feature extraction and SCL-based traffic embedding, which enable robust website identification under defenses.
Front is a more powerful padding-based defense compared to WTF-PAD. By padding dummy packets at the front of the traffic, Front significantly impacts the identification of early-stage traffic. Figure 10(b) shows the evaluation of WF attacks under Front defense. Holmes achieves the best accuracy across all page loading ratios. When the page loading ratio is 30%, 40%, 50%, and 60%, Holmes improves the accuracy of baselines by 561.40%, 480.92%, 316.03%, and 192.97% on average, respectively. Existing WF attacks rely on the complete features of individual traffic, whereas Holmes leverages the correlation between early-stage traffic and complete traffic of the same website to achieve a more robust WF attack.
In Figure 10(c), we show the accuracy of WF attacks under the Walkie-Talkie defense. We find that the attack performance of RF is close to that of Holmes. The reason is that traffic aggregation information based on time windows has been proven to effectively undermine the Walkie-Talkie defense [37]. Holmes still holds an advantage in identifying early-stage traffic. For instance, at a page loading ratio of 30%, Holmes achieves an accuracy of 87.04%, while the accuracy of RF, Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, and TF are 83.70%, 61.80%, 38.07%, 21.93%, 30.24%, 26.47%, 16.93%, 8.39%, and 6.56%, respectively.
TrafficSliver is a potent defense that combats WF attacks by splitting traffic. Figure 10(d) shows the comparison of WF attacks under TrafficSliver defense. We observe a significant decrease in the accuracy of baselines under TrafficSliver defense, while Holmes maintains its robustness. When the page loading ratios are 30%, 40%, 50%, and 60%, the accuracy of Holmes is improved by an average of 711.28%, 593.93%, 417.82%, and 283.98% compared to other WF
CCS ’24, October 14–18, 2024, Salt Lake City, UT, USA Xinhao Deng, Qi Li, and Ke Xu

[Figure 10 plot omitted. Four panels, (a) WTF-PAD, (b) Front, (c) Walkie-Talkie, and (d) TrafficSliver, plot accuracy (%) against page loading ratio (30% to 60%) for Holmes, RF, Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, and TF.]

Figure 10: Evaluating robustness of WF attacks for early-stage traffic with four defenses.
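
As an illustration of the open-world test described above, in which the distance from an unknown trace to a website's centroid is compared with that website's radius, the following minimal sketch may help. It is not the authors' implementation: the Euclidean metric, the choice of radius as the largest training distance, the nearest-centroid tie-break, and the names fit_site_regions and classify_open_world are all assumptions made for illustration.

```python
import math


def fit_site_regions(embeddings, labels):
    """Build {site: (centroid, radius)} from training embeddings.

    Assumption: the radius is the largest training distance to the centroid.
    The paper may define the radius differently.
    """
    by_site = {}
    for vec, site in zip(embeddings, labels):
        by_site.setdefault(site, []).append(vec)

    regions = {}
    for site, vecs in by_site.items():
        dim = len(vecs[0])
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        radius = max(math.dist(v, centroid) for v in vecs)
        regions[site] = (centroid, radius)
    return regions


def classify_open_world(x, regions):
    """Return the nearest site whose region contains x, or None if no region does.

    None stands for "unmonitored" (or "not enough evidence yet").
    """
    best_site, best_dist = None, math.inf
    for site, (centroid, radius) in regions.items():
        d = math.dist(x, centroid)
        if d <= radius and d < best_dist:
            best_site, best_dist = site, d
    return best_site


# Toy two-dimensional embeddings for two hypothetical monitored sites.
regions = fit_site_regions(
    [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]],
    ["site_a", "site_a", "site_b", "site_b"],
)
```

Under these assumptions no unmonitored training samples are needed: a trace that falls outside every monitored website's region is simply rejected.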

[Figure 11 plot omitted. Bar chart of P@min (%) at page loading ratios of 60%, 80%, and 100% for Holmes, RF, Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, and TF.]

Figure 11: Reliability evaluation of WF attacks under WTF-PAD defense, where P@min is the minimum of identification precision for all websites.

Table 3: Comparison with existing attacks using the dataset of dark web websites in real-world evaluation.

Attacks    Latency     Loading ratio    Precision (%)
TF         162.44 s    73.67%           17.14
AWF        100.91 s    47.15%           10.18
TMWF       236.58 s    97.58%           47.21
Tik-tok    162.21 s    73.63%           63.19
DF         162.21 s    73.63%           33.68
NetCLR     162.20 s    73.61%           28.45
ARES       236.58 s    97.58%           52.09
Var-CNN    162.21 s    73.63%           67.31
RF         162.21 s    73.63%           84.99
RF30%      52.44 s     25.07%           83.70
Holmes     45.25 s     21.71%           85.19

Lower is better for latency and loading ratio; higher is better for precision.
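
The average improvements quoted for Table 3 (a 66.33% latency reduction and a 169.36% precision improvement) can be checked directly from the table rows. The sketch below assumes that each average is an unweighted mean of per-attack relative changes over all ten rows other than Holmes, RF30% included; this reading is ours and is not stated in the text. The variable names are illustrative.

```python
# (latency in seconds, precision in %) per attack, copied from Table 3.
baselines = {
    "TF": (162.44, 17.14),
    "AWF": (100.91, 10.18),
    "TMWF": (236.58, 47.21),
    "Tik-tok": (162.21, 63.19),
    "DF": (162.21, 33.68),
    "NetCLR": (162.20, 28.45),
    "ARES": (236.58, 52.09),
    "Var-CNN": (162.21, 67.31),
    "RF": (162.21, 84.99),
    "RF30%": (52.44, 83.70),
}
holmes_latency, holmes_precision = 45.25, 85.19

# Relative latency reduction of Holmes with respect to each baseline.
latency_reductions = [1 - holmes_latency / lat for lat, _ in baselines.values()]
# Relative precision improvement of Holmes with respect to each baseline.
precision_gains = [holmes_precision / prec - 1 for _, prec in baselines.values()]

avg_latency_reduction = 100 * sum(latency_reductions) / len(baselines)
avg_precision_gain = 100 * sum(precision_gains) / len(baselines)

print(f"average latency reduction: {avg_latency_reduction:.2f}%")
print(f"average precision gain:    {avg_precision_gain:.2f}%")
```

Under this reading the two means work out to about 66.33% and 169.36%, matching the quoted figures. The precision average is dominated by the weakest baselines (TF and AWF), while the gain over RF itself is well under one percent.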

attacks. TrafficSliver is effective in reducing the amount of website information in the early-stage traffic. However, TrafficSliver cannot disrupt the correlation between traffic from different stages of page loading. Therefore, Holmes is more robust against the TrafficSliver defense compared to baselines.

6.5 Reliability Evaluation

The page loading speeds vary significantly across different websites, making it difficult to ensure high precision in detecting early-stage traffic of all websites. Therefore, we use the minimum precision among all websites (i.e., P@min) to evaluate the reliability of WF attacks on early-stage traffic.

Figure 11 illustrates the reliability of WF attacks under WTF-PAD defense. We use the dataset of Alexa-top 95 websites with WTF-PAD defense for evaluation because a variant of the WTF-PAD defense based on circuit-level padding has been practically deployed in Tor [1]. When the page loading ratio is 60%, Holmes achieves the best P@min of 70.25%, while Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, and TF have a P@min of 0. A P@min of 0 means there are websites that these WF attacks cannot identify. For traffic at page loading ratios of 80% and 100%, Holmes achieves an average P@min improvement of 299.43% and 160.60% over baselines, respectively. We find that the multi-tab WF attacks, ARES and TMWF, fail to ensure reliable identification under obfuscated traffic. Existing attacks focus only on high average precision, ignoring the low P@min caused by differences between websites. Particularly, for traffic during the complete loading of websites, the reliability of existing WF attacks is limited. RF, Tik-tok, ARES, and DF, which claim to be robust attacks capable of undermining WTF-PAD defense, achieve high average precisions of 96.78%, 94.51%, 91.09%, and 91.19% in our evaluation. However, the P@min values for RF, Tik-tok, ARES, and DF are only 66.87%, 64.46%, 54.30%, and 54.11%, respectively. In contrast, Holmes significantly improves the reliability of WF attacks and achieves the best P@min of 82.11%.

The reliability of Holmes is attributed to three aspects: (i) Holmes achieves adaptive data augmentation based on the unique temporal distribution of each website, ensuring high precision in the identification of early-stage traffic across all websites. (ii) Holmes employs supervised contrastive learning to transform features, effectively separating traffic from different websites in the new feature space, thus reducing the misclassification of similar websites. (iii) Holmes calculates the spatial distribution features of website traffic in the feature space and enhances the reliability of identification by assessing the correlation between unknown traffic and the unique spatial distribution of each website.
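
To make the P@min metric used in this section concrete, the following minimal sketch computes per-website precision and takes the minimum. Treating a website that is never predicted as having precision 0 is an assumption on our part (it is consistent with the remark that a P@min of 0 means some websites cannot be identified), and the function names are illustrative.

```python
from collections import Counter


def per_site_precision(y_true, y_pred):
    """Precision for each website: correct predictions / all predictions of it."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    # Assumption: a website that is never predicted gets precision 0.
    return {
        site: (correct[site] / predicted[site] if predicted[site] else 0.0)
        for site in set(y_true)
    }


def p_at_min(y_true, y_pred):
    """Minimum precision over all websites (P@min)."""
    return min(per_site_precision(y_true, y_pred).values())


def mean_precision(y_true, y_pred):
    """Unweighted average precision over all websites."""
    values = per_site_precision(y_true, y_pred).values()
    return sum(values) / len(values)


# Toy example: one "b" trace is misread as "a", which lowers only a's precision.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "b", "a", "c", "c"]
```

In the toy example the average precision stays close to 0.89 while P@min drops to about 0.67, which is the kind of gap between average precision and P@min discussed above.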
Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis CCS ’24, October 14–18, 2024, Salt Lake City, UT, USA

6.6 Real-World Evaluation

Next, we evaluate Holmes using the dataset of 80 dark web websites collected from the real world. Alexa-top websites are widely used for evaluating WF attacks [10, 36, 37, 39, 40]. However, the ranking of Alexa-top websites is based on the interests of all internet users, which may not accurately represent the interests of Tor users in the real world. Based on the measurements of Tor onion services [41], we selected 80 of the most popular dark web websites. We utilized 20 servers deployed across three different countries to collect dark web traffic in August 2023 and April 2024. Therefore, this dataset encompasses traffic under various network conditions and traffic exhibiting concept drift due to changes in the websites.

We replay packets of testing traffic to evaluate the time overhead and performance of different WF attacks. Moreover, the adversary cannot know the end time of the page loading in advance. We set up baselines to end traffic collection when the number of packets meets the input requirements or when no new packets are collected within 1 second. Particularly, we use one NVIDIA GeForce RTX 4090 to accelerate the inference of DL models.

Table 3 shows the comparison of WF attacks using the dataset of dark web websites in the real-world evaluation. The attack latency refers to the average time taken to collect and identify unknown traffic, while the loading ratio represents the average page loading ratio when the website identification result is obtained from the WF attack. In particular, we optimize the best-performing attack, RF. RF30% represents RF attacks with the packet sequence lengths reduced to 30% of the original length. We adjust the input sequence lengths of RF and retrain the models. Reducing the input length significantly reduces latency, but also compromises the identification precision of RF. We find that compared to existing attacks and enhanced RF, Holmes exhibits the best attack efficiency and identification precision. Specifically, Holmes reduces latency by an average of 66.33% and improves precision by an average of 169.36% compared to baselines.

Dark web websites utilize onion services for server anonymization, requiring more Tor relays and additional time overhead to load. We additionally use the dataset of Alexa-top websites under WTF-PAD defense for real-world evaluation. Holmes outperforms existing attacks and enhanced RF in terms of latency and performance. For Alexa-top websites, Holmes reduces latency by an average of 66.38% and increases precision by an average of 32.32% compared to baselines. Holmes’s advantages are attributed to adaptive data augmentation for different websites and leveraging the spatial distribution of websites for adaptive traffic collection and high-precision website identification.

6.7 Comparison with Enhanced Baselines

In this section, we enhance the baselines and compare them with Holmes. Similar to the data augmentation module of Holmes, we generate early-stage traffic by masking the tail of the traffic with random lengths, which is added to the training datasets of baselines. We evaluate the accuracy of WF attacks on early-stage traffic using the dataset of the Alexa-top 95 websites. As shown in Figure 12, Holmes maintains a significant advantage in identifying early-stage traffic compared to the enhanced baselines. When the page loading ratio is 20%, 30%, 40%, and 50%, Holmes’s accuracy improves by an average of 255.78%, 215.09%, 136.12%, and 72.15% compared to the enhanced baselines.

Holmes utilizes the temporal distribution of websites to achieve website-adaptive data augmentation, effectively generating early-stage traffic that contains sufficient website information. Furthermore, Holmes employs supervised contrastive learning to extract the correlations between early-stage traffic and complete traffic, which enables more effective correlation analysis between samples compared to traditional supervised learning.

[Figure 12 plot omitted. Accuracy (%) against page loading ratio (20% to 50%) for Holmes, RF, Var-CNN, ARES, NetCLR, DF, Tik-tok, TMWF, AWF, and TF.]

Figure 12: Comparison with enhanced baselines with the early-stage traffic of Alexa-top websites.

6.8 Parameters Analysis

We further study the impact of different parameter values on the performance of Holmes. We select four key parameters, including the lower bound of the cumulative time distribution 𝜇, the upper bound of the cumulative time distribution 𝜆, the embedding size 𝜂, and the temperature 𝛾. We measure the accuracy of Holmes when the website loading ratios are 20%, 40%, and 60%, respectively.

Figure 13 shows the accuracy of Holmes under different parameter settings. The performance of Holmes is insensitive to the settings of the lower bound 𝜇, upper bound 𝜆, and embedding size 𝜂. For example, when the embedding size 𝜂 is increased from 64 to 768, the accuracy of Holmes for traffic at 20% loading ranges from 64.06% to 65.29%. For traffic at 60% loading, the accuracy of Holmes ranges from 96.45% to 96.58%. Moreover, we observe that a larger temperature 𝛾 leads to a decrease in the performance of Holmes. The reason is that a larger temperature 𝛾 makes model training difficult. Particularly, the performance of Holmes is still stable when the temperature 𝛾 is less than 0.15. In general, the performance of Holmes is not sensitive to parameter choices.

7 Discussion

Concept Drift. The changing content of websites over time can lead to a decline in the effectiveness of WF attacks, i.e., concept drift. Concept drift can be addressed by periodically collecting new traffic and retraining models [2, 10, 36, 40]. However, there are two key challenges. (i) Detecting concept drift is difficult, and existing attacks detect concept drift by observing the degradation of attack performance. (ii) Collecting traffic from all websites and retraining models is time-consuming and resource-intensive. Holmes can effectively detect concept drift samples for each website in the open-world setting. Furthermore, Holmes does not require frequent

[Figure 13 plot omitted. Four panels plot precision (%) at 20%, 40%, and 60% loading against parameter values: (a) lower bound of CDF 𝜇 (0.1 to 0.5), (b) upper bound of CDF 𝜆 (0.4 to 0.8), (c) embedding size 𝜂 (64 to 768), and (d) temperature 𝛾 (0.01 to 0.3).]

Figure 13: Evaluation of Holmes with different parameter settings. We show the identification precision of Holmes at 20%, 40%, and 60% loading stages, respectively.
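
To show where the temperature 𝛾 studied in the parameter analysis enters, the following sketch implements the standard supervised contrastive loss of Khosla et al. [24] in plain Python. Whether Holmes uses exactly this form is an assumption; the paper text here only states that supervised contrastive learning with a temperature is used. The function names and the toy embeddings are illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have a positive."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label sample contributes nothing
        # Similarity of the anchor to every other sample, scaled by the temperature.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings: two tight clusters in two dimensions.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supcon_loss(toy, [0, 0, 1, 1])    # labels agree with the clusters
loss_mismatched = supcon_loss(toy, [0, 1, 0, 1])  # labels cut across the clusters
```

Dividing the similarities by a small temperature sharpens the softmax over the other samples, so embeddings whose labels agree with the clusters get a loss near zero while mismatched labels are penalised heavily. With a large temperature the same similarities are flattened and the two cases become harder to tell apart.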

model retraining. We only collect traffic from websites with concept drift and update the centroid and radius of the corresponding websites.

Multi-tab Browsing. In recent years, the identification of obfuscated traffic in multi-tab browsing has been widely studied [10, 21]. In fact, multi-tab WF attacks can be transformed into multiple single-tab WF attacks. The adversary at the guard node can split obfuscated traffic based on the circuit ID [39, 40]. On the other hand, Holmes can be used to enhance the performance of existing multi-tab attacks and can be trained using obfuscated traffic under multi-tab settings. For instance, Holmes can replace the Trans-WF model in the multi-tab attack framework ARES [10], effectively identifying websites in the early stages of page loading.

Countermeasure against Holmes. Holmes exploits the temporal and spatial distribution of website traffic. The spatial distribution can be disrupted through traffic obfuscation. One possible design is as follows. The defender collects traffic in advance to calculate the spatial distribution of website traffic. Then the defender utilizes a GAN to generate obfuscated traffic based on the spatial distribution of website traffic so that the distance between the obfuscated traffic and the centroid of the website increases. We leave an in-depth exploration of this design to future work.

Limitations of Holmes. First, Holmes may not be able to accurately identify websites with the same template and similar content because they generate similar traffic patterns. Second, Tor software updates and significant modifications of website content lead to changes in website traffic patterns, which may impact the performance of Holmes. We aim to further improve the practicality of WF attacks in future work.

8 Related Work

DL-based WF Attacks. Recently, deep learning has been widely applied to construct website fingerprinting attacks [2, 3, 10, 21, 35–37, 39]. DL-based WF attacks demonstrate outstanding attack performance. However, these attacks require traffic close to the completion of page loading to identify websites. Holmes leverages the temporal distribution and spatial distribution of website traffic, enabling the extraction of correlations among website traffic. Therefore, our constructed attack can achieve robust and reliable WF attacks based on the early-stage traffic of page loading.

Practical WF Attacks. The feasibility of deploying existing WF attacks in the real world is hampered by strong assumptions [7, 22]. Recent works aim to relax these assumptions in real-world settings, e.g., multi-tab browsing [10, 21], robust WF attacks against defenses [37], attacks with a small number of training samples [40], dynamic network conditions [2], and open-world attacks [42]. Holmes aims to accurately identify websites at a very early stage of page loading, further enhancing the practicality of WF attacks.

Early-Stage Traffic Analysis. Early-stage traffic analysis facilitates real-time processing of traffic, which is crucial for throttling malicious traffic [9, 14, 26, 33]. Most existing studies focus on early-stage non-encrypted traffic analysis, where traffic can be accurately identified by using a small number of packets [15, 20]. The challenge intensifies if the traffic under analysis is encrypted [34]. Recently, DL-based traffic analysis methods [45, 47] achieve accurate early-stage encrypted traffic classification in specific scenarios. However, existing methods cannot achieve WF attacks in the early stages under Tor traffic. Holmes achieves early-stage WF attacks by analyzing the spatial-temporal correlations among website traffic. To the best of our knowledge, Holmes is the first early-stage traffic analysis method for Tor traffic.

9 Conclusion

In this paper, we propose Holmes, a reliable and robust early-stage WF attack. Specifically, Holmes utilizes the temporal distribution of website traffic to achieve website-adaptive data augmentation and employs supervised contrastive learning to embed traffic into a low-dimensional feature space. Holmes calculates the correlation of early-stage traffic with each website by leveraging the spatial distribution of website traffic in the embedding space, thereby enabling early-stage website identification. We conduct extensive evaluations of Holmes using six datasets, and the experiment results demonstrate its effectiveness in identifying early-stage traffic.

Acknowledgment

We thank our anonymous reviewers for their helpful comments and feedback. The work is supported in part by NSFC under Grants 62132011 and 62425201. Qi Li is the corresponding author of this paper.

References

[1] 2023. Circuit-level padding. https://spec.torproject.org/padding-spec/circuit-level-padding.html
[2] Alireza Bahramali, Ardavan Bozorgi, and Amir Houmansadr. 2023. Realistic Website Fingerprinting By Augmenting Network Traces. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security. 1035–1049.
[3] Sanjit Bhat, David Lu, Albert Kwon, and Srinivas Devadas. 2019. Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning. Proceedings on Privacy Enhancing Technologies 4 (2019), 292–310.
[4] Xiang Cai, Rishab Nithyanand, and Rob Johnson. 2014. Cs-buflo: A congestion sensitive website fingerprinting defense. In Proceedings of the 13th Workshop on Privacy in the Electronic Society. 121–130.
[5] Xiang Cai, Rishab Nithyanand, Tao Wang, Rob Johnson, and Ian Goldberg. 2014. A systematic approach to developing and evaluating website fingerprinting defenses. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. 227–238.
[6] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. In European conference on computer vision. Springer, 213–229.
[7] Giovanni Cherubin, Rob Jansen, and Carmela Troncoso. 2022. Online website fingerprinting: Evaluating website fingerprinting attacks on Tor in the real world. In 31st USENIX Security Symposium (USENIX Security 22). 753–770.
[8] Wladimir De la Cadena, Asya Mitseva, Jens Hiller, Jan Pennekamp, Sebastian Reuter, Julian Filter, Thomas Engel, Klaus Wehrle, and Andriy Panchenko. 2020. Trafficsliver: Fighting website fingerprinting attacks with traffic splitting. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security. 1971–1985.
[9] Xinhao Deng, Mingwei Xu, Qi Li, Weijie Wu, Yuan Yang, Menghao Zhang, Yu Zhou, and Jianping Wu. 2024. Exploring Dynamic Rule Caching Under Dependency Constraints for Programmable Switches: Theory, Algorithm, and Implementation. IEEE Transactions on Network and Service Management (2024).
[10] Xinhao Deng, Qilei Yin, Zhuotao Liu, Xiyuan Zhao, Qi Li, Mingwei Xu, Ke Xu, and Jianping Wu. 2023. Robust Multi-tab Website Fingerprinting Attacks in the Wild. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 1005–1022.
[11] Mariano Di Martino, Peter Quax, and Wim Lamotte. 2019. Realistically fingerprinting social media webpages in https traffic. In Proceedings of the 14th International Conference on Availability, Reliability and Security. 1–10.
[12] Roger Dingledine, Nick Mathewson, and Paul Syverson. 2004. Tor: The second-generation onion router. Technical Report. Naval Research Lab Washington DC.
[13] Kevin P Dyer, Scott E Coull, Thomas Ristenpart, and Thomas Shrimpton. 2012. Peek-a-boo, i still see you: Why efficient traffic analysis countermeasures fail. In 2012 IEEE symposium on security and privacy. IEEE, 332–346.
[14] Chuanpu Fu, Qi Li, Meng Shen, and Ke Xu. 2024. Detecting Tunneled Flooding Traffic via Deep Semantic Analysis of Packet Length Patterns. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security.
[15] Gabriel Gómez Sena and Pablo Belzarena. 2009. Early traffic classification using support vector machines. In Proceedings of the 5th International Latin American Networking Conference. 60–66.
[16] Jiajun Gong and Tao Wang. 2020. Zero-delay lightweight defenses against website fingerprinting. In 29th USENIX Security Symposium. 717–734.
[17] Jiajun Gong, Wuqi Zhang, Charles Zhang, and Tao Wang. 2022. Surakav: generating realistic traces for a strong website fingerprinting defense. In 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 1558–1573.
[18] Jamie Hayes and George Danezis. 2016. k-fingerprinting: A robust scalable website fingerprinting technique. In 25th USENIX Security Symposium. 1187–1203.
[19] James K Holland and Nicholas Hopper. 2022. RegulaTor: A Straightforward Website Fingerprinting Defense. Proceedings on Privacy Enhancing Technologies 2022, 2 (2022), 344–362.
[20] N-F Huang, G-Y Jai, and H-C Chao. 2008. Early identifying application traffic with application characteristics. In 2008 IEEE International Conference on Communications. IEEE, 5788–5792.
[21] Zhaoxin Jin, Tianbo Lu, Shuang Luo, and Jiaze Shang. 2023. Transformer-based Model for Multi-tab Website Fingerprinting Attack. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security. 1050–1064.
[22] Marc Juarez, Sadia Afroz, Gunes Acar, Claudia Diaz, and Rachel Greenstadt. 2014. A critical evaluation of website fingerprinting attacks. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. 263–274.
[23] Marc Juárez, Mohsen Imani, Mike Perry, Claudia Díaz, and Matthew Wright. 2015. WTF-PAD: toward an efficient website fingerprinting defense for tor. CoRR, abs/1512.00524 (2015).
[24] Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised contrastive learning. Advances in neural information processing systems 33 (2020), 18661–18673.
[25] Christophe Leys, Christophe Ley, Olivier Klein, Philippe Bernard, and Laurent Licata. 2013. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of experimental social psychology 49, 4 (2013), 764–766.
[26] Qi Li, Xinhao Deng, Zhuotao Liu, Yuan Yang, Xiaoyue Zou, Qian Wang, Mingwei Xu, and Jianping Wu. 2022. Dynamic network security function enforcement via joint flow and function scheduling. IEEE Transactions on Information Forensics and Security 17 (2022), 486–499.
[27] Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in neural information processing systems 30 (2017).
[28] Akshaya Mani, T Wilson-Brown, Rob Jansen, Aaron Johnson, and Micah Sherr. 2018. Understanding tor usage with privacy-preserving measurement. In Proceedings of the Internet Measurement Conference 2018. 175–187.
[29] Nate Mathews, James K Holland, Se Eun Oh, Mohammad Saidur Rahman, Nicholas Hopper, and Matthew Wright. 2023. SoK: A critical evaluation of efficient website fingerprinting defenses. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 969–986.
[30] Milad Nasr, Alireza Bahramali, and Amir Houmansadr. 2021. Defeating DNN-Based Traffic Analysis Systems in Real-Time With Blind Adversarial Perturbations. In 30th USENIX Security Symposium.
[31] Milad Nasr, Alireza Bahramali, and Amir Houmansadr. 2021. Defeating DNN-Based Traffic Analysis Systems in Real-Time With Blind Adversarial Perturbations. In 30th USENIX Security Symposium.
[32] Andriy Panchenko, Fabian Lanze, Jan Pennekamp, Thomas Engel, Andreas Zinnen, Martin Henze, and Klaus Wehrle. 2016. Website Fingerprinting at Internet Scale. In NDSS.
[33] Yuqi Qing, Qilei Yin, Xinhao Deng, Yihao Chen, Zhuotao Liu, Kun Sun, Ke Xu, Jia Zhang, and Qi Li. 2023. Low-Quality Training Data Only? A Robust Framework for Detecting Encrypted Malicious Network Traffic. arXiv preprint arXiv:2309.04798 (2023).
[34] Buyu Qu, Zhibin Zhang, Li Guo, and Dan Meng. 2012. On accuracy of early traffic classification. In 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage. IEEE, 348–354.
[35] Mohammad Saidur Rahman, Payap Sirinam, Nate Mathews, Kantha Girish Gangadhara, and Matthew Wright. 2020. Tik-Tok: The Utility of Packet Timing in Website Fingerprinting Attacks. Proceedings on Privacy Enhancing Technologies 3 (2020), 5–24.
[36] Vera Rimmer, Davy Preuveneers, Marc Juarez, Tom Van Goethem, and Wouter Joosen. 2018. Automated Website Fingerprinting through Deep Learning. In NDSS.
[37] Meng Shen, Kexin Ji, Zhenbo Gao, Qi Li, Liehuang Zhu, and Ke Xu. 2023. Subverting Website Fingerprinting Defenses with Robust Traffic Representation. In 32nd USENIX Security Symposium (USENIX Security 23). 607–624.
[38] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. In International conference on machine learning. PMLR, 3145–3153.
[39] Payap Sirinam, Mohsen Imani, Marc Juarez, and Matthew Wright. 2018. Deep fingerprinting: Undermining website fingerprinting defenses with deep learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 1928–1943.
[40] Payap Sirinam, Nate Mathews, Mohammad Saidur Rahman, and Matthew Wright. 2019. Triplet fingerprinting: More practical and portable website fingerprinting with n-shot learning. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 1131–1148.
[41] Chunmian Wang, Junzhou Luo, Zhen Ling, Lan Luo, and Xinwen Fu. 2023. A comprehensive and long-term evaluation of tor v3 onion services. In Proceedings of the 42nd IEEE International Conference on Computer Communications (INFOCOM). IEEE.
[42] Tao Wang. 2020. High precision open-world website fingerprinting. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 152–167.
[43] Tao Wang, Xiang Cai, Rishab Nithyanand, Rob Johnson, and Ian Goldberg. 2014. Effective attacks and provable defenses for website fingerprinting. In 23rd USENIX Security Symposium. 143–157.
[44] Tao Wang and Ian Goldberg. 2017. Walkie-talkie: An efficient defense against passive website fingerprinting attacks. In 26th USENIX Security Symposium. 1375–1390.
[45] Yipeng Wang, Huijie He, Yingxu Lai, and Alex X Liu. 2022. A Two-Phase Approach to Fast and Accurate Classification of Encrypted Traffic. IEEE/ACM Transactions on Networking (2022).
[46] Yixiao Xu, Tao Wang, Qi Li, Qingyuan Gong, Yang Chen, and Yong Jiang. 2018. A multi-tab website fingerprinting attack. In Proceedings of the 34th Annual Computer Security Applications Conference. 327–341.
[47] Pengwei Zhan, Liming Wang, and Yi Tang. 2021. Website fingerprinting on early QUIC traffic. Computer Networks 200 (2021), 108538.
[48] Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel. 2012. A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal 5, 5 (2012), 363–387.
