
The Silent Danger in HTTP: Identifying HTTP Desync Vulnerabilities with Gray-box Testing

Keran Mu (1), Jianjun Chen (1, 2), Jianwei Zhuge (1, 2), Qi Li (1, 2), Haixin Duan (1, 2), Nick Feamster (3)
(1) Tsinghua University
(2) Zhongguancun Laboratory
(3) University of Chicago

Corresponding author: Jianjun Chen (jianjun@tsinghua.edu.cn)

Abstract
HTTP Desync is a high-risk threat in today's decentralized Internet, stemming from discrepancies among HTTP implementations. Current automatic detection tools, primarily dictionary-based scanners and black-box fuzzers, lack insight into the internal states of implementations, which makes their testing ineffective. Moreover, they focus on request-side Desync and overlook vulnerabilities in HTTP responses.

In this paper, we present HDHunter, a novel automatic HTTP discrepancy detection framework based on gray-box coverage-directed differential testing. HDHunter can discover discrepancies not only in HTTP requests but also in HTTP responses and CGI responses. We evaluated our HDHunter prototype against 19 state-of-the-art HTTP implementations and identified 17 new HTTP Desync vulnerabilities. We have disclosed all identified vulnerabilities to the corresponding vendors and received acknowledgments and bug bounty rewards, including 9 CVEs in well-known HTTP software such as Apache, Tomcat, and Squid.

1 Introduction

Today's Internet has largely deviated from the end-to-end principle of its original design. Numerous middleboxes such as proxies, firewalls, and content delivery networks (CDNs) are commonly deployed in the network. As a result, a typical end-to-end HTTP request may traverse multiple middleboxes before reaching its final destination.

However, this layered architecture introduces a potential threat: HTTP Desync. The vulnerability arises when different HTTP implementations in the chain process the same message differently. Attackers can leverage these discrepancies to desynchronize the HTTP message queues, resulting in message smuggling and manipulation. Such attacks can lead to severe security consequences, such as cache poisoning, session hijacking, account takeover, and security policy bypass. Numerous such vulnerabilities [27, 29] and attacks [11] against well-known Web services have recently emerged, illustrating that HTTP Desync has become a serious threat to the Internet.

Previous work has developed various detection tools for HTTP Desync vulnerabilities [13, 24, 45]. Smuggler [13] employs a series of predefined payloads to test the exploitability of websites, but this method cannot uncover new attack variants. T-Reqs [24] and HDiff [45], on the other hand, introduce black-box fuzzing techniques that generate test cases from the HTTP grammar or RFC documents to identify HTTP request smuggling attacks. However, they suffer from two limitations. First, the 'blind' nature of black-box testing makes their testing ineffective and incomplete, because they lack insight into the internal states of the targets. Second, they only test the request side, overlooking potential HTTP Desync vulnerabilities in HTTP responses.

Gray-box coverage-directed testing has proven highly effective in uncovering vulnerabilities in various targets, including TLS [9] and the JVM [8]. This technique uses execution coverage to guide input mutation, enabling a more thorough exploration of program branches. However, state-of-the-art gray-box coverage-directed tools like AFL [51] are not readily suitable for identifying HTTP Desync, due to three challenges. First, existing tools like AFL are adept at identifying memory-related bugs within a single target, but HTTP Desync involves discrepancies between multiple HTTP implementations. Second, there is a lack of effective detectors for Desync vulnerabilities in HTTP requests and responses. Third, HTTP Desync's potential to disrupt the HTTP message queue and TCP state can significantly destabilize the fuzzing process, resulting in numerous false positives and negatives.

In this paper, we propose HDHunter, a novel HTTP Desync vulnerability discovery framework that addresses the above issues. We address the first challenge by using execution coverage information from multiple implementations to optimize test case generation. For the second challenge, we developed a new HTTP Desync detector to extract internal
states from inside the implementations, allowing it to identify all forms of HTTP Desync vulnerabilities. For the third challenge, we implemented a snapshot-based execution framework that uses snapshots to reset network states efficiently.

We implemented an HDHunter prototype and evaluated it against 19 state-of-the-art open-source HTTP implementations. We highlight five primary types of discrepancies found: 1) non-standard numbers; 2) inconsistent trailer section acceptance; 3) non-standard line separators; 4) different TE.CL attack handling strategies; 5) incomplete response sanitization. We identified 17 HTTP Desync vulnerabilities in these implementations, affecting well-known implementations such as Apache, Tomcat, and Squid. We disclosed all of the vulnerabilities to the corresponding maintainers and received positive feedback. A total of 9 CVE IDs have been assigned, and the Internet Bug Bounty granted a $4660 bounty for all the vulnerabilities discovered in Tomcat.

Contributions. In summary, we make the following contributions:

• We proposed a novel approach, HDHunter, that uses a gray-box coverage-directed differential testing framework to identify HTTP Desync vulnerabilities automatically.
• We developed and open-sourced a prototype of HDHunter¹ and evaluated it on 19 state-of-the-art HTTP implementations. We found 17 new HTTP Desync vulnerabilities in Apache, Tomcat, Squid, and other HTTP implementations.
• We conducted the first study to automatically identify Desync vulnerabilities in HTTP responses and CGI responses. We responsibly disclosed all of them to the corresponding vendors and received positive feedback.

¹ Link to the repository of HDHunter: https://github.com/mukeran/HDHunter

2 Background

2.1 HTTP, CGI, and Request Smuggling

Hypertext Transfer Protocol (HTTP) is the foundation of web applications, functioning as a request-response protocol between clients and servers. HTTP has evolved significantly: HTTP/1.1 and earlier versions use a text-based format, while HTTP/2 and later versions adopt a binary format. Despite its age, HTTP/1.1 remains widely supported due to its established infrastructure and ease of debugging, even though its messages are challenging to parse. In this paper, 'HTTP' specifically refers to HTTP/1.1.

HTTP has two key features that enhance web communication: persistent connections and HTTP pipelining. Persistent connections (keep-alive) allow multiple HTTP requests and responses to use the same TCP connection, reducing the need for new connections. HTTP pipelining extends this by allowing multiple requests to be sent over the same connection without waiting for each response, although responses must be delivered in order. Together, these features reduce network and processing overhead and improve efficiency.

Common Gateway Interface (CGI) is a protocol for generating dynamic web content that serves as an intermediary between a web server and external applications. CGI scripts, written in various programming languages, enable interactive elements in web applications. Building on CGI, successors such as WSGI [15, 16] (used by Python applications) and FastCGI [34] (used by PHP) have emerged, improving performance and extending CGI techniques. Other related technologies that extend the original CGI concept include SCGI [42], uwsgi [48], Rack [41], and AJP [3].

HTTP Request Smuggling (HRS) is a technique that exploits inconsistencies in how two web servers, typically a front-end proxy and a back-end server, interpret request boundaries. By crafting ambiguous HTTP requests, an attacker can smuggle one request inside another, leading to security issues such as bypassing security controls or manipulating requests.

A common way to exploit HRS leverages two distinct HTTP body framing methods, no encoding and chunked encoding, which are governed by two headers: Content-Length (CL) and Transfer-Encoding (TE). No encoding imposes no restrictions on the message body format, and the value of CL gives the body length. Chunked encoding is intended for use when the message's overall length cannot be determined by the time the headers are sent. It follows a structured format in which the content is divided into segments, each prefixed with its own size (in hexadecimal). Although the HTTP RFC [18] states that "a sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field", HTTP servers support chunked encoding to varying degrees and may not follow best practice when processing messages that carry both TE and CL headers. As a result, different servers may recognize and handle the same message differently.

Figure 1 shows that when both TE and CL are present, the front-end proxy honors the TE header and forwards the whole message without discarding the CL header. Meanwhile, the back-end server honors the CL header, so the single request is split into multiple requests.

2.2 Fuzzing Techniques

Fuzzing is an automated software testing technique that feeds a large quantity of unexpected or random input data to the test target in order to identify potential vulnerabilities.

Categories of Fuzzing Techniques. Fuzzing techniques are categorized into black-box, white-box, and gray-box based
on the fuzzer's knowledge of the software. Black-box fuzzing operates without internal knowledge, treating the software as opaque and testing it externally. White-box fuzzing requires a comprehensive understanding of the software's internals, enabling precise testing. Gray-box fuzzing occupies a middle ground, leveraging partial knowledge of the software's internal running state to make testing more efficient. It offers more insight than black-box fuzzing yet does not require the exhaustive detail of white-box fuzzing.

Coverage-directed Fuzzing. In gray-box fuzzing, the internal state most commonly used to direct the fuzzing process is code coverage, i.e., the extent to which the program's code is executed by the test cases. By monitoring which parts of the code the inputs exercise, gray-box fuzzing efficiently identifies untested portions of the program and directs input mutation or generation toward these areas, making the testing more thorough.

Differential Testing. Fuzzing normally targets a single instance and aims to detect memory-related vulnerabilities such as buffer overflows, memory leaks, and null pointer dereferences, using various memory sanitizers as the oracle. In contrast, differential testing is particularly effective at discovering logic issues. It feeds the same inputs to multiple implementations or software versions and compares their outputs to highlight differences in how each system handles the data. Such comparison is essential for detecting logic errors that a single-instance test may not reveal.

Figure 1: Example of HTTP Request Smuggling. The front-end proxy accepts TE and forwards both TE and CL; the back-end server accepts CL. The attacker's request:

    POST /public HTTP/1.1
    Connection: keep-alive
    Transfer-Encoding: chunked
    Content-Length: 11

    2[CR][LF]
    ab[CR][LF]
    23[CR][LF]
    GET /secret HTTP/1.1[CR][LF]
    Evil: value[CR][LF]
    [CR][LF]
    0[CR][LF]
    [CR][LF]

The proxy forwards this as a single chunked request. The back-end reads only the 11 bytes 2[CR][LF]ab[CR][LF]23[CR][LF] as the body and parses the remaining bytes, starting at GET /secret HTTP/1.1, as a second request.

3 Overview

3.1 Threat Model

The core of HTTP Desync lies in discrepancies in how different implementations interpret the HTTP protocol, which allow attackers to perform a variety of malicious attacks. The aforementioned HTTP Request Smuggling is one common form of HTTP Desync.

In this paper, we extend the scope of traditional HTTP Request Smuggling to encompass six forms of desynchronization in HTTP transactions. First, we have expanded and categorized the forms of HTTP request-side Desync as follows.

• Inconsistent number of messages. The boundaries between HTTP messages are determined by the semantics of the message. The HTTP protocol includes multiple optional fields that define these boundaries, and different HTTP parsers may interpret these fields differently. Therefore, different parsers may split one payload into different numbers of HTTP messages. This is the traditional form of HTTP Desync, as shown in Figure 2(a).
• Inconsistent content of messages. In some cases, handling discrepancies do not produce more or fewer HTTP messages but result in different message contents, as demonstrated in Figure 2(b). This threat model arises in protocol conversion between HTTP and other protocols such as CGI, where the HTTP and CGI parsers may adopt different strategies to interpret the payload.
• Inconsistent order of messages. Modern HTTP servers use techniques like HTTP pipelining and persistent connections to improve performance. These features allow the client to send multiple requests without waiting for responses, and the server is supposed to process and respond to them in the order they were received. However, if an implementation sends responses in a wrong or unexpected order, HTTP Desync may result. Figure 2(c) shows an example of a proxy that handles messages in reversed order.

Second, we extend the threat model to cover HTTP Response Desync, because issues prevalent in request handling can similarly affect response processing. In today's multifaceted Internet infrastructure, characterized by Content Delivery Networks (CDNs) and microservices, proxies often serve multiple endpoints, each managed by a distinct entity. If an attacker gains control over any of these endpoints, and discrepancies in response parsing exist among proxies or between proxies and clients, the attacker can potentially infiltrate the response queue, leading to response manipulation or the exposure of sensitive information. For example, services like AWS API Gateway and projects like Kubernetes Ingress support configuring different paths that point to different upstream URLs. If one of the upstreams is compromised and the response parser is vulnerable to the TE.CL attack, attackers can smuggle a fake response with both TE and CL headers into the victim's response queue, resulting in response manipulation. Figures 2(d), 2(e), and 2(f) show three examples of inconsistent number, content, and order of messages in HTTP responses. Note that endpoints may have not only an HTTP upstream but also an application upstream capable of delivering HTTP responses derived from CGI responses.

Figure 2: Examples of HTTP Desync: inconsistent number, content, and order of messages. Panels: (a) inconsistent number of messages in HTTP requests; (b) inconsistent content of messages in HTTP requests; (c) inconsistent order of messages in HTTP requests; (d) inconsistent number of messages in HTTP responses; (e) inconsistent content of messages in HTTP responses; (f) inconsistent order of messages in HTTP responses. Panels (a), (b), and (c) involve an attacker, proxies, server(s), and a user; panels (d), (e), and (f) involve an attacker, downstream(s), proxies, and a server. In (f), the response to the 2nd request arrives first, so the responses mismatch the requests.

To the best of our knowledge, this is the most comprehensive taxonomy of HTTP Desync attacks. As we describe later, these insights enable us to detect and discover novel attacks missed by previous research.

3.2 Challenges

Typically, fuzzing involves three key components: (1) generating effective test cases; (2) executing the targets with each test input; and (3) detecting whether vulnerabilities are triggered. Our research aims to advance existing detection tools in all three components to effectively identify HTTP Desync vulnerabilities. We have identified the following research challenges that must be addressed to accomplish this objective.

C1. How do we generate effective test cases to maximize the possibility of HTTP desynchronization? HTTP desynchronization is caused by parsing and processing discrepancies between multiple implementations, so the generation and mutation strategies should be designed around this objective.

Previous studies have presented several approaches to this problem. T-Reqs [24] proposes CFG (context-free grammar)-based input generation and tree-level mutation. HDiff [45] uses similar methods while incorporating natural language processing techniques into its input generation. However, both are black-box fuzzing techniques that lack internal insight into the target system. Their "blind" nature makes them ineffective at uncovering deep corner cases in implementations. Moreover, these tools do not generate test cases targeting HTTP responses and CGI responses.

On the other hand, current gray-box coverage-guided fuzzing tools like AFL excel at identifying memory-related bugs in a single target, whereas HTTP Desync involves discrepancies between multiple HTTP implementations. Presently, there are no off-the-shelf gray-box fuzzing tools for identifying HTTP Desync vulnerabilities, which require more effort in strategy design.

C2. How do we identify a potential HTTP desynchronization? Given a test input, we need an oracle that decides whether HTTP desynchronization occurs, similar to sanitizers in memory-related bug identification. Previous methods, such as error-based and timeout-based detection, are inefficient and have a high false positive rate. Moreover, these methods target existing attacks, are not designed to find new types of HTTP Desync attacks from the underlying principle, and overlook response-side Desync vulnerabilities. A new identification method is required to cover all potential HTTP desynchronization scenarios.

C3. How do we efficiently restore the states of HTTP implementations? HTTP desynchronization can disrupt both the HTTP message queue and the TCP state. Residual state from each test can significantly affect the stability of the fuzzing process, leading to numerous false positives and negatives. To ensure effective fuzzing, each test must start from a clean state. The straightforward solution is to restart the target for each test input, but the initialization of large HTTP implementations like Tomcat can take a considerable amount of time. Therefore, an efficient state recovery mechanism is essential to optimize the testing process.

4 Design

We propose HDHunter, a gray-box coverage-directed differential fuzzing technique targeting discrepancies between open-source HTTP implementations. Unlike existing black-box solutions, HDHunter can explore more execution paths in the test targets thanks to the coverage information collected during processing. Besides, the combined coverage of two implementations can serve as an indicator that helps induce more discrepancies. HDHunter targets not only HTTP requests but also HTTP and CGI responses, as illustrated in Figure 2.
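As a toy illustration of the differential idea, the sketch below feeds the Figure 1 payload to two deliberately simplified framing policies: one honors Transfer-Encoding: chunked when present (the front-end proxy's behavior in Figure 1) and one honors only Content-Length (the back-end server's behavior). It is a minimal Python sketch, not HDHunter code; PAYLOAD, split_messages, and the other names are hypothetical, and real parsers handle many cases (trailers, invalid chunk sizes, obsolete line folding) that it ignores.

```python
from typing import Dict, List, Tuple

CRLF = b"\r\n"

# The attacker's request from Figure 1; the 0x23 (35-byte) chunk carries "GET /secret ...".
PAYLOAD = (
    b"POST /public HTTP/1.1\r\n"
    b"Connection: keep-alive\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"2\r\nab\r\n"
    b"23\r\nGET /secret HTTP/1.1\r\nEvil: value\r\n\r\n"
    b"0\r\n\r\n"
)


def parse_fields(head: bytes) -> Dict[str, str]:
    """Skip the start line and return the field lines as a lower-cased dict."""
    fields = {}
    for line in head.split(CRLF)[1:]:
        name, _, value = line.partition(b":")
        fields[name.strip().lower().decode()] = value.strip().decode()
    return fields


def read_chunked(stream: bytes, pos: int) -> Tuple[bytes, int]:
    """Decode a chunked body starting at pos; return (body, position after it)."""
    body = b""
    while True:
        line_end = stream.index(CRLF, pos)
        size = int(stream[pos:line_end], 16)  # chunk sizes are hexadecimal
        pos = line_end + 2
        if size == 0:
            # Last chunk; this toy assumes an empty trailer section.
            return body, stream.index(CRLF, pos) + 2
        body += stream[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


def split_messages(stream: bytes, prefer_te: bool) -> List[Tuple[bytes, bytes]]:
    """Split a byte stream into (head, body) pairs under one framing policy."""
    messages = []
    pos = 0
    while pos < len(stream):
        end = stream.find(CRLF + CRLF, pos)
        if end == -1:
            messages.append((stream[pos:], b""))  # incomplete leftover bytes
            break
        head, pos = stream[pos:end], end + 4
        fields = parse_fields(head)
        if prefer_te and fields.get("transfer-encoding", "").lower() == "chunked":
            body, pos = read_chunked(stream, pos)
        else:
            length = int(fields.get("content-length", "0"))
            body, pos = stream[pos:pos + length], pos + length
        messages.append((head, body))
    return messages


def start_lines(messages: List[Tuple[bytes, bytes]]) -> List[str]:
    return [head.split(CRLF, 1)[0].decode() for head, _ in messages]


if __name__ == "__main__":
    front = split_messages(PAYLOAD, prefer_te=True)   # front-end proxy: honors TE
    back = split_messages(PAYLOAD, prefer_te=False)   # back-end server: honors CL
    print("front-end:", start_lines(front))
    print("back-end: ", start_lines(back))
```

Run as a script, it reports a single POST /public message for the TE-honoring policy but three messages for the CL-honoring one: the original request with the 11-byte body 2[CR][LF]ab[CR][LF]23[CR][LF], the smuggled GET /secret request, and a leftover "0" line that a real back-end would most likely reject as malformed. This message-count mismatch is the "inconsistent number of messages" form shown in Figure 2(a).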
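Figure 3: Architecture and Workflow of HDHunter. ① Seeds extracted from a network log enter the corpus; ② a test case is passed to the mutator (sequence, message, and byte levels); ③ the mutated input is executed on Impl. A and Impl. B inside the snapshot-based execution framework; ④(a) coverage goes to the feedback and ④(b) states go to the detector; ⑤(a) discrepancies go to the feedback and ⑤(b) reports of discrepancies are produced; ⑥ the feedback adds new seeds to the corpus; ⑦ manual analysis of the discrepancies yields vulnerabilities.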
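The workflow in Figure 3 can be sketched as a small differential fuzzing loop. The Python below is a hypothetical illustration under stated assumptions, not HDHunter's implementation: each target is a callable that returns an AFL-style edge hit-count map together with the State Tuple defined in Section 4.4.1; discrepant() is a simplified reading of the rule in Section 4.4.2 (the complete rules are in Appendix C); the two edge maps are concatenated into one double-sized map as described in Section 4.2.3; and toy_target() and drop_cr() are made-up stand-ins for an instrumented server and a byte-level mutator, chosen to mimic the "non-standard line separator" class of discrepancy.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class State:
    """State Tuple: (Count, Consumed, Body, Encoding, CL, Order, Status)."""
    count: int
    consumed: int
    body: bytes
    encoding: str
    cl: Optional[int]
    order: Tuple[str, ...]
    status: int


# Hypothetical target interface: run one input, return (edge hit-count map, State).
Target = Callable[[bytes], Tuple[bytes, State]]


def discrepant(a: State, b: State) -> bool:
    """Simplified rule: Count/Body/Order must match exactly; Encoding/CL/Consumed
    are compared unless both sides answered with a 400-599 status."""
    if (a.count, a.body, a.order) != (b.count, b.body, b.order):
        return True
    if 400 <= a.status <= 599 and 400 <= b.status <= 599:
        return False
    return (a.encoding, a.cl, a.consumed) != (b.encoding, b.cl, b.consumed)


def bucket(hits: int) -> int:
    """AFL-style hit-count classes: 0, 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+."""
    if hits == 0:
        return 0
    for bit, upper in enumerate((1, 2, 3, 7, 15, 31, 127)):
        if hits <= upper:
            return 1 << bit
    return 1 << 7


def update_seen(seen: bytearray, cov_a: bytes, cov_b: bytes) -> bool:
    """Concatenate both edge maps into one double-sized map; record new buckets."""
    new = False
    for i, hits in enumerate(bytes(cov_a) + bytes(cov_b)):
        b = bucket(hits)
        if b & ~seen[i]:
            seen[i] |= b
            new = True
    return new


def fuzz(seeds: Sequence[bytes], run_a: Target, run_b: Target,
         mutate: Callable[[bytes, random.Random], bytes],
         rounds: int, map_size: int, rng: random.Random):
    corpus: List[bytes] = list(seeds)
    seen = bytearray(2 * map_size)
    reports = []
    for _ in range(rounds):
        child = mutate(rng.choice(corpus), rng)          # steps 2-3
        cov_a, state_a = run_a(child)                    # step 4, implementation A
        cov_b, state_b = run_b(child)                    # step 4, implementation B
        new_cov = update_seen(seen, cov_a, cov_b)
        if discrepant(state_a, state_b):                 # step 5
            reports.append((child, state_a, state_b))    # kept for manual analysis
        elif new_cov:                                    # step 6
            corpus.append(child)  # discrepancy-triggering inputs are not re-seeded
    return corpus, reports


def toy_target(accept_bare_lf: bool, map_size: int = 64) -> Target:
    """Made-up stand-in for an instrumented server that counts messages by their
    header terminators, optionally tolerating bare LF line separators."""
    def run(data: bytes) -> Tuple[bytes, State]:
        text = data.replace(b"\r\n", b"\n") if accept_bare_lf else data
        count = text.count(b"\n\n" if accept_bare_lf else b"\r\n\r\n")
        cov = bytearray(map_size)
        cov[count % map_size] += 1               # fake edge per message count
        cov[(len(data) // 8) % map_size] += 1    # fake edge per input-size class
        status = 200 if count else 400
        return bytes(cov), State(count, len(data), b"", "raw", None, (), status)
    return run


def drop_cr(data: bytes, rng: random.Random) -> bytes:
    """Toy byte-level mutator: delete one randomly chosen CR byte, if any."""
    positions = [i for i, byte in enumerate(data) if byte == 0x0D]
    if not positions:
        return data
    i = rng.choice(positions)
    return data[:i] + data[i + 1:]


if __name__ == "__main__":
    seed = b"GET / HTTP/1.1\r\nHost: a\r\n\r\n"
    corpus, reports = fuzz([seed], toy_target(True), toy_target(False), drop_cr,
                           rounds=50, map_size=64, rng=random.Random(0))
    print(len(corpus), "seeds kept,", len(reports), "discrepancies reported")
```

Note the ordering inside fuzz(): coverage is always recorded, but an input that triggers a discrepancy is only reported and never added to the corpus, mirroring the seed-selection rule of Section 4.2.3.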

4.1 Workflow

We adopt a state-of-the-art AFL-like fuzzing framework based on a genetic algorithm and add strategies and mechanisms targeted at HTTP Desync.

The workflow of HDHunter, illustrated in Figure 3, can be broken down into seven steps: 1) we manually extract HTTP messages from network traffic as initial seeds and add them to the corpus; 2) the main fuzzing loop starts, and the fuzzer selects a test case from the corpus; 3) the mutators mutate the selected test case; 4) the executor runs the mutated input on two implementations while collecting their coverage and states; 5) the detector determines whether there is any discrepancy between the outputs of the two implementations and, if so, generates reports; 6) the feedback mechanism determines whether the mutated input is valuable for triggering more discrepancies and, if so, adds it to the corpus; in either case, steps 2 to 6 repeat; 7) we manually analyze the discrepancies to identify HTTP Desync vulnerabilities.

4.2 Test Case Generation

We design a new test case structure and dedicated mutation strategies in the mutator to induce more discrepancies, and we use coverage-directed feedback to direct the generation of new test cases.

4.2.1 Structure of Test Case

HTTP is a text-based network protocol. Unstructured test cases only support byte-level mutation strategies, which cannot satisfy HTTP's semantic rules, so the fuzzer would take too long to generate a legitimate input. Therefore, the test case structure should take HTTP syntax into account to pave the way for high-level mutators.

As illustrated in Figure 4, according to the RFC, HTTP messages, including requests and responses, are composed of three parts: the start line, the field lines (i.e., headers), and the message body. They differ in the start line but share the same format in the subsequent parts. The start line and field lines have fixed formats, while the message body is either a raw binary stream delimited by Content-Length or a structured chunked encoding enabled by Transfer-Encoding. Besides, HTTP pipelining and persistent connections allow multiple HTTP messages in a single connection. Moreover, CGI responses are similar in format to HTTP responses, with one notable difference in how they convey the status code: unlike HTTP responses, CGI responses use a dedicated Status header to transmit the status code. Consequently, it is feasible to design a universal test case structure that encompasses all three message types.

Figure 4: Comparison Between HTTP Request, Response, and CGI Response. Three different colors represent the three parts of HTTP (start line, field lines, and message body).

    Definition from RFC:
    HTTP-message = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

    HTTP Request                HTTP Response               CGI Response
    POST / HTTP/1.1             HTTP/1.1 200 OK             Status: 200†
    Content-Type: text/plain    Content-Type: text/plain    Content-Type: text/plain
    Content-Length: 12          Content-Length: 12
                                                            Hello world!
    Hello world!                Hello world!

    † The Status header in a CGI response is one of the field lines but acts as the start line.

We restructure the test cases so that each contains an array of HTTP messages. Reflecting the nature of HTTP, each message is composed of three parts: start line, field lines, and message body, and the message body supports both raw and chunked encoding. Messages are stored in a dynamic tree structure that reflects the ABNF definition extracted from the RFC, listed in Appendix A. Each field is labeled with a datatype corresponding to its actual function (string, number, or symbol) to facilitate subsequent mutation operations; however, the labels do not restrict the datatypes of the fields. A test case can be serialized into HTTP requests, HTTP responses, or CGI responses, adapting to their respective structural requirements as needed.

4.2.2 Mutation Strategies

To explore a larger input space and introduce more discrepancies, we equip the mutators with three levels of mutation strategies: 1) Sequence: because HTTP supports persistent connections and pipelining, sequence-level strategies help explore potential discrepancies introduced by multiple messages; 2) Message: message-level strategies reflect the
HTTP syntax while relaxing the formatting requirements; 3) Byte: byte-level strategies introduce randomness into the input to cover discrepancies caused by the Robustness Principle. The detailed mutation strategies are listed in Table 3. A test case can undergo multiple mutations with multiple mutation strategies in a single run.

4.2.3 Coverage-directed Feedback

The feedback used in seed selection plays a crucial role in deciding whether a mutated input should be added to the corpus. For this purpose, AFL uses an accumulated edge hit count map to assess whether the current input introduces new execution paths or exercises existing ones with greater frequency; this is the core of coverage-directed fuzzing. The approach adapts to various implementations, effectively steering the fuzzing process toward seeds that are more likely to activate a broader range of edges in each implementation. HDHunter uses this mechanism by combining the edge hit count maps of the two implementations into one double-sized edge map that reflects coverage information from both targets.

Seeds that trigger both new edges and discrepancies between the two implementations are not added to the corpus during fuzzing, because inputs mutated from such seeds are more likely to trigger previously found discrepancies, which would reduce fuzzing efficiency.

4.3 Snapshot-based Executor

The executor runs the mutated input on both target implementations under the snapshot-based execution framework, whose workflow is illustrated in Figure 5. The framework uses snapshots to clear network states efficiently. It also contains a test harness that supports testing not only HTTP requests but also HTTP responses and CGI responses. To support implementations written in interpreted programming languages, we adopt the solution proposed by Witcher [47], which modifies the bytecode interpreters of these languages to update the coverage information using the line number and opcode of the current and prior instructions. For implementations written in compiled languages, such as C and C++, we use SanitizerCoverage [46], provided by Clang, to collect coverage. The coverage information resides in two distinct processes in two QEMU guests, and the generated coverage maps are collected from these processes via shared memory and guest memory access.

Figure 5: Workflow of Snapshot-based Execution Framework. The executor launches a QEMU guest that runs the harness and the implementation; a snapshot is saved on initialization and restored for each input, after which coverage and states are collected and retrieved by the executor.

Figure 6: Execution Procedure. Both modes involve the executor, the harness, and the implementation. Request (HTTP): ① Test case, ② Request_in, ③ Response, ④ Cov. & States, ⑤ Result. Response (HTTP & CGI): ① Test case, ② Request_benign, ③ Request_fwd, ④ Response_in, ⑤ Response, ⑥ Cov. & States, ⑦ Result.

4.3.1 Snapshot-based State Recovery

HTTP runs over the TCP/IP protocol stack, so residual network-layer state can affect HTTP processing. Besides, each implementation has its own buffering mechanism, which may have flaws. It is therefore necessary to clear these states.

Snapshots are commonly used in file systems and virtual machines: they save the state at a specific point and are designed to restore it efficiently. Previous research, including Nyx [43] and its extension Nyx-Net [44], has proposed a fast snapshot restoration mechanism for hypervisors. However, Nyx uses hardware-level processor tracing to collect code coverage, which suits kernels and drivers but introduces more noise for userspace applications. Nyx-Net supports AFL-like coverage collection, but it implements a custom hook-based network stack to communicate with test targets, which has limited support for multi-process programs and has compatibility issues when testing complex network applications such as HTTP servers.

To enable highly efficient state recovery for HTTP implementations, we reused Nyx's fast snapshot reloading mechanism while implementing AFL-like coverage collection. We built a dedicated harness that communicates with the test targets using Linux's native socket API. The harness uses Nyx's API to take a snapshot right after the target is ready and collects its edge map through shared memory. The states are restored after each test case is executed, and the coverage is retrieved using the guest memory access functionality.

The harness checks whether the target is ready by repeatedly executing a benign probing input. Once the input is accepted without error, we assume that the target is in the ready state.
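The readiness check at the end of Section 4.3.1 can be sketched as a polling loop. This is a minimal Python illustration assuming the target listens on a TCP port reachable from the harness; the probe request, the timeouts, and the choice to treat any status below 400 as "accepted without error" are assumptions, and the Nyx snapshot call that would follow is deliberately omitted rather than guessed.

```python
import socket
import time
from typing import Optional

# Hypothetical benign probing input; any request the target accepts would do.
PROBE = b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"


def probe_once(host: str, port: int, request: bytes, timeout: float) -> Optional[int]:
    """Send one benign request and return the response status code, or None."""
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            while b"\r\n" not in data:  # only the status line is needed
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
    except OSError:  # refused, reset, or timed out: target not ready yet
        return None
    parts = data.split(b"\r\n", 1)[0].split()
    if len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1].isdigit():
        return int(parts[1])
    return None


def wait_until_ready(host: str, port: int, request: bytes = PROBE, timeout: float = 0.5,
                     deadline: float = 30.0, interval: float = 0.1) -> bool:
    """Repeat the benign probe until it is accepted (status < 400) or time runs out."""
    end = time.monotonic() + deadline
    while time.monotonic() < end:
        status = probe_once(host, port, request, timeout)
        if status is not None and status < 400:
            # In HDHunter, this is where the snapshot would be taken through
            # Nyx's API (not shown here).
            return True
        time.sleep(interval)
    return False
```

Polling with a short per-attempt timeout keeps the check cheap for fast-starting servers while still tolerating slow initializers such as Tomcat, which is the case that motivates snapshotting in the first place.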
4.3.2 Support for HTTP Requests, Responses, and CGI Responses

The test harness has two modes of execution, one for HTTP requests and another for HTTP and CGI responses. Figure 6 shows the execution procedures.

For HTTP requests, the test harness follows a standard procedure to send the request: it obtains the test case from the executor, transforms it into an HTTP request (Request_in), and passes it to the implementation.

For HTTP and CGI responses, the implementation is configured to forward the HTTP and CGI requests to the harness. After obtaining the test case, the harness first sends a benign request (Request_benign) to the implementation. Upon receiving the forwarded request (Request_fwd), the harness responds with the formatted test case (Response_in).

4.4 HTTP Desync Detector

The detector is responsible for identifying discrepancies between implementations. It follows a predefined set of rules to identify discrepancies using the internal states collected during execution. We first describe how we collect the internal states from implementations via code insertion and then introduce our detection rule.

4.4.1 Internal States Extraction

Information about the handling process can be partially acquired from the forwarded requests and responses. However, information collected this way can be inaccurate due to sanitization and post-processing. For example, NGINX converts chunk-encoded requests to raw encoding during forwarding, so the forwarded message does not reveal which encoding NGINX used to process the request. Therefore, we extract internal states from inside the implementations.

We collect seven types of states that form the State Tuple:

    (Count, Consumed, Body, Encoding, CL, Order, Status)

1) Count: the number of requests or responses that the implementation recognized; 2) Consumed: the actual length of messages that the implementation consumed during the pars-
[…]
the corresponding implementation to extract headers, copies the internal buffer to retrieve the body content and length, and records the message count before dispatching. The consumed length is tracked by incrementing counters each time the implementation reads data from the buffer. A detailed example of the code we inserted into Apache is given in Appendix D.2.

For Order and Status, we collect the states externally and automatically. Although both disordered requests and disordered responses can cause an inconsistent order of messages, the most convenient way to check whether disorder occurs is to check whether the order of the final responses matches. Therefore, we determine Order externally by embedding a header called X-Desync-Id, carrying a distinct UUID string, during execution. The implementations are configured to forward the original X-Desync-Id header, and Order is set to the sequence of X-Desync-Id values. The status code is a useful and stable external indicator of whether a request is accepted or denied by the implementation; we collect it by reading the first line of the produced or forwarded responses in the test harness.

4.4.2 Detection Rule

The detailed detection rules are given in Appendix C. A discrepancy is considered to exist if any of the rules is not satisfied. For Count, Body, and Order, the states from the two implementations are simply compared for an exact match. For Encoding, CL, and Consumed, the Status code is consulted first. If both implementations generate status codes between 400 and 599, which are treated as the same error state, it is not meaningful to compare their encodings, content lengths, and consumed lengths, since both end up with the same result: rejecting the current message and possibly stopping parsing of the subsequent input.

5 Evaluation and Findings

5.1 Experiment Setup

Evaluation Targets Selection. HDHunter is a gray-box fuzzer designed to find HTTP discrepancies in open-source projects, so we focus on open-source HTTP implementations.
ing process; 3) Body: the content of the parsed message body. We consider both the GitHub stars and deployment popular-
Since it is hard to extract the message body in some imple- ity since we find that some HTTP implementations, such as
mentations, it is downgraded to the length of the message Apache HTTP Server, receive fewer stars than their popularity
body; 4) Encoding: the encoding used in the parsing process: on GitHub.
raw or chunked; 5) CL: the value of the Content-Length At last, we selected 19 state-of-the-art HTTP implementa-
header; 6) Order: the parameter that describes the order of tions written in different languages as our evaluation targets,
messages; 7) Status: the HTTP response status code. including integrated servers, cache servers, network frame-
We collect the first five types of states by manually inserting works, and application servers, detailed in Table 1. We test
customized code into the HTTP handling functions of the im- these implementations against their latest version at the time
plementations. The inserted code utilizes the internal API of of the experiment.
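Several entries in Table 1 trace back to non-standard number parsing (Section 5.2.1). The sketch below is our own Python illustration, not code from any listed implementation. It compares three parsers. The first accepts only the RFC grammar (chunk-size = 1*HEXDIG, Content-Length = 1*DIGIT). The second defers to Python's built-in int(). The third is a hypothetical parser that stops at the first non-hexadecimal character. All function names are hypothetical.

```python
import re
from typing import Optional

# RFC 9112 chunk-size = 1*HEXDIG; RFC 9110 Content-Length = 1*DIGIT.
# Explicit ASCII ranges keep Unicode digits out, unlike str.isdigit().
_HEXDIGS = re.compile(r"[0-9A-Fa-f]+")
_DIGITS = re.compile(r"[0-9]+")


def strict_chunk_size(field: str) -> Optional[int]:
    """Accept only 1*HEXDIG; None means 'reject the message'."""
    if _HEXDIGS.fullmatch(field) is None:
        return None
    return int(field, 16)


def lenient_chunk_size(field: str) -> Optional[int]:
    """Hypothetical: defer to int(), which also accepts a '0x' prefix,
    a sign, '_' between digits, and surrounding whitespace."""
    try:
        return int(field, 16)
    except ValueError:
        return None


def prefix_chunk_size(field: str) -> Optional[int]:
    """Hypothetical: read leading hex digits, stop at the first other char."""
    digits = ""
    for ch in field:
        if ch not in "0123456789abcdefABCDEF":
            break
        digits += ch
    return int(digits, 16) if digits else None


def strict_content_length(field: str) -> Optional[int]:
    if _DIGITS.fullmatch(field) is None:
        return None
    return int(field, 10)


def lenient_content_length(field: str) -> Optional[int]:
    try:
        return int(field, 10)
    except ValueError:
        return None


if __name__ == "__main__":
    for size in ["1a", "0x1a", "+1a", "1_a", " 1a "]:
        print("chunk", repr(size), strict_chunk_size(size),
              lenient_chunk_size(size), prefix_chunk_size(size))
    for cl in ["10", "+10", "1_0", " 10 "]:
        print("CL", repr(cl), strict_content_length(cl),
              lenient_content_length(cl))
```

For the chunk size 0x1a, the three parsers give three different results. The strict parser rejects it, int() reads 26, and the prefix parser reads 0. A zero chunk size resembles the "last chunk" reading that Section 5.2.1 reports for H2O. If any two of these parsers sit in a proxy and a server, the two sides disagree on where the body ends.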
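Table 1: Evaluation Targets and New Vulnerabilities Found

Category | Name | Version | ME¹ | CGI | Star² | Host² | Vuln. | Status | Inconsistency | Attack | CVE³ | Severity
Integrated Servers | NGINX | 1.25.2 | ✓ | ✓ | 19.3K | 20.1M | - | - | - | - | - | -
Integrated Servers | Apache | 2.4.57 | ✓ | ✓ | 3.3K | 13.7M | 4 | Fixed | Resp. TE.CL | Resp. Forgery | ✓ | Low
Integrated Servers | Lighttpd | 1.4.70 | ✓ | ✓ | 547 | 3.6M | - | - | - | - | - | -
Integrated Servers | H2O | 2.2.6 | ✓ | ✓ | 10.6K | 2.1K | 1 | Fixed | Number Parsing | Req. Smuggling | - | N/A
Integrated Servers | HAProxy | 2.8.2 | ✓ | ✓ | 4.1K | 30.0K | - | - | - | - | - | -
Cache Servers | Squid | 6.1 | ✓ | ✗ | 1.8K | 5.1M | 1 | Fixed | Number Parsing | Req. Smuggling | ✓ | Critical
Cache Servers | Varnish | 7.3.0 | ✓ | ✗ | 3.4K | 1.7M | - | - | - | - | - | -
Cache Servers | ATS | 9.2.0 | ✓ | ✗ | 1.7K | 126.2K | 1 | Fixed | Trailer Section | Req. Smuggling | ✓ | Critical
Network Frameworks | Twisted | 23.8.0 | ✓ | ✗ | 5.3K | 78.2K | 2 | Fixed | Resp. Order | Resp. Stealing | ✓ | High
Network Frameworks | Twisted |  |  |  |  |  |  | Confirmed | Resp. TE.CL | Resp. Forgery | ✓ | N/A
Network Frameworks | Tornado | 6.3.2 | ✗ | ✓ | 21.3K | 144.3K | - | - | - | - | - | -
Network Frameworks | gevent | 23.7.0 | ✗ | ✓ | 6.1K | N/A | 1 | Fixed | Trailer Section | Req. Smuggling | ✓ | Critical
Network Frameworks | Eventlet | 0.33.3 | ✗ | ✓ | 1.2K | N/A | 3 | Confirmed | Number Parsing | Req. Smuggling | - | N/A
Network Frameworks | Eventlet |  |  |  |  |  |  | Confirmed | Trailer Section | Req. Smuggling | - | N/A
Network Frameworks | Eventlet |  |  |  |  |  |  | Confirmed | Req. TE.CL | Req. Confusing | - | N/A
Application Servers | Tomcat | 10.1.9 | ✗ | ✗ | 7.0K | 690.5K | 1 | Fixed⁴ | Trailer Section | Req. Smuggling | ✓ | High
Application Servers | Jetty | 12.0.0 | ✗ | ✗ | 3.7K | 367.5K | 1 | Fixed | Number Parsing | Req. Smuggling | ✓ | Moderate
Application Servers | Puma | 6.3.0 | ✗ | ✓ | 7.5K | N/A | - | - | - | - | - | -
Application Servers | Falcon | 0.42.3 | ✗ | ✓ | 2.4K | 2.2K | 1 | Fixed | Number Parsing | Req. Smuggling | ✓ | Moderate
Application Servers | uWSGI | 2.0.21 | ✗ | ✓ | 3.4K | N/A | - | - | - | - | - | -
Application Servers | Waitress | 2.1.2 | ✗ | ✓ | 1.3K | 20.4K | - | - | - | - | - | -
Application Servers | Gunicorn | 21.2.0 | ✗ | ✓ | 9.2K | 175.1K | 1 | Reported | Req. TE.CL | Req. Confusing | - | N/A

¹ ME: Support for hosting multiple endpoints pointing to different upstreams.
² Data crawled from GitHub and Censys on Nov. 16th, 2023.
³ Every CVE except Twisted’s is assigned by its project maintainers. Twisted’s is assigned by MITRE with the maintainers’ permission.
⁴ A $4660 bounty was granted by Internet Bug Bounty to acknowledge our contribution.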
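The State Tuple comparison of Sections 4.4.1 and 4.4.2 can be sketched as follows. This is a simplified Python model under our own assumptions. It encodes only the rules stated in Section 4.4.2, not the full rule set of Appendix C. It also stores each implementation's states as a single record. The names StateTuple and find_discrepancies are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StateTuple:
    """Hypothetical record of the states one implementation produced."""
    count: int               # number of messages recognized
    consumed: int            # bytes consumed while parsing
    body: bytes              # parsed body bytes (a length could be used instead)
    encoding: str            # "raw" or "chunked"
    cl: Optional[int]        # Content-Length value, None if absent
    order: Tuple[str, ...]   # X-Desync-Id values in the order responses came back
    status: int              # HTTP status code


def is_error(status: int) -> bool:
    # 400-599 are all treated as the same "rejected" state.
    return 400 <= status <= 599


def find_discrepancies(a: StateTuple, b: StateTuple) -> List[str]:
    """Return the names of the states that differ; empty list = consistent."""
    diffs = [name for name in ("count", "body", "order")
             if getattr(a, name) != getattr(b, name)]
    # Encoding, CL and Consumed are skipped when both sides reject the input.
    if not (is_error(a.status) and is_error(b.status)):
        diffs += [name for name in ("encoding", "cl", "consumed")
                  if getattr(a, name) != getattr(b, name)]
    return diffs


if __name__ == "__main__":
    proxy = StateTuple(1, 48, b"a2", "chunked", None, ("id-1",), 200)
    server = StateTuple(2, 30, b"a2", "chunked", None, ("id-1", "id-2"), 200)
    print(find_discrepancies(proxy, server))  # ['count', 'order', 'consumed']
    both_reject = find_discrepancies(
        replace(proxy, status=400),
        replace(server, status=400, count=1, order=("id-1",)))
    print(both_reject)  # []
```

Gating Encoding, CL, and Consumed on Status keeps two rejections from being flagged merely because the implementations stopped reading at different offsets.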

HDHunter extracts the State Tuple by inserting code into targets to identify handling discrepancies. The statistics of the inserted lines, functions, and files are given in Appendix D.1. An example of code insertion for Apache HTTPd is given in Appendix D.2.

In addition to identifying discrepancies related to HTTP requests, HDHunter can also identify discrepancies in HTTP and CGI responses. Among the 19 selected implementations, 9 support hosting multiple endpoints, 5 of which support CGI proxying. Since the threat model of HTTP Response Desync we previously defined applies only to HTTP implementations with multiple endpoints, we set up response testing configurations only for these implementations.

All implementations are configured to run in single-thread mode to minimize the influence of multi-threading on coverage and path collection during execution.

Ultimately, we established a total of 171 testing pairs for HTTP requests, 36 for HTTP responses, and 10 for CGI responses. We run our tool against each test pair for 12 hours and repeat each run 5 times. For initial seed selection, we use tcpdump [21] to capture real-world traffic on our router and select 20 raw HTTP messages covering different HTTP specifications for each run. Our seed selection criterion is to collect initial seeds that encompass a wide variety of HTTP methods and headers. This includes diversity in values, such as identity (no) encoding and chunked encoding. Additionally, we aim to include seeds with varied body content types, such as URL encoding, form data, JSON, and binary formats.

Experiment Platform Setup. We conduct our experiment on a machine with an Intel Core i7-1260P (up to 4.70GHz) CPU, 64GB RAM, and the Ubuntu 22.04.4 LTS (6.5.0-41-generic x86_64) operating system.

5.2 Findings

We ran our HDHunter prototype on each pair within each configuration and removed duplicates using coverage information. We then manually analyzed each report by referring to the State Tuple. We highlight 5 primary types of discrepancies found by HDHunter, as detailed in Appendix E. Trailer Section and Response TE.CL are 2 novel discrepancies we identified. We also found new variants of the other 3 types of discrepancies. In total, we identified 17 new HTTP Desync vulnerabilities, affecting well-known HTTP servers such as Apache, Tomcat, and Squid. We have disclosed these
vulnerabilities to their maintainers and received 9 CVE IDs, as shown in Table 1.

5.2.1 Non-standard number parsing

Number fields are common in the HTTP protocol. Two important number fields control the boundary of the message body: the value of the Content-Length header and the chunk sizes. HTTP RFCs allow decimal digits in CL and hexadecimal digits in chunk sizes. However, we found 8 tested HTTP implementations that allow other characters in these fields, including a 0x prefix, a + prefix, underscores between digits, and arbitrary suffixes. We detail the acceptance of these characters in Appendix F. This behavior results in an inconsistent number and content of messages.

For example, consider how Squid and H2O handle a chunk with a 0x-prefixed size. Squid processes the chunk normally, but H2O interprets it as the last chunk, which marks the end of the request. This leads to a significant difference in their understanding of the message’s length and boundary. As a result, an intentionally crafted message may cause H2O to misinterpret it as two separate requests.

5.2.2 Inconsistent trailer section acceptance

Trailer sections make it possible to send additional metadata after the content of chunked messages. Although trailer sections are part of the RFC standard, state-of-the-art HTTP implementations show little interest in supporting them and handle them in diverse ways. We discuss the inconsistencies in two stages: parsing and forwarding.

In the parsing stage, some implementations, such as Apache HTTP Server and HAProxy, accept trailer sections and validate their format, responding with 400 when they receive malformed trailer sections. In contrast, implementations like NGINX accept requests with trailer sections even if they are malformed. Other implementations do not support trailer sections at all.

In the forwarding stage, most tested implementations ignore trailer sections because of request sanitization. However, Apache Traffic Server does not sanitize the request and forwards the original raw request with trailer sections.

Implementations that do not support trailer sections may misinterpret the trailer section as other HTTP components, leading to an inconsistent number of messages. We found that gevent, Eventlet, and Puma exhibit this issue. When an HTTP request with a trailer section is sent, these implementations respond with two HTTP responses: the first is normal, and the second has a 400 status. Analysis shows that gevent and Eventlet drop the first line of the trailer section and treat the rest of the payload as a new request. Puma skips two characters after the last chunk, leading to a similar misinterpretation.

Tomcat supports the trailer section but mishandles trailer lines that contain no colon: it skips lines until it reaches one that has a colon. Therefore, when processing the first payload in Appendix G.2, Tomcat interprets it as two requests while other implementations interpret it as three.

5.2.3 Non-standard line separator

The standard line separator for HTTP is CRLF: a Carriage Return (\r) followed by a Line Feed (\n). We found several HTTP implementations that allow a single LF as the line separator. This tolerance may result in an inconsistent number of messages between two implementations if one accepts only CRLF as a line terminator and treats a bare LF in the chunk extension as part of the extension, while the other accepts both LF and CRLF. Please refer to Appendix G.2 for a detailed payload and description. We have verified the payload on NGINX and Gunicorn.

5.2.4 Different request TE.CL handling strategies

Sending both TE and CL headers at the same time is a classic way to introduce desynchronization. Most state-of-the-art HTTP implementations have taken measures to mitigate this issue. We found no implementation that forwards these two headers without sanitization. Some implementations reject the request and respond with status 400. Others accept the request but handle persistent connections differently: some abort the persistent connection upon receiving the request, while others maintain it. Note that uWSGI does not support chunked encoding by default; it reads the request body according to the CL header.

Eventlet and Gunicorn are two implementations that support Python’s WSGI interface. They support chunk-encoded message bodies, but they also accept the CL header and pass its value to the application through the CONTENT_LENGTH environment variable. According to most CGI standards, applications should obey the value of the CL environment variable when processing the request body. However, application developers may choose to read the whole HTTP body without referring to it, resulting in inconsistent content of messages.

5.2.5 Different response TE.CL handling strategies

HTTP requests and responses differ only in the first line, so the classic threat model that confuses the message boundary using both TE and CL headers should also work for responses. We found many discrepancies in how implementations handle HTTP and CGI responses.

When handling HTTP responses, most implementations behave consistently with how they handle requests. They accept the TE header and keep TE or CL when forwarding. Some of them abort the persistent connection with the upstream server upon receiving responses with both TE and CL headers. Varnish returns 503 Backend Fetch Failed when receiving such responses. H2O and Twisted unexpectedly
accept the CL header, which is inconsistent with their behavior when processing requests. Twisted even forwards both TE and CL headers to the downstream client, which leads to an inconsistent number of messages.

As for CGI responses, the five tested CGI proxies exhibit three different behaviors. Lighttpd and HAProxy accept the TE header and a chunk-encoded message body. NGINX and H2O accept the CL header. Apache refers to neither CL nor TE, but forwards both TE and CL along with the whole body to the downstream client. By specifying a smaller CL or constructing a chunked message body, an attacker can cause an inconsistent number of messages between Apache and the downstream client. We found that four of Apache’s mod_proxy CGI modules are affected: FastCGI, SCGI, uWSGI, and AJP.

5.2.6 Other notable discrepancies

We found that when two pipelined requests are sent to Twisted, it handles them concurrently without waiting for the response to the first request. If the first request takes longer to process than the second, the responses are disordered, resulting in an inconsistent order of messages.

5.3 Attacks

This section explores how the target implementations can be exploited through the inconsistencies identified above. We outline four distinct attack techniques, illustrated in Figure 7.

5.3.1 Request Smuggling

By leveraging an inconsistent number of messages, we can perform the classic request smuggling attack to bypass the authentication mechanism and issue arbitrary requests.

Reverse proxies and cache servers can apply restrictions and require authentication on certain request paths to protect sensitive APIs, such as back-end management and command execution interfaces. Normally, the proxy blocks direct requests to these paths.

However, when Tomcat, gevent, Eventlet, or Puma is deployed as the upstream server of a reverse proxy that forwards the raw request without sanitization, an attacker can leverage its mishandling of trailer sections to bypass the access control mechanism.

We have carried out this attack on Apache Traffic Server and gevent. ATS does not sanitize the request and forwards the raw request with trailer sections. We configured ATS to forward requests targeting /path1 to the upstream gevent application. By sending the payload in Figure 7(a), an attacker can access /path2 on the gevent application, which cannot be requested directly through ATS.

Non-standard number parsing and line separators can also be leveraged to facilitate request smuggling, as they result in an inconsistent number of messages.

5.3.2 Request Confusing

Sometimes, boundary confusion cannot smuggle a new request but can interfere with how the application retrieves the proper content, resulting in condition bypass. We refer to this attack as Request Confusing.

We can leverage a defect in Gunicorn and Eventlet to perform Request Confusing: they accept a chunk-encoded body while setting the value of the CONTENT_LENGTH environment variable according to the CL header. According to the WSGI standard [15, 16], applications should not attempt to read more data than is specified by the CONTENT_LENGTH value, which appears to make this value the authoritative request length. However, application developers may choose to read the whole input without referring to that value. We found a combination that can lead to Request Confusing in Gunicorn and Flask, where Flask is deployed as a WSGI application.

Gunicorn accepts Chunked with a capital C as a valid chunked encoding indicator. Flask has a function to get the content length of the request body: if the value of the HTTP_TRANSFER_ENCODING environment variable is not exactly chunked, it returns the value of the CONTENT_LENGTH environment variable. Meanwhile, Flask parses the whole body as form data without referring to CONTENT_LENGTH.

Figure 7(b) demonstrates the attack scenario. After the attacker sends a payload that contains a malformed Chunked value and a CL header with value 0, developers who use Flask can successfully get the form-encoded data through request.form, but the value of request.content_length is 0. Any code that uses that value will incorrectly assume that the body length is 0, resulting in potential condition bypasses.

5.3.3 Response Stealing

Response Stealing is an attack in which attackers exploit discrepancies in response processing across implementations to illicitly acquire a victim’s response. This can lead to significant issues such as privacy breaches or cookie hijacking.

The disordered-response issue found in Twisted can be leveraged to perform Response Stealing. As illustrated in Figure 7(c), we assume a proxy that handles incoming requests and forwards them in pipelined form to a Twisted server through a persistent connection. For instance, proxies that use the Perl Net::HTTP::NB library to dispatch requests meet this requirement. The Twisted server acts as a shared web-hosting gateway that allows tenants to configure an endpoint pointing to their servers or applications. By holding the request sent by themselves (that is, delaying its response), the attacker can steal the first response, which should have been sent to the victim.
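The ordering assumption behind Response Stealing can be illustrated with a small deterministic simulation. This is a hedged sketch, not the setup used in our experiments. It does not model the Perl Net::HTTP::NB proxy or Twisted itself, and PipelinedRequest, respond_in_completion_order, and proxy_dispatch are hypothetical names. The simulated proxy pairs responses with requests first-in, first-out, as HTTP/1.1 pipelining requires. The simulated server writes responses in completion order, which is the disorder described in Section 5.2.6.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PipelinedRequest:
    client: str    # who sent this request to the proxy
    path: str
    delay: float   # time the upstream takes to produce the response


def label(req: PipelinedRequest) -> str:
    return f"response for {req.client}:{req.path}"


def respond_in_request_order(reqs: List[PipelinedRequest]) -> List[str]:
    """Compliant pipelining: responses leave in the order requests arrived."""
    return [label(r) for r in reqs]


def respond_in_completion_order(reqs: List[PipelinedRequest]) -> List[str]:
    """Hypothetical server that handles pipelined requests concurrently and
    writes each response as soon as it is ready."""
    return [label(r) for r in sorted(reqs, key=lambda r: r.delay)]


def proxy_dispatch(reqs: List[PipelinedRequest],
                   responses: List[str]) -> Dict[str, str]:
    """The proxy pairs the i-th response with the i-th request (FIFO)."""
    return {req.client: resp for req, resp in zip(reqs, responses)}


if __name__ == "__main__":
    # The attacker's request goes first; the attacker's own upstream holds it.
    pipeline = [PipelinedRequest("attacker", "/held", delay=5.0),
                PipelinedRequest("victim", "/account", delay=0.1)]
    print(proxy_dispatch(pipeline, respond_in_request_order(pipeline)))
    print(proxy_dispatch(pipeline, respond_in_completion_order(pipeline)))
```

With request-order responses each client gets its own response. With completion-order responses, the attacker's held request receives the victim's response.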
Figure 7: Examples of HTTP Desync Attacks We Discovered. (a) Request Smuggling (CVE Assigned), inconsistent number of requests: Attacker → Apache Traffic Server (recognizes one request and forwards the trailer section) → gevent (recognizes two requests). (b) Request Confusing, inconsistent content of requests: Attacker → Gunicorn (passes the CL value through the CONTENT_LENGTH environment variable) → Flask (body confusion between content_length and form). (c) Response Stealing (CVE Assigned), inconsistent order of responses: Attacker, Victim, Perl Net::HTTP::NB proxy, Twisted, Attacker's Server, Benign Server. (d) Response Forgery (CVE Assigned), inconsistent number of responses: Attacker-controlled CGI Application → Apache → Downstream.
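The length confusion behind Figure 7(b) can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not Gunicorn's or Flask's actual code, and all three function names are hypothetical:

- gateway_environ stands in for a gateway that de-chunks a body announced as "Chunked" but still copies Content-Length into CONTENT_LENGTH.
- declared_length mirrors the length helper described in Section 5.3.2.
- parse_form reads the entire input stream.

```python
import io
from typing import Dict, Optional
from urllib.parse import parse_qs


def gateway_environ(headers: Dict[str, str], decoded_body: bytes) -> dict:
    """Hypothetical lenient gateway: it has already de-chunked a body announced
    as 'Chunked', yet still copies Content-Length into CONTENT_LENGTH."""
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": headers.get("Content-Type", ""),
        "wsgi.input": io.BytesIO(decoded_body),
    }
    if "Content-Length" in headers:
        environ["CONTENT_LENGTH"] = headers["Content-Length"]
    if "Transfer-Encoding" in headers:
        environ["HTTP_TRANSFER_ENCODING"] = headers["Transfer-Encoding"]
    return environ


def declared_length(environ: dict) -> Optional[int]:
    """Only the exact value 'chunked' suppresses CONTENT_LENGTH."""
    if environ.get("HTTP_TRANSFER_ENCODING") == "chunked":
        return None
    value = environ.get("CONTENT_LENGTH")
    return int(value) if value else None


def parse_form(environ: dict) -> Dict[str, str]:
    """Reads the whole input stream without consulting CONTENT_LENGTH."""
    raw = environ["wsgi.input"].read().decode("utf-8")
    return {key: values[0] for key, values in parse_qs(raw).items()}


if __name__ == "__main__":
    headers = {"Transfer-Encoding": "Chunked", "Content-Length": "0",
               "Content-Type": "application/x-www-form-urlencoded"}
    env = gateway_environ(headers, b"id=42&role=admin")
    print(declared_length(env))  # 0
    print(parse_form(env))       # {'id': '42', 'role': 'admin'}
```

The two helpers disagree about the same request. Consider a check that skips validation whenever declared_length(environ) == 0. It would be bypassed even though parse_form(environ) still returns the attacker-supplied fields.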

5.3.4 Response Forgery

Victims’ responses can not only be stolen but also manipulated. Response Forgery is a type of attack in which an attacker creates a deceptive response to replace the benign one.

Apache's improper handling of TE and CL headers in CGI responses can be exploited to execute a Response Forgery attack. Figure 7(d) shows the scenario: the attacker-controlled application sends a CGI response whose header is a misleading CL with value 0 and whose message body is a deceptive response. Apache ignores the CL and accepts the whole message body. It then converts the message into an HTTP response without sanitization and forwards it to the downstream. If the downstream and Apache hold a persistent connection with HTTP pipelining enabled, the encapsulated response poisons the response queue and is taken as the second response, resulting in Response Forgery. In fact, HTTP pipelining is not required, since the downstream may not clear its receive buffer after processing a response. Consequently, after the second request is forwarded, any residual payload in the buffer is mistakenly interpreted as its response.

5.4 Comparison

Comparison with Existing Tools. Before evaluating our tool, we used existing tools, including T-Reqs [24] and HDiff [45], to test our evaluation targets. By default, T-Reqs uses three distinct configurations to test the first line, headers, and bodies, respectively. We manually merged the three configurations into one to cover all three parts. We ran both T-Reqs and HDiff for 12 hours, 5 times each. The evaluation results indicate that they did find some discrepancies. However, these discrepancies cause at least one implementation of the pair to reject the input and are difficult to exploit for HTTP Desync attacks. Table 2 compares the discrepancies discovered by these two tools and by HDHunter. The two tools can identify discrepancies caused by Line Separator and Request TE.CL, but miss the remaining ones, including those pertaining to Number Parsing and Trailer Section. To find the underlying cause, we investigated their runtime logs. Our findings indicate that, lacking coverage guidance, their generators are unable to produce a valid payload capable of triggering these discrepancies. Additionally, their detectors are unable to identify some types of discrepancies because they lack internal states.

Table 2: Comparison Between Types of Discovered Discrepancies. Columns: Number Parsing, Trailer Section, Line Sep., Req. TE.CL, Resp. TE.CL; rows: T-Reqs, HDiff, HDHunter; legend: not capable; capable of finding some types; capable.

Furthermore, we performed a comparison experiment to find out how code coverage contributes to the discrepancy exploration process. We kept running T-Reqs, HDiff, and HDHunter and collected the respective increment in the number of covered edges after processing a baseline request. Figure 8 shows an example of coverage growth for the Apache–NGINX and Apache–Tomcat combinations. The evaluation shows that HDHunter covered more edges than
T-Reqs and HDiff under the same circumstances, thereby increasing the likelihood of uncovering deeper vulnerabilities.

Figure 8: Comparison of Covered Edges over Time when Testing Apache–NGINX and Apache–Tomcat Combinations. Two panels (Apache–NGINX, Apache–Tomcat); x-axis: Time (hours), 0 to 12; y-axis: Increment in Number of Covered Edges (edges); series: HDHunter, HDHunter (AFL’s Test Case Generation), T-Reqs, HDiff.

Ablation Study. To evaluate the contributions of our design, we conducted an ablation study by replacing two key components of our system, the test case generation and the snapshot-based executor, with existing implementations.

To assess the efficacy of HDHunter’s test case generation, we replaced it with AFL’s original generation strategies, using the same set of seeds. As illustrated in Figure 8, HDHunter with AFL’s test case generation covers more edges than T-Reqs and HDiff, but fewer edges than HDHunter with its own test case generation. Additionally, the execution speed decreased significantly when using AFL’s original generation, dropping from 120 exe/s to 13 exe/s during the Apache–NGINX evaluation. We found that AFL’s byte-level mutations generated a substantial number of incomplete test cases, which caused timeouts and significantly slowed down the fuzzing process.

We then compared our snapshot-based executor with the traditional full-restart approach by measuring executions per second (exe/s) across various test setups. In scenarios where full restart operates efficiently, such as the Apache–Squid combination, our framework increased execution speed from 1.2 exe/s to 62 exe/s, an acceleration factor of about 52. For more time-intensive restart scenarios, such as the Apache–Tomcat combination, the snapshot framework improved performance from 0.24 exe/s to 21 exe/s, an even higher acceleration factor of about 88. These results underscore the significant efficiency gains of our snapshot-based executor.

6 Discussion

6.1 Contribution of Fuzzing

In addition to automating the vulnerability detection process, fuzzing can identify previously unknown variants of vulnerabilities that developers may not be aware of. Furthermore, with coverage guidance, fuzzing traverses deeper program paths than black-box approaches. Taking number parsing as an example, our fuzzer discovered vulnerabilities rooted in programming languages that many developers had overlooked. Another example is the trailer section issue newly discovered by HDHunter. Here, our tool identified a novel payload (given in Appendix G.2) that lacks a colon in a trailer field, which goes beyond the traditional TE.CL threat model.

6.2 Insights

We examine the insights behind the vulnerabilities, which can be classified into three categories.

Number Parsing. We discovered that five of the implementations we tested do not adhere to the HTTP RFCs regarding number parsing, despite the clear restrictions on permissible characters in the RFCs. The primary cause of this discrepancy is that developers frequently rely on the built-in number parsers of programming languages, such as int() in Python. However, these parsers vary across programming languages, each following its own rules for number parsing. These rules differ in their support for various formats and bases, such as underscore separators and hexadecimal notation, as shown in Appendix F. Discrepancies between HTTP RFCs and the rules of different programming languages create unintended semantic gaps.

Trailer Sections. We observed that implementations differ in their support for trailer sections, leading to HTTP Desync attacks. Such features are rarely used in HTTP and relatively complex to test, so they often escape thorough scrutiny. However, with coverage information guiding test case generation, we can produce test cases that activate the erroneous branches associated with these least-used components or hidden in deep program logic, thus uncovering vulnerabilities missed by previous research.

CGI Converting Issues. Another type of new vulnerability we found stems from the protocol conversion between CGI and HTTP. CGI protocols, which are binary-based or rely on in-process communication, define message boundaries using their own mechanisms instead of HTTP’s TE or CL headers. While certain CGI protocols mandate that the gateway should not read data beyond what is specified by the CL if present, not all implementations follow this rule. We found that headers declared in CGI responses are often carried over into HTTP responses, leading to new types of HTTP Response Desync.

6.3 Responsible Disclosure

We responsibly disclosed the vulnerabilities in Table 1 to the respective vendors. Falcon, Jetty, Squid, H2O, gevent, Tomcat, ATS, Apache, and Twisted have confirmed and
patched the vulnerabilities, with 9 CVE IDs assigned. We received a 4660 USD bounty from Internet Bug Bounty [23] for the Tomcat vulnerability. Eventlet’s maintainers confirmed the issues and plan to fix them in future versions. Gunicorn’s developers received our report, but we have not received further updates from them.

6.4 Mitigation

HTTP Desync is caused by handling discrepancies between implementations, a long-standing problem. To mitigate HTTP Desync, we propose the following three solutions:

Deny Malformed Messages and Support Required Specifications. A major cause of HTTP Desync vulnerabilities is deviation from the standards in RFC documents, such as supporting non-standard number formats and lacking support for trailer sections. Developers of HTTP handlers should thoroughly understand the relevant RFCs and ensure their implementations conform to these specifications by denying malformed messages and supporting required specifications. However, some RFC guidelines are advisory rather than mandatory. While strict adherence to RFC specifications can significantly reduce issues, it may not eliminate them completely.

Sanitize the Message Before Forwarding. Most HTTP Desync attack scenarios involve interaction between a proxy and a server. If the proxy forwards an unambiguous message to the server, there is no room for a discrepancy. Hence, it is crucial for the proxy to sanitize the message before forwarding it. This can be done by first parsing the original message and then reconstructing it anew for forwarding. Many state-of-the-art HTTP proxies have already adopted this practice and have reduced the occurrence of HTTP Desync, demonstrating its efficacy.

Integrate Differential Fuzzing into the Development Workflow. Incorporating differential fuzzing into an implementation’s validation routine can greatly help locate and resolve handling discrepancies. This method should be used alongside unit testing in the development stage to ensure a more secure implementation.

6.5 Limitation

Our methodology is not applicable to closed-source HTTP services, as coverage and internal state information cannot be gathered externally. Nonetheless, new findings from open-source implementations can serve as valuable references for testing whether similar issues exist in closed-source services.

Moreover, our work applies only to HTTP/1.1 because its input structure, mutation strategies, and detector are specific to HTTP/1.1. HTTP/2 and HTTP/3 are binary protocols, which eliminates the text-based parsing differences. However, to support legacy HTTP/1.1 clients, HTTP proxies have implemented protocol downgrade features that convert messages to HTTP/1.1 format. This can lead to HTTP Desync when attackers exploit differences in how the two sides understand message boundaries. Previous work [28] has systematically summarized HTTP/2 Desync attacks. In addition, the complexity and flexibility of HTTP/2’s features, such as multiplexing and stream prioritization, introduce expanded attack surfaces that can be exploited for DoS attacks. To detect such threats, the test case generation and the detector need to be refactored, but the other components and the workflow can be reused.

Additionally, we acknowledge that developing HDHunter requires considerable manual effort for code insertion to extract internal states and for discrepancy analysis. However, the code insertion is one-time work, and the tool can be run periodically in case software updates introduce new vulnerabilities. In the long term, we therefore believe the approach offers a good return on investment. Since Large Language Models (LLMs) can read, understand, and modify source code, further research can use LLMs such as GPT [35], Gemini [12], and LLaMA [1] to reduce human labor. Previous research [52] has leveraged LLMs to generate fuzz drivers. In our case, an LLM can help locate the entry function of the HTTP handler by explaining the functions within the project and then selecting those associated with HTTP handling. For discrepancy analysis, targeted validation environments can be set up to check for known vulnerability types, further reducing human labor.

7 Related Work

HTTP Desync. The concept of HTTP Desync was first raised by Kettle [26]. His article extended the scope of traditional HTTP Request Smuggling [32], the confusion between Content-Length and Transfer-Encoding, to parsing discrepancies in the TE header. Further studies [27, 29] cover new variants of HTTP Desync, but they still focus on the inconsistent number of HTTP requests interpreted between HTTP proxies and servers. In contrast, the Request Confusing attack raised in our paper concerns the inconsistent content of messages interpreted between CGI gateways and applications. Doyhenard [14] first raised HTTP Response Desync: by controlling the responses, the attacker can manipulate the HTTP response queue to inject crafted messages into the HTTP pipeline. However, these studies mainly discover HTTP Desync manually. Our work shows that Desync can also arise in CGI responses.

Another response-related vulnerability, HTTP Response Splitting [2], is a Web application-level vulnerability distinct from HTTP Desync. It injects headers and bodies into a response by leveraging bugs in the Web application. In contrast, Response Forgery and Response Stealing are interpreter-level vulnerabilities, caused by the response
forwarding misbehavior of HTTP servers and CGI gateways.

Previous research proposed different techniques to find HTTP Request Smuggling vulnerabilities. T-Reqs [24] proposed a grammar-based fuzzer to identify discrepancies and HRS vulnerabilities. HDiff [45] utilized natural language processing techniques to extract rules from RFC documents and generate semantically diverse inputs for differential testing. Both are black-box fuzzing techniques that obtain no information from the test target, which limits their effectiveness. HTTP Garden [25] leveraged path information to guide test case generation to discover HRS vulnerabilities. In this paper, we employ gray-box fuzzing to discover HTTP Desync vulnerabilities. With the help of coverage, we are able to find deeper vulnerabilities. Moreover, we systematically summarized the categories of HTTP Desync, broadened HTTP Desync to HTTP responses, and introduced a snapshot-based execution framework to mitigate the impact of network state on results.

Broadly speaking, HTTP Desync belongs to a family of "semantic gap" attacks. Similar inconsistency problems also exist in other systems, such as email systems [6, 53], Web application firewalls [50], cache systems [5, 30], CDN systems [7, 22, 31, 54], and Web applications [49].

Gray-box Fuzzing. Gray-box fuzzing employs a genetic algorithm to guide input generation and mutation based on internal states in order to enhance overall effectiveness. The most famous solutions are AFL [51] and its enhanced version AFL++ [19], which leverage the branch coverage collected during execution to guide the test process. LibAFL [20] was developed by the maintainers of AFL++; with it, developers can easily reuse state-of-the-art fuzzing components to increase overall efficiency and only need to implement the components required to achieve their objectives. We implemented HDHunter based on LibAFL, making it easy to extend and to cooperate with existing fuzzing solutions. AFLNet [38] utilized the server's response code as feedback to guide the fuzzing process for interactive network protocols. NSFuzz [40] integrated static analysis to extract internal state variables as feedback. Residual state in the network stack can interfere with the testing process. Coverage-directed differential testing has been widely applied to detecting logical defects in various protocols, such as mucert [9] for SSL/TLS and classfuzz [8] for JVM. Our work does not adopt their Markov chain Monte Carlo (MCMC) algorithm, but reuses the common AFL framework with augmentations for HTTP Desync. NEZHA [37] optimized the coverage feedback process of differential testing by introducing δ-diversity, which synthesizes path information to guide seed selection. However, it does not apply to network protocols, because the unstable cycle times and execution paths introduced by the network's asynchronous waiting and handling can produce different δ-diversity when executing the same input. The combined edge map used by HDHunter, with AFL's hit-count bucket division and accumulated edge coverage information, can provide immunity to such scenarios. In summary, the above work targets a single programming language. Witcher [47] first applied gray-box fuzzing to SQL and command injection vulnerabilities and introduced coverage collection for interpreted languages by inserting code into the bytecode interpreter. By combining and enhancing Witcher and SanitizerCoverage [46], we broaden our test scope to HTTP implementations in more languages. However, existing tools are not tailored to identify discrepancies across two or more HTTP implementations. Consequently, their effectiveness in detecting HTTP Desync vulnerabilities, which are primarily caused by such discrepancies, is significantly limited.

8 Conclusion

In this paper, we expanded the scope of HTTP Desync to both HTTP requests and responses, and proposed HDHunter, a novel coverage-directed differential testing framework capable of automatically identifying HTTP discrepancies. We tested 19 state-of-the-art HTTP implementations using our HDHunter prototype. We highlighted 5 types of discovered discrepancies and validated their ability to trigger HTTP Desync, resulting in 4 types of attacks: Request Smuggling, Request Confusing, Response Stealing, and Response Forgery. We responsibly disclosed these vulnerabilities to the relevant vendors. A total of 9 CVE IDs have been assigned.

Acknowledgement

We sincerely thank all anonymous reviewers and our shepherd for their insightful and constructive feedback on improving the paper. This work is supported by the National Natural Science Foundation of China (grant #62272265).

Ethical Considerations

HDHunter targets open-source HTTP implementations. We set up all the testing environments on our local machine without interfering with real-world servers and networks. Additionally, all identified vulnerabilities were privately reported through the project's security email or the GitHub Security Advisory in the official repository. No details regarding the vulnerabilities were disclosed to the public before the vulnerabilities were patched.

Open Science

We have released the source code of HDHunter on GitHub (https://github.com/mukeran/HDHunter) and Zenodo (https://zenodo.org/records/14557763).
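The Related Work section argues that a combined edge map with AFL's hit-count buckets and accumulated coverage tolerates the run-to-run jitter that breaks δ-diversity on network targets. The sketch below illustrates that argument under stated assumptions: the "combined map" is modeled hypothetically as the union of both implementations' edge maps keyed by an implementation tag, and the bucket boundaries are AFL's classic ones (1, 2, 3, 4–7, 8–15, 16–31, 32–127, 128+). It is not HDHunter's LibAFL code.

```python
from typing import Dict, Set, Tuple

EdgeMap = Dict[int, int]          # edge id -> raw hit count in one execution
Feature = Tuple[str, int, int]    # (implementation tag, edge id, bucket)


def bucket(count: int) -> int:
    """AFL-style hit-count class for a raw edge count."""
    if count <= 3:
        return (0, 1, 2, 4)[count]
    if count <= 7:
        return 8
    if count <= 15:
        return 16
    if count <= 31:
        return 32
    if count <= 127:
        return 64
    return 128


def combined_features(map_a: EdgeMap, map_b: EdgeMap) -> Set[Feature]:
    """Hypothetical combined map: both implementations' edges, each tagged and bucketed."""
    feats: Set[Feature] = set()
    for tag, edge_map in (("A", map_a), ("B", map_b)):
        for edge, count in edge_map.items():
            if count:
                feats.add((tag, edge, bucket(count)))
    return feats


class AccumulatedCoverage:
    """Keeps every (tag, edge, bucket) seen so far across executions."""

    def __init__(self) -> None:
        self.seen: Set[Feature] = set()

    def observe(self, map_a: EdgeMap, map_b: EdgeMap) -> bool:
        """Record one differential execution; True if it reached new coverage."""
        feats = combined_features(map_a, map_b)
        is_new = bool(feats - self.seen)
        self.seen |= feats
        return is_new
```

Because a loop that runs 5 times in one execution and 6 times in the next lands in the same bucket, asynchronous scheduling noise of that size does not register as new coverage, whereas a jump into a different bucket does.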
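The mitigation "Sanitize the Message Before Forwarding" in the discussion above (parse the message, then rebuild it) can be sketched for a request head as follows. This is a minimal illustration under assumptions of our own: the proxy rejects any framing that two parsers could disagree on (both Transfer-Encoding and Content-Length, a non-decimal or repeated Content-Length, whitespace before the colon, bare CR or LF) and re-emits canonical field lines; the function name and its rejection policy are hypothetical, not taken from any particular proxy.

```python
from typing import List, Tuple


def sanitize_request_head(raw: bytes) -> bytes:
    """Parse an HTTP/1.1 request head and rebuild it from the parsed fields.

    Raises ValueError for framing that downstream parsers might read differently.
    Only the head is returned; the proxy is assumed to re-frame the body itself.
    """
    head, sep, _rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("head not terminated by CRLF CRLF")
    lines = head.split(b"\r\n")
    # After splitting on CRLF, any remaining CR or LF is a bare line separator.
    if any(b"\r" in line or b"\n" in line for line in lines):
        raise ValueError("bare CR or LF inside the head")

    request_line = lines[0]
    parts = request_line.split(b" ")
    if len(parts) != 3 or parts[2] != b"HTTP/1.1":
        raise ValueError("malformed request line")

    fields: List[Tuple[str, bytes]] = []
    for line in lines[1:]:
        name, colon, value = line.partition(b":")
        # Reject empty names and any whitespace in the name (e.g. "TE : chunked").
        if not colon or not name or any(c in b" \t" for c in name):
            raise ValueError(f"malformed field line {line!r}")
        fields.append((name.decode("ascii").lower(), value.strip(b" \t")))

    te = [v for n, v in fields if n == "transfer-encoding"]
    cl = [v for n, v in fields if n == "content-length"]
    if te and cl:
        raise ValueError("both Transfer-Encoding and Content-Length")
    if len(cl) > 1 or (cl and not cl[0].isdigit()):
        raise ValueError("ambiguous Content-Length")
    if te and [v.lower() for v in te] != [b"chunked"]:
        raise ValueError("unsupported Transfer-Encoding")

    # Re-serialize: one canonical "name: value" line per field.
    rebuilt = [request_line] + [n.encode("ascii") + b": " + v for n, v in fields]
    return b"\r\n".join(rebuilt) + b"\r\n\r\n"
```

Rejecting, rather than silently picking one interpretation, is the conservative choice here: a proxy that "keeps one" header and forwards the other is exactly the RQ1/RQ2/RQ3 behavior class that enables smuggling.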
References

[1] Meta AI. Introducing llama: A foundational, 65-billion-parameter large language model. https://ai.meta.com/blog/large-language-model-llama-meta-ai/, 2023.
[2] Director of Security Amit Klein and Inc. Research, Sanctum. Divide and conquer - http response splitting, web cache poisoning attacks, and related topics. https://repository.root-me.org/Exploitation%20-%20Web/EN%20-%20HTTP%20Response%20Splitting%20-%20Divide%20and%20Conquer.pdf, 2005.
[3] Apache. The apache tomcat connectors - ajp protocol reference. https://tomcat.apache.org/connectors-doc/ajp/ajpv13a.html, 2023.
[4] Cornelius Aschermann, Tommaso Frassetto, Thorsten Holz, Patrick Jauernig, Ahmad-Reza Sadeghi, and Daniel Teuchert. NAUTILUS: fishing for deep bugs with grammars. In 26th Annual Network and Distributed System Security Symposium (NDSS), San Diego, California, USA, 2019. The Internet Society.
[5] Jianjun Chen, Jian Jiang, Haixin Duan, Nicholas Weaver, Tao Wan, and Vern Paxson. Host of troubles: Multiple host ambiguities in http implementations. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1516–1527, 2016.
[6] Jianjun Chen, Vern Paxson, and Jian Jiang. Composition kills: A case study of email sender authentication. In 29th USENIX Security Symposium (USENIX Security 20), pages 2183–2199, 2020.
[7] Jianjun Chen, Xiaofeng Zheng, Hai-Xin Duan, Jinjin Liang, Jian Jiang, Kang Li, Tao Wan, and Vern Paxson. Forwarding-loop attacks in content delivery networks. In NDSS, 2016.
[8] Yuting Chen, Ting Su, Chengnian Sun, Zhendong Su, and Jianjun Zhao. Coverage-directed differential testing of JVM implementations. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2016), pages 85–99, Santa Barbara, CA, USA, 2016. ACM.
[9] Yuting Chen and Zhendong Su. Guided differential testing of certificate validation in SSL/TLS implementations. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015), pages 793–804, Bergamo, Italy, 2015. ACM.
[10] The MITRE Corporation. Cve - cve. https://cve.mitre.org/, 1999.
[11] d3d. From akamai to f5 to ntlm... with love. https://blog.malicious.group/from-akamai-to-f5-to-ntlm/, 2023.
[12] Google DeepMind. Gemini - google deepmind. https://deepmind.google/technologies/gemini, 2024.
[13] defparam. Smuggler. https://github.com/defparam/smuggler, 2020.
[14] Martin Doyhenard. Response smuggling: Exploiting http/1.1 connections. https://media.defcon.org/DEF%20CON%2029/DEF%20CON%2029%20presentations/Martin%20Doyhenard%20-%20Response%20Smuggling-%20Pwning%20HTTP-1.1%20Connections.pdf, 2021.
[15] Phillip J. Eby. Pep 333 – python web server gateway interface v1.0. https://peps.python.org/pep-0333/, 2003.
[16] Phillip J. Eby. Pep 3333 – python web server gateway interface v1.0.1. https://peps.python.org/pep-3333/, 2010.
[17] Roy T. Fielding, Mark Nottingham, and Julian Reschke. HTTP Semantics. RFC 9110, jun 2022.
[18] Roy T. Fielding, Mark Nottingham, and Julian Reschke. HTTP/1.1. RFC 9112, jun 2022.
[19] Andrea Fioraldi, Dominik Christian Maier, Heiko Eißfeldt, and Marc Heuse. AFL++: Combining incremental steps of fuzzing research. In 14th USENIX Workshop on Offensive Technologies (WOOT 2020). USENIX Association, 2020.
[20] Andrea Fioraldi, Dominik Christian Maier, Dongjia Zhang, and Davide Balzarotti. Libafl: A framework to build modular and reusable fuzzers. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS 2022), pages 1051–1065, Los Angeles, CA, USA, 2022. ACM.
[21] The Tcpdump Group. Tcpdump. https://www.tcpdump.org/, 1999.
[22] Run Guo, Jianjun Chen, Yihang Wang, Keran Mu, Baojun Liu, Xiang Li, Chao Zhang, Haixin Duan, and Jianping Wu. Temporal CDN-Convex lens: A CDN-Assisted practical pulsing DDoS attack. In 32nd USENIX Security Symposium (USENIX Security 23), pages 6185–6202, 2023.
[23] HackerOne. Internet bug bounty. https://hackerone.com/ibb, 2021.
[24] Bahruz Jabiyev, Steven Sprecher, Kaan Onarlioglu, and Engin Kirda. T-reqs: HTTP request smuggling with differential fuzzing. In CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security, pages 1805–1820, Virtual Event, Republic of Korea, 2021. ACM.
[25] Ben Kallus, Prashant Anantharaman, Michael Locasto, and Sean W. Smith. The http garden: Discovering parsing vulnerabilities in http/1.1 implementations by differential fuzzing of request streams, 2024.
[26] James Kettle. Http desync attacks: Request smuggling reborn. https://portswigger.net/research/http-desync-attacks-request-smuggling-reborn, 2019.
[27] James Kettle. Http desync attacks: what happened next. https://portswigger.net/research/http-desync-attacks-what-happened-next, 2019.
[28] James Kettle. Http/2: The sequel is always worse. https://portswigger.net/research/http2, 2021.
[29] Amit Klein. Http request smuggling in 2020–new variants, new defenses and new challenges, 2020.
[30] Yuejia Liang, Jianjun Chen, Run Guo, Kaiwen Shen, Hui Jiang, Man Hou, Yue Yu, and Haixin Duan. Internet's invisible enemy: Detecting and measuring web cache poisoning in the wild. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 452–466, 2024.
[31] Ziyu Lin, Zhiwei Lin, Ximeng Liu, Jianjun Chen, Run Guo, Cheng Chen, and Shaodong Xiao. CDN cannon: Exploiting CDN Back-to-Origin strategies for amplification attacks. In 33rd USENIX Security Symposium (USENIX Security 24), pages 5717–5734, 2024.
[32] Chaim Linhart, Amit Klein, Ronen Heled, and Steve Orrin. Http request smuggling. https://www.cgisecurity.com/lib/HTTP-Request-Smuggling.pdf, 2005.
[33] LLVM. libfuzzer – a library for coverage-guided fuzz testing. https://llvm.org/docs/LibFuzzer.html, 2015.
[34] Inc. Open Market. Fastcgi a high-performance web server interface. https://fastcgi-archives.github.io/FastCGI_A_High-Performance_Web_Server_Interface_FastCGI.html, 1996.
[35] OpenAI. Chatgpt. https://chat.openai.com, 2022.
[36] Anshuman Pattnaik. Http request smuggling detection tool. https://github.com/anshumanpattnaik/http-request-smuggling, 2020.
[37] Theofilos Petsios, Adrian Tang, Salvatore J. Stolfo, Angelos D. Keromytis, and Suman Jana. NEZHA: efficient domain-independent differential testing. In 2017 IEEE Symposium on Security and Privacy (SP 2017), pages 615–632, San Jose, CA, USA, 2017. IEEE Computer Society.
[38] Van-Thuan Pham, Marcel Böhme, and Abhik Roychoudhury. AFLNET: A greybox fuzzer for network protocols. In 13th IEEE International Conference on Software Testing, Validation and Verification (ICST 2020), pages 460–465, Porto, Portugal, 2020. IEEE.
[39] PortSwigger. Http request smuggler. https://github.com/PortSwigger/http-request-smuggler, 2018.
[40] Shisong Qin, Fan Hu, Zheyu Ma, Bodong Zhao, Tingting Yin, and Chao Zhang. Nsfuzz: Towards efficient and state-aware network service fuzzing. ACM Trans. Softw. Eng. Methodol., 32(6):160:1–160:26, 2023.
[41] Rack. Rack specification. https://github.com/rack/rack/blob/main/SPEC.rdoc, 2008.
[42] Neil Schemenauer. scgi. https://github.com/nascheme/scgi, 2002.
[43] Sergej Schumilo, Cornelius Aschermann, Ali Abbasi, Simon Wörner, and Thorsten Holz. Nyx: Greybox hypervisor fuzzing using fast snapshots and affine types. In 30th USENIX Security Symposium (USENIX Security 2021), pages 2597–2614. USENIX Association, 2021.
[44] Sergej Schumilo, Cornelius Aschermann, Andrea Jemmett, Ali Abbasi, and Thorsten Holz. Nyx-net: network fuzzing with incremental snapshots. In EuroSys '22: Seventeenth European Conference on Computer Systems, pages 166–180, Rennes, France, 2022. ACM.
[45] Kaiwen Shen, Jianyu Lu, Yaru Yang, Jianjun Chen, Mingming Zhang, Haixin Duan, Jia Zhang, and Xiaofeng Zheng. Hdiff: A semi-automatic framework for discovering semantic gap attack in HTTP implementations. In 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2022), pages 1–13, Baltimore, MD, USA, 2022. IEEE.
[46] The Clang Team. Sanitizercoverage. https://clang.llvm.org/docs/SanitizerCoverage.html, 2014.
[47] Erik Trickel, Fabio Pagani, Chang Zhu, Lukas Dresel, Giovanni Vigna, Christopher Kruegel, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, and Adam Doupé. Toss a fault to your witcher: Applying grey-box coverage-guided mutational fuzzing to detect SQL and command injection vulnerabilities. In 44th IEEE Symposium on Security and Privacy (SP 2023), pages 2658–2675, San Francisco, CA, USA, 2023. IEEE.
[48] uWSGI. The uwsgi protocol. https://uwsgi-docs.readthedocs.io/en/latest/Protocol.html, 2012.
[49] Enze Wang, Jianjun Chen, Wei Xie, Chuhan Wang, Yifei Gao, Zhenhua Wang, Haixin Duan, Yang Liu, and Baosheng Wang. Where urls become weapons: Automated discovery of ssrf vulnerabilities in web applications. In 2024 IEEE Symposium on Security and Privacy (SP), pages 216–216. IEEE Computer Society, 2024.
[50] Qi Wang, Jianjun Chen, Zheyu Jiang, Run Guo, Ximeng Liu, Chao Zhang, and Haixin Duan. Break the wall from bottom: Automated discovery of protocol-level evasion vulnerabilities in web application firewalls. In 2024 IEEE Symposium on Security and Privacy (SP), pages 132–132, Los Alamitos, CA, USA, may 2024. IEEE Computer Society.
[51] Michal Zalewski. American fuzzy lop, 2017.
[52] Cen Zhang, Yaowen Zheng, Mingqiang Bai, Yeting Li, Wei Ma, Xiaofei Xie, Yuekang Li, Limin Sun, and Yang Liu. How effective are they? exploring large language model based fuzz driver generation. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 1223–1235, 2024.
[53] Jiahe Zhang, Jianjun Chen, Qi Wang, Hangyu Zhang, Chuhan Wang, Jianwei Zhuge, and Haixin Duan. Inbox invasion: Exploiting mime ambiguities to evade email attachment detectors. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 467–481, 2024.
[54] Linkai Zheng, Xiang Li, Chuhan Wang, Run Guo, Haixin Duan, Jianjun Chen, Chao Zhang, and Kaiwen Shen. Reqsminer: Automated discovery of cdn forwarding request inconsistencies with differential fuzzing. In NDSS, 2024.

A HTTP Grammar

HDHunter uses the ABNF rules in Listing 1, which were manually extracted from the RFCs, to perform input mutations. Non-terminals without a rule are regarded as data fields, i.e., *OCTET.

Listing 1: ABNF Rules Used to Build the Structure of HTTP Messages

    HTTP-message    = start-line *( field-line CRLF ) CRLF [ message-body ]
    start-line      = request-line / status-line
    request-line    = method SP request-target SP HTTP-version CRLF
    status-line     = HTTP-version SP status-code SP [ reason-phrase ] CRLF
    field-line      = field-name ":" OWS field-value OWS CRLF
    message-body    = chunked-body / *OCTET
    chunked-body    = *chunk last-chunk trailer-section CRLF
    chunk           = chunk-size [ chunk-ext ] CRLF chunk-data CRLF
    chunk-size      = 1*HEXDIG
    last-chunk      = 1*("0") [ chunk-ext ] CRLF
    trailer-section = *field-line

    SP              = %x20
    HTAB            = %x09
    OWS             = *( SP / HTAB )
    CRLF            = %x0D %x0A
    HEXDIG          = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

B Detailed Mutation Strategies

The detailed mutation strategies used by HDHunter are presented in Table 3.

Table 3: Mutation Strategies of HDHunter

Level       Strategy
Sequence*   1) Randomly select a message from the corpus and add it to the sequence
            2) Delete a message from the sequence
Message*    3) Duplicate a field line
            4) Delete a field line
            5) Randomly select field lines from the corpus and insert them at a random position
            6) Randomly swap two values of the same type
            7) Randomly replace a value with a random preset token of the same type
            8) Randomly select field lines from the corpus and replace the trailer section (chunked only)
Byte        9) Randomly insert bytes
            10) Randomly remove bytes
            11) Randomly duplicate bytes
            12) Randomly select bytes from the corpus and insert them at a random position
* New mutation strategies introduced in HDHunter.

C Detection Rules

Algorithm 1 describes the logic of our HTTP Desync Detector.

D Code Insertion

D.1 Statistics

Table 4 shows the number of lines, functions, and files we modified in the request handling procedures of 5 representative targets, covering 4 categories and 5 different programming languages.
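The comparison performed by Algorithm 1 can be sketched in Python to make the order of checks explicit. This is a minimal sketch, not HDHunter's detector: the MessageState fields are hypothetical encodings of the state names used in Algorithm 1 (Order, Body, Status, Encoding, CL, Consumed), and treating any 4xx/5xx status as an error is an assumption. Iterating with zip visits each message index of the two equal-length sequences exactly once.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MessageState:
    """Per-message state; field names mirror Algorithm 1, types are assumptions."""
    order: int               # Order: position/identity of the message in the sequence
    body: bytes              # Body: body bytes delivered for this message
    status: int              # Status: status code produced for this message
    encoding: Optional[str]  # Encoding: framing the parser chose, e.g. "chunked"
    cl: Optional[int]        # CL: Content-Length value the parser used
    consumed: int            # Consumed: bytes of the input stream consumed


def is_error(status: int) -> bool:
    # Assumption: any 4xx or 5xx status counts as an error.
    return status >= 400


def desync_detection(s_a: List[MessageState], s_b: List[MessageState]) -> bool:
    """Return True if the two state sequences indicate an HTTP Desync."""
    if len(s_a) != len(s_b):                  # Count(S_A) != Count(S_B)
        return True
    for a, b in zip(s_a, s_b):                # the i-th message of each implementation
        if a.order != b.order or a.body != b.body:
            return True
        if is_error(a.status) and is_error(b.status):
            continue                          # both rejected: framing details are irrelevant
        if a.encoding != b.encoding or a.cl != b.cl or a.consumed != b.consumed:
            return True
    return False
```

The early `continue` matters: when both implementations reject a message, differing Encoding, CL, or Consumed values are not reported, which suppresses a large class of harmless discrepancies.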
Algorithm 1 HTTP Desync Detection Process: Determine whether there is an HTTP Desync, using State Tuples S_A, S_B of implementations A and B

procedure DesyncDetection(S_A, S_B)
    if Count(S_A) ≠ Count(S_B) then
        return true
    for i ← 0 to Count(S_A) do
        if Order(S_A, i) ≠ Order(S_B, i) then
            return true
        if Body(S_A, i) ≠ Body(S_B, i) then
            return true
        if IsError(Status(S_A, i)) and IsError(Status(S_B, i)) then
            continue
        if Encoding(S_A, i) ≠ Encoding(S_B, i) then
            return true
        if CL(S_A, i) ≠ CL(S_B, i) then
            return true
        if Consumed(S_A, i) ≠ Consumed(S_B, i) then
            return true
    return false

Table 4: Statistics of Inserted Code in 5 Representative Targets

Target   Line†  Func  File  Hour  Category            Lang.
Apache   19     4     4     2–3   Integrated Server   C
Squid    17     7     3     2–3   Cache Server        C++
gevent   25     5     1     1–2   Network Framework   Python
Tomcat   11     6     5     1–2   Application Server  Java
Falcon   64     6     5     1–2   Application Server  Ruby
† Comments and empty lines are not included.

D.2 Example

The apache.diff file in the repository contains the diff of the code we inserted into Apache. This code extracts the first five types of states for the HTTP request handling process. ap_process_http_async_connection is Apache's HTTP handling function. After ap_read_request is called, Apache has parsed the request, and we can read and set Encoding and CL. Count is incremented after the request has been processed. ap_http_filter is responsible for parsing the HTTP body. We locate the positions where Apache reads the body, including the chunk size, content, and body content, and insert code to maintain Consumed and Body. Apache reuses the same code to read raw and chunked bodies. ap_rgetline_core is called when reading the start line and field lines.

E Detailed Discrepancies

A substantial number of discrepancies were identified during the experiment. For instance, in the Apache–Tomcat pair, approximately 13,500 discrepancies were reported during each run. After preliminary deduplication, around 500 discrepancies remained. Figure 9 details the 5 primary types of discrepancies found by HDHunter.

F Different Types of Non-standard Number Parsing

Non-standard number parsing is one of the primary discrepancies between HTTP implementations discovered by HDHunter. We summarize their different behaviors in Table 5.

Table 5: Different Types of Non-standard Number Parsing

Type          Example                                                       Affected
0x prefix     Content-Length: 0x8                                           Falcon
              Transfer-Encoding: chunked; chunk "0x8", "abcdefgh"           Falcon, Squid, Eventlet, Tornado, Jetty
+ prefix      Content-Length: +8                                            Eventlet
              Transfer-Encoding: chunked; chunk "+8", "abcdefgh"            Falcon, Eventlet
_ between     Content-Length: 1_0                                           Falcon
              Transfer-Encoding: chunked; chunk "1_0", "abcdefgh"           Tornado
Any suffixes  Transfer-Encoding: chunked; chunk "8 irrelevant_characters", "abcdefgh"   H2O, ATS

G HTTP Payloads

G.1 Payloads as Artifact

To facilitate future research, the collection of vulnerable payloads has been uploaded to the repository.

G.2 Interesting Payloads

In this section, we share two interesting payloads found by HDHunter that can be leveraged to perform HTTP Desync attacks.
Figure 9: Five Primary Types of Discrepancies Between Implementations We Discovered

Implementation  Trailer Section*  Line Separator  Request TE.CL
NGINX           TS1               LS1             S
Apache          S                 S               RQ1
Lighttpd        TS1               S               S
H2O             TS1               LS1             RQ2
HAProxy         S                 LS1             RQ1
Squid           TS1               LS1             RQ2
Varnish         TS1               LS1             S
ATS             TS1               LS1             RQ2
Twisted         TS2               S               S
Tornado         TS2               LS1             S
gevent          TS3               LS1             RQ2
Eventlet        TS3               LS1             RQ3
Tomcat          TS4               LS1             RQ1
Jetty           S                 LS1             S
Puma            TS3               S               RQ2
Falcon          S                 S               S
uWSGI           TS5               S               RQ4
Waitress        TS1               S               RQ2
Gunicorn        S                 S               RQ3

Legend:
Number Parsing: S: standard behavior; NP1: loose chunk size suffix; NP2: loose chunk size format; NP3: loose content length format.
Trailer Section*: S: throw error on invalid format; TS1: tolerance of invalid format; TS2: not supported and throw error; TS3: not supported and ignored; TS4: supported but faulty; TS5: no support for chunked encoding.
Line Separator: S: CRLF only; LS1: CRLF and LF.
Request TE.CL: S: throw error on receiving both; RQ1: accept TE, keep one, and disconnect; RQ2: accept TE, keep one, and proceed; RQ3: accept TE and pass both; RQ4: no support for chunked encoding.
Response TE.CL*†: S: accept TE and keep one; RS1: accept CL and keep one; RS2: accept CL and forward both; RS3: throw error on receiving both; RS4: forward the whole body and both TE and CL; N/A: no multiple endpoints support.

* Novel types of discrepancies identified by HDHunter.
† The top and bottom halves of the five left-hand tiles of Response TE.CL represent discrepancies in HTTP responses and CGI responses respectively.
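The Number Parsing row of Figure 9 and Table 5 can be made concrete with a toy differential check. The sketch contrasts a parser that follows the grammar in Listing 1 (chunk-size = 1*HEXDIG) with two hypothetical lenient parsers: one hands the token to Python's built-in int(token, 16), which also accepts a sign, a "0x" prefix, and underscores between digits; the other reads leading hex digits and ignores any suffix. These lenient parsers are illustrative assumptions about how such behavior can arise, not code taken from any tested implementation, and the harness is a miniature of the differential-testing idea rather than HDHunter itself.

```python
import re
from typing import Callable, Dict, List, Union

Outcome = Union[int, str]


def strict_chunk_size(token: str) -> int:
    # Listing 1: chunk-size = 1*HEXDIG, nothing else allowed.
    if re.fullmatch(r"[0-9A-Fa-f]+", token) is None:
        raise ValueError(f"invalid chunk-size {token!r}")
    return int(token, 16)


def builtin_int_chunk_size(token: str) -> int:
    # Hypothetical lenient parser: int(token, 16) also accepts "+8", "0x8", "1_0".
    return int(token, 16)


def leading_digits_chunk_size(token: str) -> int:
    # Hypothetical hand-rolled loop: take the leading hex digits, ignore any suffix.
    m = re.match(r"[0-9A-Fa-f]+", token)
    if m is None:
        raise ValueError(f"no hex digits in {token!r}")
    return int(m.group(), 16)


def differential(cases: List[str],
                 parsers: List[Callable[[str], int]]) -> Dict[str, List[Outcome]]:
    """Return every case on which the parsers disagree; a rejection is recorded as "error"."""
    diffs: Dict[str, List[Outcome]] = {}
    for case in cases:
        outcomes: List[Outcome] = []
        for parse in parsers:
            try:
                outcomes.append(parse(case))
            except ValueError:
                outcomes.append("error")
        if len(set(outcomes)) > 1:
            diffs[case] = outcomes
    return diffs


PARSERS = [strict_chunk_size, builtin_int_chunk_size, leading_digits_chunk_size]
CASES = ["8", "0x8", "+8", "1_0", "8 irrelevant_characters"]

if __name__ == "__main__":
    for case, outcomes in differential(CASES, PARSERS).items():
        print(f"{case!r:28} strict={outcomes[0]} int()={outcomes[1]} prefix={outcomes[2]}")
```

Every input except the well-formed "8" yields a disagreement, and each disagreement is a potential chunk-boundary confusion: for "1_0", one side reads 16 bytes of chunk data while another reads 1.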

The first one (Listing 2) is an exploitable HTTP request smuggling payload in the tested version of Tomcat. Tomcat interprets this payload as two requests, with lines 1–15 as the first and lines 17–19 as the second. Other HTTP implementations, such as ATS, interpret this payload differently, with lines 1–10 as the first and lines 12–19 as the second. If a setup uses ATS as the proxy and Tomcat as the server, the malicious payload embedded in the second request can be executed on Tomcat.

Listing 2: Payload that Leads to Discrepancy in Tomcat

 1 POST /benign_path HTTP/1.1
 2 Host: a.com
 3 Connection: keep-alive
 4 Transfer-Encoding: chunked
 5
 6 5
 7 12345
 8 0
 9 Content : hello
10 a
11
12 POST /benign_path HTTP/1.1
13 Host: a.com
14 Connection: keep-alive
15 Content-Length: 37
16
17 GET /evil_path HTTP/1.1
18 Any : any
19 Host: b.com
20
21

The second one (Listing 3) is a payload that leads to a discrepancy through non-standard line separators. Assume implementation A only allows CRLF but accepts LF in the chunk extension, and implementation B allows both LF and CRLF. A will interpret lines 6–7 as the first chunk, lines 8–9 as the second chunk, and lines 12–15 as the second request. B will interpret line 6 together with the first 8 characters of line 7 as the first chunk, line 8 as the trailer section, and lines 9–11 as the second request, and will drop lines 12–15 since the connection is set to close. The discrepancy is caused by the confused boundary of the first chunk.

Listing 3: Payload that Leads to Discrepancy by Non-standard Line Separator

 1 POST /proxy HTTP/1.1[CR][LF]
 2 Host: a.com[CR][LF]
 3 Transfer-Encoding: chunked[CR][LF]
 4 Connection: keep-alive[CR][LF]
 5 [CR][LF]
 6 0a;[LF][CR][LF]
 7 12345678[LF]0[CR][LF]
 8 4b ;:123[CR][LF]
 9 [CR][LF]POST /proxy HTTP/1.1[CR][LF]Host: b.com[CR][LF]Connection: close[CR][LF]Content-Length: 5[CR][LF][CR][LF]
10 0[CR][LF]
11 [CR][LF]
12 GET /proxy HTTP/1.1[CR][LF]
13 Host: a.com[CR][LF]
14 Connection: close[CR][LF]
15 [CR][LF]

H Impact of Initial Seeds

Initial seeds play a vital role in the fuzzing process. To evaluate the impact of the initial seeds, we conducted a series of experiments against the Apache–NGINX combination. The experiments used 20 selected, 10 selected, 20 random, and 20 selected seeds, respectively, with the last environment using AFL's test case generation; we denote these setups as A, B, C, and D. Specifically, D shares the same set of initial seeds with A, B is a subset of A with lower diversity, and C is selected randomly from network traffic. The coverage growth of these setups is illustrated in Figure 10. The results demonstrate that the size of the initial seed set is not a contributing factor to the outcome. Instead, coverage correlates positively with the diversity of the initial seeds.

Figure 10: Comparison of Covered Edges over Time when Testing Apache–NGINX Using Four Different Sets of Initial Seeds. (Plot: x-axis Time (hours), 0–12; y-axis Increment in Number of Covered Edges (edges), 0–1600; series: 20 Selected Seeds (A), 10 Selected Seeds (B), 20 Random Seeds (C), 20 Selected Seeds (AFL) (D).)
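The chunk-boundary confusion behind Listing 3 can be reproduced with two small chunked decoders. This is a sketch under the assumptions stated for Listing 3: decoder A ends lines only at CRLF (so a bare LF inside the chunk extension is simply carried along), while decoder B also ends lines at a bare LF. Both decoders, their names, and the simplified handling of chunk extensions (everything after ";" is discarded) are illustrative and are not taken from any tested implementation.

```python
from typing import Tuple

# Body of the Listing 3 request, i.e. everything after the blank line 5.
CHUNKED_BODY = (
    b"0a;\n\r\n"                                            # line 6
    b"12345678\n0\r\n"                                      # line 7
    b"4b ;:123\r\n"                                         # line 8
    b"\r\nPOST /proxy HTTP/1.1\r\nHost: b.com\r\n"          # line 9, first part
    b"Connection: close\r\nContent-Length: 5\r\n\r\n"       # line 9, second part
    b"0\r\n"                                                # line 10
    b"\r\n"                                                 # line 11
    b"GET /proxy HTTP/1.1\r\nHost: a.com\r\nConnection: close\r\n\r\n"  # lines 12-15
)


def read_line(buf: bytes, pos: int, lf_ok: bool) -> Tuple[bytes, int]:
    """Read one line at pos; return (line without terminator, next position)."""
    if lf_ok:                                  # B: a bare LF ends a line
        end = buf.index(b"\n", pos)
        line = buf[pos:end]
        if line.endswith(b"\r"):
            line = line[:-1]
        return line, end + 1
    end = buf.index(b"\r\n", pos)              # A: only CRLF ends a line
    return buf[pos:end], end + 2


def decode_chunked(buf: bytes, pos: int, lf_ok: bool) -> Tuple[bytes, int]:
    """Decode a chunked body at pos; return (body, position after the message)."""
    body = bytearray()
    while True:
        line, pos = read_line(buf, pos, lf_ok)
        size_token = line.split(b";", 1)[0].strip()   # drop chunk-ext and BWS
        size = int(size_token.decode("ascii"), 16)
        if size == 0:
            while True:                               # trailer section up to an empty line
                line, pos = read_line(buf, pos, lf_ok)
                if not line:
                    return bytes(body), pos
        body += buf[pos:pos + size]
        pos += size
        line, pos = read_line(buf, pos, lf_ok)
        if line:
            raise ValueError("chunk data not followed by a line terminator")


if __name__ == "__main__":
    for name, lf_ok in (("A (CRLF only)", False), ("B (CRLF or LF)", True)):
        body, end = decode_chunked(CHUNKED_BODY, 0, lf_ok)
        next_line = CHUNKED_BODY[end:].split(b"\r\n", 1)[0]
        print(f"{name}: body {len(body)} bytes, next request {next_line!r}")
```

Decoder A reads "12345678[LF]0" as the first chunk and the 0x4b (75) bytes starting at line 9 as the second, so the next request it sees is the GET on line 12. Decoder B reads "[CR][LF]12345678" as the first chunk, treats line 8 as a trailer, and therefore sees the POST embedded in line 9 as the next request.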
