0% found this document useful (0 votes)
77 views4 pages

A Large-Scale Parallel Fuzzing System: Yang Li, Chao Feng, Chaojing Tang

Fuzzy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
77 views4 pages

A Large-Scale Parallel Fuzzing System: Yang Li, Chao Feng, Chaojing Tang

Fuzzy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

A Large-scale Parallel Fuzzing System

Yang Li, Chao Feng, Chaojing Tang


School of Electronic Science
National University of Defense Technology
Changsha 410073, China
LiYangllliy@163.com

ABSTRACT In general, the application of fuzzing to parallelization, especially


The application of parallelization to fuzzing can effectively for larger-scale distributed systems, is still limited. The most
improve test efficiency. But with the expansion of node scale, important problem is how to keep synchronization. Specifically, it
synchronization mechanism will become the bottleneck. To solve is how to coordinate each node to push forward test progress. A
this problem, this paper presents a method which is suitable for general solution is to jointly maintain a global path table through
large-scale parallelization to generate test cases. It simplifies the communication among nodes. When a node finds that a
execution path of tree form into a one-dimensional array by processing path has been executed, the node will no longer handle
preprocessing, which ensures validity and reduces processing time. it. Such a method is theoretically feasible. However, with the
This paper also designs and implements a parallel fuzzing system increase of the size of test program and the enlargement of node
using this method. The system uses a polling mechanism to scale, it is inevitable that the explosion of path space and the
reduce repetitive tasks. A jump-oriented strategy is adopted to increase of synchronization delay will be unavoidable.
reduce redundancy when filtering crashes. At the end of this paper, AFL [10] developed by Michal Zalewski is an efficient guided
the effectiveness of the system in improving the efficiency of fuzzing tool. It uses a jump-based simplification in dealing with
fuzzing is further demonstrated through experiments. synchronization. AFL simplifies the path of tree structure into a
linear structure consisting of many jumps between basic blocks.
CCS Concepts The size of the linear structure is fixed to 64KB [10]. This
• Security and privacy ➝Software and application security simplified method has a better effect when the number of nodes is
➝Software reverse engineering. less. But when node scale is large, Synchronization mechanism
will also become the bottleneck to restrict parallelization
Keywords efficiency. To apply fuzzing to large-scale distributed systems, I
Vulnerability discovery; Parallel fuzzing; Test case generation; think it needs to be improved in two ways: one is to improve the
Distributed computing. method of fuzzing to further reduce the size of the data used to
maintain synchronization, and the other is to improve the
1. INTRODUCTION efficiency of fuzzing by rational design of distributed system. This
Fuzzing is a kind of vulnerability exploration method which refers
article will introduce the above contents in the second and third
to sending various malformed data to target software to look for
sections.
hidden vulnerabilities. Different from traditional white-box
methods, fuzzing does not require analysis of program source 2. AN IMPROVED METHOD FOR
code. In contrast to traditional black-box testing which focuses on
only whether the output of test cases match the expected ones, GENERATING TEST CASES
fuzzing focuses only on whether a target software has an The method proposed in this paper is a simplified method for path
exception[1-9]. Testing methods vary according to target software, table, similar to the method used by AFL. But compared with
input format and operating platform. However, the basic AFL, it has a smaller linear structure and is more suitable for
procedures of fuzzing are the same. The steps are as follows: synchronization in large-scale distributed systems. The specific
determining input vectors, generating test cases, executing test process is as follows:
cases, anomaly detection and analysis. Although the principle of
fuzzing is simple, it often finds important vulnerabilities in many
2.1 Static Analysis
programs. In order to improve the efficiency of fuzzing, The purpose of static analysis is to conduct a preliminary
parallelization is undoubtedly a good way. Sharing the process of vulnerability analysis of program through some static analysis
one test into multiple CPU cores of multiple hosts at the same tools before an actual test is carried out. It is specifically divided
time can significantly improve efficiency and hardware utilization. into the following steps:
1) Find the jumps between basic blocks. That is, static analysis
Permission to make digital or hard copies of all or part of this work for tools are used to divide all the basic blocks of the program to be
personal or classroom use is granted without fee provided that copies are tested, and then record the addresses of basic blocks, then find the
not made or distributed for profit or commercial advantage and that possible jumps between the basic blocks. Number all the jumps
copies bear this notice and the full citation on the first page. To copy starting from zero, and then generate a table named tb1 to record
otherwise, or republish, to post on servers or to redistribute to lists,
the correspondence between the numbers and the jumps. The
requires prior specific permission and/or a fee. Request permissions from
Permissions@acm.org. partitioning of basic blocks and the discovery of jumps can be
ICAIP’18, June 16–18, 2018, Chengdu, China. automated with some open API tools such as IDA [11].
© 2018 Association for Computing Machinery.
2) Generate a one-dimensional array. Generate an array according
ACM 978-1-4503-6460-7/18/06…$15.00
to the sequence number in tb1, and it is named tem-array [*]. The
DOI: https://doi.org/10.1145/3239576.3239615

194
sequence number of the jump in tb1 is the index of the array The above method can be a long-term and accumulative
element, which completes the mapping relationship between the exploration of a program. It does not have to perform complex
jumps and the array. The size of the element is one byte. For computations, and does not need to compare all of the execution
example, four possible jumps of a program are numbered as 0, 1, paths, and avoids the impact of path explosion, which greatly
2 and 3, and then dynamically generated array elements are: tem- improves the execution efficiency.
array [0], tem-array [1], tem-array [2] and tem-array [3]. The size
of the generated array varies with the number of jumps in test 2.3 Further Filtering the Test Cases
program. The process in 2.2 can cope with most situations, but sometimes,
even if there is no new jump, there may be a new execution path.
3) Mark dangerous jumps. According to [12], static models such For example,
as stack overflow vulnerabilities, heap overflow vulnerabilities,
format string vulnerabilities, null pointer dereference tc1: A->B->C->D
vulnerabilities, and data leak vulnerabilities have all been
tc2: A->B->C->A->D
established. With these models, we can match a target program
with the above vulnerability models. If the matching is successful, tc3: A->B->C->A->B->C->D
it shows that the related basic blocks are more dangerous. Then
based on these dangerous blocks, find out all the corresponding After executing both tc1 and tc2, all jumps in tc3 have occurred,
dangerous jumps in preparation for the next step. but in fact tc3 is a completely new path. By recording the times of
each jump, we can tell the difference. But at the same time, the
2.2 Discovering New Jumps frequency of jump is not that important, especially when the times
Through the static analysis, we get the empty tem-array [*] (all are large. For example, when a program is in a loop, looping 200
the elements are 0) as a template. When testing, the template is times and looping 201 times often do not make much difference.
used to generate two arrays, total-array [*] and array [*], Therefore, a non-equidistance dynamic strategy is proposed in this
respectively. Total-array [*] is used to count up all the jumps of paper. Specifically, for different kinds of jumps, we use different
executed test cases. How the elements of total-array [*] are intervals of coarse and fine size to classify the frequency of each
handled will be explained in section 2.3. Array [*] is used to jump. We use finer-grained intervals for dangerous jumps and the
record the jumps of each test case when it is executed. Array [*] jumps not within marked range. For example, the times of each
will be updated in real time according to the progress of test, and jump may be divided into 8 intervals: 1, 2, 3, 4-7, 8-15, 16-31, 32-
then the number of jumps will be written to corresponding 127 and 128+. Each interval corresponds to a bit of the element of
position in array [*] (no longer upwards when greater than 255). total-array [*], which is equivalent to setting up 8 flag bits. If the
The new array [*] will be compared to total-array [*]. When a frequency of one jump is in a certain interval, the flag bit
new jump is found, the test case will be selected and total-array [*] representing the interval is set to 1. When testing, the flag bits of
will be updated accordingly. The following example illustrates total-array [*] will be updated accordingly. For example, the
this process: jumps of a test case have all appeared before. Jump A->B
In the process of executing the first test case tc1, jump A->B and appeared 17 times, but the corresponding flag bit is 0, which
B->C appear in turn, and then the program ends. The contents of means that the test case will be selected and the flag bit will be set
array [*] are marked accordingly. The jumps in total-array [*] is to 1.
the same as the jumps in array [*] of tc1. Then when tc2 is This method not only considers the shortcomings of simply
executed, the same jumps appear in turn, which means array [*] of relying on the discovery of new jumps, but also makes operations
tc2 is the same as total-array [*], so tc2 will be discarded and total- simple and practical by classifying the times of each jump. The
array [*] will remain unchanged. When tc3 is executed, jump A- processing flow is as follows:
>B and B->D appear in turn, and then the program ends. Since the
jump B->D has not been seen before, array [*] of tc3 is different
from total-array [*]. The result is that tc3 will be selected for
subsequent processing and total-array [*] will be updated. The
process is illustrated as follows:

Figure 1. The process of discovering new jumps

During the execution of a program, some jumps that are not in Figure 2. The processing flow of generating test cases
marked range may occur. When an unknown jump is presented, a
tag will be appended to tb1, and tem-array [*] will be updated.

195
3. THE DESIGN OF A LARGE-SCALE This polling mechanism can make each task node keep pace with
each other. It can reduce the repeated tasks of nodes and
PARALLEL FUZZING SYSTEM effectively reduce data conflicts, especially when the system is
3.1 System Structure large. The higher the computing performance of the master node
The structure of the system uses a Client - Server mode. The is, the smaller the polling interval is, which results in better
system sets up a single master node and many task nodes. The synchronization.
task nodes are independent of each other, and only interact with
the master node through a LAN. The structure of the system is as 3.3 Crash Management
follows: Crash deduplication is an essential issue for every fuzzing process,
and a usual practice is to use stack checksum filtering. But for the
same error, it may be triggered by different paths. Such as:
p1: A->B-> C->D
p2: A->C->B ->D
These two execution paths will all be triggered when the program
is executed to D. From the point of view of vulnerability
utilization, these two paths have different meanings and need to
be retained. But the stack checksum method will classify the two
Figure 3. The structure of the system crashes into one class and abandon one. To solve this problem,
and combined with the test case generation method presented
above, a strategy of examining the jumps between basic blocks is
The master node is responsible for preprocessing, dynamic adopted here. For the example just cited, there are jumps for these
scheduling, real-time monitoring, collecting data and aggregating two paths respectively:
results. The task node is responsible for maintaining a local task
p1: A->B, B->C, C->D
list, generating test cases, performing test cases, monitoring
abnormalities, accepting tasks, reporting results, and the like. p2: A->C, C->B, B->D
Each task node uses the same fuzzer, and each is tested
independently. These two paths will be separated and will not cause the problem
of mistaken deleting. The specific approach is to compare array [*]
3.2 Synchronization of Progress among Nodes of the test cases that are executed, and the crash generated by the
For a distributed system, each task node is independent of each same array [*] will be considered repeated.
other to do fuzzing. The variation of test cases is random. When The process of filtering crashes is carried out on the task nodes.
testing, it is likely that repetitive tasks among nodes will result in After a test, the information of crashes is summarized to the
low efficiency of this system, especially when the scale of the master node.
distributed system is large. If a global control can be carried out
for each task node with random variation, the utilization rate of 3.4 Processing Flow of the System
computing resources will be significantly improved. 1) The master node preprocesses the tasks to be tested, including
To solve this problem, I adopt a polling mechanism initiated by finding jumps, generating list tb1 and tem-array [*], Marking
the master node. The master node continuously polls each task dangerous jumps.
node, and takes out total-array [*] and tb1,and then combines all 2) The system is initialized. Each node completes the addition of
total-array [*] and tb1 to generate a new global-array [*] and a the task node by running a Python script, and downloads tasks and
new tb1. During the next polling of the master node, Total-array [*] related data from the master node.
of each node will be combined to global-array [*] and tb1 will also
be updated. The process of combining arrays is illustrated as 3) Each task node conducts independent testing, continuously
follows: generates new test cases. The test cases with dangerous jumps are
prioritized. Each time a new test case is generated, its array [*]
will be retained to filtering crashes simultaneously.
4) The master node polls each node, takes out the newly generated
total-array [*] and tb1, and updates global-array [*]. For the next
polling, Total-array [*] of each node will be replaced by global-
array [*] and tb1 will be updated too. Repeat 3) 4) until the test is
terminated.
5) When the test is terminated, the master node will take out
results from each task node in turn, and generate the global
information of crashes.

4. EXPERIMENT
Figure 4. The process of combining arrays
4.1 The Establishment of a Test Environment
The experiment is based on existing server resources in our
laboratory and builds a distributed environment with a certain
node scale on QEMU virtual machines. The master node and the

196
task nodes are in the same LAN, and parameters are shown in The results show that with the increase of the number of nodes,
Table 1. A web server is built on the master node. Each task node the crash time of the system we designed is getting shorter and
executes a Python script. The binary program to be tested and shorter. After more than 10 nodes, the efficiency of using disfuzz-
some initial test cases are copied into the designated location of afl has not been significantly improved. When the number of
the task node. nodes is 40, compared with disfuzz-afl, the execution time of our
system is shortened by about 35%. The fact shows that our system
Table 1. Configuration of the test environment has better performance in large-scale fuzzing.
Node type CPU RAM Hard disk OS
5. CONCLUSION
Master node 4 cores 4GB 40GB Ubuntu14.04 This paper focuses on improving efficiency of large-scale parallel
fuzzing environment. An improved method for generating test
Task node 1 core 1GB 10GB Ubuntu14.04
cases is proposed, and a parallel fuzzing system that uses this
method is designed and implemented. Finally the effectiveness of
4.2 The Process and the Results of the the system in improving the efficiency of fuzzing is further
Experiment demonstrated through an experiment. In the next work, we will
A target binary program is constructed with the following focus on the application of AI to vulnerability discovering in order
structure of jumps: to achieve better results.

6. REFERENCES
[1] Sutton, M., Greene, A., & Amini, P. 2007. Fuzzing: Brute
Force Vulnerability Discovery. Addison-Wesley Professional.
[2] Wang, T., Wei, T., Gu, G., & Zou, W. 2010. TaintScope: A
Checksum-Aware Directed Fuzzing Tool for Automatic
Software Vulnerability Detection. IEEE Symposium on
Security and Privacy (Vol.41, pp.497-512). IEEE Computer
Society.
[3] Miller, B. P., Cooksey, G., & Moore, F. 2006. An empirical
study of the robustness of macos applications using random
Figure 5. The structure of jumps in the target program
testing. Acm Sigops Operating Systems Review, 41(1), 78-
86.
The probability from the higher layer to the lower one is 1/256,
and the crash is triggered only if the program is executed to the [4] Kim, H. C., Choi, Y. H., & Dong, H. L. 2011. Efficient file
bottom. For distributed fuzzing systems, if the more tasks were fuzz testing using automated analysis of binary file format.
repeated, the probability of executing to the next layer would be Journal of Systems Architecture, 57(3), 259-268.
lower. In this experiment, the number of layers of the target [5] Ganesh, V., Leek, T., & Rinard, M. 2009. Taint-based
program is 5. directed whitebox fuzzing. IEEE, International Conference on
Software Engineering (pp.474-484). IEEE.
The experiment focuses on the execution time of the crash. As a
contrast, disfuzz-afl [13] is used here. Disfuzz-afl is developed by [6] Godefroid, P., Klarlund, N., & Sen, K. 2009. DART: directed
MartijnB and is a distributed fuzzing tool based on AFL. It sets up automated random testing. Hardware and Software:
a control node and several working nodes. Each node uses AFL to Verification and Testing. Springer Berlin Heidelberg.
fuzz, and upload processing results to control node every fixed [7] Molnar, D. A. 2009. Dynamic Test Generation for Large
time. Binary Programs. University of California at Berkeley.
The program is tested with the number of nodes 1, 5, 10, 20, 30, [8] Takanen, A., Demott, J., & Miller, C. 2008. Fuzzing for
40. The results of the experiment are as follows: Software Security Testing and Quality Assurance. Artech
House.
[9] Huang, S. K., Huang, M. H., Huang, P. Y., Lu, H. L., & Lai,
C. W. 2014. Software crash analysis for automatic exploit
generation on binary programs. IEEE Transactions on
Reliability, 63(1), 270-289.
[10] AFL. http://lcamtuf.coredump.cx/afl/.
[11] IDA. https://www.hex-rays.com/products/ida/.
[12] Wu, B., Li, M. J., Zhang, B., Feng, C., Zhang, Q., & Tang, C.
J. 2014. Software vulnerabilities detection based on random
programming. Applied Mechanics & Materials, 571-572,
553-558.
[13] Disfuzz-afl. https://github.com/MartijnB/disfuzz-afl.
Figure 6. The change of crash time with the
number of nodes

197

You might also like