Fuzz Testing for Software Assurance
of recent vulnerabilities in Wireshark (http://www.wireshark.org), a network protocol analyzer, were found by fuzzing. Large organizations are taking note. For example, Microsoft includes fuzz testing as part of its Security Development Lifecycle (http://www.microsoft.com/security/sdl/default.aspx).
A fuzzing tool, or fuzzer, consists of several components, and a fuzzing process involves several steps [4]. First, a generator produces test inputs. Second, the test inputs are delivered to the system under test. The delivery mechanism depends on the type of input that the system processes. For example, a delivery mechanism for a command-line application is different from one for a web application. Third, the system under test is monitored for crashes and other basic undesirable behavior.
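The three steps above (generate, deliver, monitor) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the design of any tool cited in this article: the names toy_parser, mutate, and fuzz are hypothetical, the system under test is a deliberately buggy toy function, and an in-process function call stands in for a real delivery mechanism such as a file, socket, or command line.

```python
import random


def toy_parser(data: bytes) -> bytes:
    """Hypothetical system under test: parses a record laid out as
    [type byte 0x01][declared length][payload]. It contains a deliberate bug."""
    if len(data) < 2 or data[0] != 0x01:
        # Malformed input is rejected gracefully; this is not a crash.
        raise ValueError("malformed record")
    length = data[1]
    payload = data[2:2 + length]
    out = bytearray()
    for i in range(length):
        # Bug: the declared length is trusted, so this raises IndexError
        # when it exceeds the number of payload bytes actually present.
        out.append(payload[i])
    return bytes(out)


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Generator: overwrite one randomly chosen byte of a valid seed input."""
    data = bytearray(seed)
    pos = rng.randrange(len(data))
    data[pos] = rng.randrange(256)
    return bytes(data)


def fuzz(target, seed: bytes, iterations: int, rng_seed: int = 0):
    """Run the generate / deliver / monitor loop and collect failing inputs."""
    rng = random.Random(rng_seed)  # fixed seed so runs are reproducible
    crashes = []
    for _ in range(iterations):
        test_input = mutate(seed, rng)      # step 1: generate a test input
        try:
            target(test_input)              # step 2: deliver it to the target
        except ValueError:
            pass                            # graceful rejection, ignore
        except Exception as exc:            # step 3: monitor for "crashes"
            crashes.append((test_input, exc))
    return crashes


if __name__ == "__main__":
    found = fuzz(toy_parser, b"\x01\x05hello", iterations=500)
    print(f"{len(found)} crashing inputs found")
    if found:
        print("example:", found[0])
```

In this sketch any unexpected exception counts as a crash, while ValueError is treated as a graceful rejection. A real monitor would typically watch process exit status, signals, or timeouts instead.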
CrossTalk—March/April 2015 35
TEST AND DIAGNOSTICS
Fuzz testing is effective for finding vulnerabilities because most modern programs have extremely large input spaces, while test coverage of that space is comparatively small [5].
While static source code analysis and manual review are not applicable to systems where source code is not available, fuzz testing may still be used. Fuzz testing is a general technique and therefore may be included in other testing tools and techniques, such as web application scanners [6].
Fuzz testing has a number of limitations [7]. First, exhaustive testing is infeasible for a reasonably large program. Therefore, typical software testing, including fuzz testing, cannot be used to provide a complete picture of the overall security, quality or effectiveness of a program in any environment. Second, it is hard to exercise the program thoroughly without detailed understanding, so fuzz testing may often be limited to finding shallow weaknesses with few preconditions. Third, finding out what weakness in the code caused a crash may be a time-consuming process. Finally, fuzz testing is harder to apply to categories of weaknesses, such as buffer over-reads, that do not cause program crashes.

Fuzzing Approaches
Test input generation can be as simple as creating a sequence of random data [3]. This approach does not work well for programs that expect structured input. No matter how many tests are generated, the vast majority might only exercise a small validation routine that checks for valid input.
In regression testing, valid inputs may be collected, for example, from historical databases of unusual inputs that caused errors in past versions of the software, and then supplied to the program without modification. Such an approach can help uncover a weakness that reoccurs between versions or implementations, but is unlikely to uncover new weaknesses.
Most fuzz generators can be divided into two major categories: mutation-based and generation-based fuzzers [8]. A mutation-based fuzzer produces test inputs by making random changes to valid test inputs, such as those from regression testing. This approach can be quickly applied to systems, such as protocols or word processors, that accept complex inputs. However, the coverage is only as strong as the set of valid test inputs. If there is no valid test input for a particular system component, the mutation-based fuzzer is unlikely to cover this component.
A generation-based fuzzer produces test inputs based on some specification of the input format. While implementing the input format in enough detail requires a significant upfront effort, the generation-based fuzzer can achieve very high coverage at lower cost.
A relatively recent approach, whitebox fuzz testing, combines symbolic execution with constraint solving to construct new inputs to a program [9]. Whitebox fuzzing has been used by Microsoft to find one third of all the bugs discovered by file fuzzing during the development of Windows 7.
The next step after producing test inputs is providing them to the system under test. Some common delivery mechanisms are files, environment variables, command line and API parameters, and operating system events, such as mouse and keyboard events.
Fuzz testing does not require knowing the expected output; instead, there must be monitoring for crashes or other generally undesirable behavior. However, many types of weaknesses do not produce clearly undesirable behavior. Therefore, more sophisticated detection of whether a test input caused a failure can significantly expand the classes of weaknesses uncovered by fuzzing. The following section describes an example of using dynamic analysis tools to detect a weakness that does not cause a crash under normal operation.

Protocol Testing Experiment
The Heartbleed bug is a widely known vulnerability in OpenSSL, a popular implementation of the cryptographic protocols Secure Sockets Layer (SSL) and Transport Layer Security (TLS). Briefly, under the Heartbeat protocol, the client sends a message and the message length to the server, and the server echoes back the message.
The Heartbleed vulnerability can be exploited to leak confidential information, including passwords and authentication data. It was caused by OpenSSL's failure to validate the message length, which resulted in a buffer over-read weakness [10]. For more details, an interested reader can examine Heartbit, an abstracted version of the OpenSSL code demonstrating the Heartbleed vulnerability [11]. Even though buffer overflow, which includes buffer over-read, is a well-known weakness, software assurance tools missed it [12].
Simple fuzz testing, which looks for crashes, would not have detected Heartbleed. The reason is that buffer over-reads rarely lead to program crashes. However, fuzz testing in combination with a memory error detection tool might have detected Heartbleed, as demonstrated in [13].
Memory error detection tools, such as Valgrind (http://valgrind.org) and AddressSanitizer (http://code.google.com/p/address-sanitizer), are a type of dynamic analysis tool that can be used to instrument code to detect various memory errors, such as buffer overflows and use-after-free errors, that may not cause a crash under normal operation.
In the first experiment, the authors of [13] ran a vulnerable version of OpenSSL with Valgrind. When the fuzzer sent an exploiting Heartbleed request, Valgrind produced an error trace highlighting the bug. In the second experiment, a vulnerable version of OpenSSL was compiled with the AddressSanitizer compiler option. When an exploiting Heartbleed request was sent to the server, the server terminated and produced an error trace. In both experiments, a programmer could use the error trace to find the Heartbleed bug.

Conclusions
Typical software testing, including fuzz testing, cannot be used alone to produce bug-free software. Since fuzz testing does not require a sophisticated oracle, it can quickly test a very large number of unexpected inputs. When combined with appropriate supplemental tools, this makes it possible to find security vulnerabilities, such as the Heartbleed bug, which may be missed by other tools. As demonstrated by a large number of bugs recently
E-mail: efong@nist.gov