Overview and Setup
Project Overview
In this work, we are going to practice the knowledge we have learnt to build an advanced grey-
box fuzzer step-by-step. The name of the fuzzer is Mini Lop, which is a
salute
to the legendary American Fuzzy Lop. Here's a picture for a real mini lop:
Mini Lop is a Python-based fuzzing framework. It is compatible with AFL's instrumentation.
This means that Mini Lop can interact with test targets which are instrumented by AFL's
compilers through the forkserver and shared memory. It is a modularised framework with the
following file structure:
main.py
conf.py the main entry of the fuzzer, controlling the entire
fuzzing process
the module for handling the fuzzing session configurations
execution.py
the module for handling executing the program under test (keeping track of exit
status and execution time)
feedback.py
the module for handling execution feedbacks (analysing the basic-block
transition/edge coverage and checking for crashes)
libc.py
the module for calling libc functions via Python (You don't need to
modify this file!)
mutation.py
the module for mutating the seed inputs to generate new test inputs
schedule.py
the module for doing the seed prioritisation and power scheduling
seed.py
the module for the seed class (You may need to modify this file to include more
properties for the seed, such as file size)
https://github.com/google/AFL/blob/master/afl-fuzz.c#L4746
Task Overview
The framework of Mini Lop only handles some key features likes managing the entire fuzzing
loop, setting up forkserver and shared memory, etc. It has some dummy features written as
stubs. Your task is to replace the stubs with actual implementations of the logic and use your
implementation to test two targets:
mjs
https://github.com/cesanta/mjs a lightweight Javascript
engine running on IoT devices
cJSON
https://github.com/DaveGamble/cJSON
a popular json file parser
Source Code
You can find the source code of Mini Lop in the files given to you
Docker
You will be working with a docker image that has every dependency installed.
Install Docker
Please follow this official guide to install docker: https://docs.docker.com/get-started/getdocker/
Pull the Image and Run the Container
Docker Image Structure
/mini-lop/
This is the folder to put the code of Mini Lop. The original framework code is already
included in this folder.
You will need to put your fuzzer's code into this folder for this work
submission. Do not change the file name of main.py .
/targets/
This is the folder containing the binaries of the test targets. They are already
compiled with AFL. So you don't have to worry about the compilation. Note that
mjs was not compiled with AddressSanitizer (ASAN), but cJSON was.
/seeds/
This is the folder containing the initial seed test inputs for the two targets. You
don't need to modify them!
/configs/
This is the folder containing the configuration files needed for fuzzing with Mini Lop.
You don't need to modify them.
/outputs/
This is the default folder specified by the configurations to output the fuzzing results
(test inputs kept as seeds and test inputs that can trigger crashes).
Command to Run Fuzzing
Just run the following command in the docker container to fuzz with Mini Lop:
Configuration File
Unlike AFL, which accepts several options, Mini Lop uses a configuration file to manage the
fuzzing process options. Sample configuration files are provided and you don't need to modify
them for this work. This is just to explain the items in the configuration files.
# the path to the initial seed test inputs seeds_folder = '/seeds/mjs'
# the path to store the fuzzing outputs output_folder = '/outputs/mjs'
# the path to the program under test's binary target =
'/targets/mjs_main_afl_cfast'
# the commandline option for the program under test
# @@ is the place holder for the input file path, same as AFL target_args = ['@@']
Target Programs
mJS
mJS is a small Javascript engine for IoT devices. The input files of mJS are JavaScript file. We
use an old version of mJS here, which is full of bugs. So with mJS, we focus on evaluating
your fuzzers' performance of bug detection.
cJSON
cJSON is a popular JSON file parser. It is being extensively fuzzed by Google's OSS-Fuzz
project. The implementation should be quite robust. So with cJSON, we focus on evaluating
your fuzzers' performance of code coverage.
However, if you can find bugs in cJSON, please post on the course forum. For each new bug
confirmed by the course staffs, the first student to detect it will get a bonus of 3 marks,
with total marks of this work capped at 60. The bug must not have been reported by
others in the Github issues of cJSON's repo.
Features to Implement
Feature 1: Analyse Code Coverage (10 marks)
Problem
The default implementation of Mini Lop is not truly a grey-box fuzzer. Why? Because it cannot
keep good test inputs as new seeds. It just randomly keep test inputs as seeds.
Task
Your task is to implement the feature of keeping the test inputs that can cover new basicblock
transitions/edges (new IDs of bytes in the shared memory) as new seeds in the queue.
Relevant Code
The relevant code is in feedback.py :
Note that the relevant code is just for you to start looking into the problem, you can modify
any code as you like.
for i in raw_bitmap:
# TODO: maintain a global coverage of all seeds, check if this s if i != 0: total_hits += 1
print(f'covered {total_hits} edges')
Hints
Each byte in the shared memory (the trace_bits variable) corresponds to an edge.
If the value of the byte is 0, then it is not covered by the target program during a
particular run. Otherwise, it is covered.
In Lecture-10, we learnt how AFL's intrumentation determines which edge should be
written to which byte.
You will need to maintain a global state to keep track of all the edges that the fuzzer has
covered.
You can then compare the global state with the trace_bits related to a test input to
check if that test input triggers new coverage
You need to save a file of the kept seed in the folder specified by
conf['queue_folder']
This is similar to how you keep the crashes
Do not store the content of the seed files in Seed class objects, because this
may consume too much of memory!
Just read from the files whenever you need access to the seeds' content Just
store file paths in memory
Self-Validation
You can print out the total coverage that your fuzzer has achieved.
If your implementation can have a better overall coverage than the default
implementation (without keeping new seeds), then you are likely to have
implemented this feature correctly.
Feature 2: Save the Crash Input (10 marks)
Problem
Now Mini Lop is a grey-box fuzzer. But it lacks an important ability: detecting crashes and
keeping the relevant test inputs as proof-of-concepts for the bugs.
Task
Your task is to implement the feature of storing the test inputs that can trigger the target
program to crash in the folder specified by conf['crashes_folder'] .
Relevant Code
The relevant code is in main.py :
Note that the relevant code is just for you to start looking into the problem, you can modify
any code as you like.
if check_crash(status_code): print(f"Found a crash, status code is {status_code}")
# TODO: save the crashing input continue
Hints
You can print out the findings of crashes (optional) to
make Mini Lop human friendly
Mini Lop uses the value of Linux Signals to check for anomalies.
It's an implicit test oracle (taught in Lecture-12).
The input files that can trigger crashes should be kept in the output/crashes folder
even after the fuzzing process
Because these files can help with debugging (in real practices). But we don't need to
care about debugging the target program here in this course.
You can run /targets/mjs_main_afl_cfast $CRASH_FILE to check if the kept
inputs can really crash mjs .
An input file refers to the test input created based on a seed using the mutation
operators.
Self-Validation
If you have implemented feature 1 and 5, then your fuzzer is very likely to be able to find a
lot of crashes in mjs .
You can validate this feature by checking whether the crashing inputs are kept.
If you would like to validate this feature before implementing feature 1 and 5, then you
could mock some crashes by randomly returning status code 11 during execution.
realted code: in execution.py
Feature 3: Seed Prioritisation Strategy (10 marks)
Problem
The default implementation of Mini Lop just randomly select the next seed test input to
generate new test inputs with. However, some seeds should be selected first because they are
better than others with smaller file size and more edge coverage.
Task
Your task is to implement the feature of selecting the favoured seeds like how AFL works to
perform seed prioritisation. For each cycle of looping through the seed queue, if there are
favoured seed not yet used in this cycle, the fuzzer should have a high chance of skipping the
current seed if it's not favoured. The algorithm of calculating the favoured seed is described in
Lecture-10.
https://github.com/google/AFL/blob/master/afl-cmin
Relevant Code
The relevant code is in schedule.py :
Note that the relevant code is just for you to start looking into the problem, you can
modify any code as you like.
Hints
You may need to keep track of the cycle information, where a cycle means visiting every
seed in the queue once.
You are suggested to make sure that your seed prioritisation strategy can use the
none-favoured seeds from time to time. This is to avoid starving some potentially
good seeds.
You need to update seed.py to keep track of the file size of each seed. Self-Validation
This feature only improves efficiency. It doesn't improve effectiveness.
This means that your fuzzer could produce higher coverage at the start of the fuzzing
session.
You can validate this similar to how you valid feature 1.
Feature 4: Power Scheduling Strategy (10 marks)
Problem
After selecting a seed test input, the default implementation of Mini Lop just a random number
of new test inputs with that seed. However, some seeds should be given more chances
because they are better than others with faster execution speed and more edge coverage.
Task
Your task is to implement the feature of power scheduling similar to how AFL works. A sample
algorithm of calculating the power schedule is described in Lecture-10. You don't have to
implement in exactly the same way. But your implementation must consider at least the
execution speed and edge coverage of the seeds.
https://github.com/google/AFL/blob/master/afl-fuzz.c#L4746
Relevant Code
The relevant code is in schedule.py :
def get_power_schedule(seed):
# this is a dummy implementation, it just returns a random number
# TODO: implement the power schedule similar to AFL (should consider return random.randint(1, 10)
Note that the relevant code is just for you to start looking into the problem, you can
modify any code as you like.
Hints
You can calculate a score for each seed, and convert the score into the actual number of
new test inputs generated from the seed.
The conversion could be as simple as something like: chance = score/100 .
Self-Validation
You can just follow AFL's implementation for this feature.
You can validate this similar to how you valid feature 1.
Feature 5: The Havoc Mutation Operator (10 marks)
Problem
The default implementation of Mini Lop just uses a very simple mutation operator to mutate the
seeds. It just randomly select certain bytes from the seed and flip some bits in that byte from 0
to 1 or from 1 to 0.
Task
Your task is to implement a better havoc mutator by adding more strategies. Your mutation
strategy should be as diverse as possible. But it should at least have the following features:
randomly select a short int/int/long int (2/4/8 bytes) from the seed file, and add/sub a
random value randomly replace a short int/int/long int (2/4/8 bytes) with an interesting
value (min/max/0/-1/1).
randomly replace a random length chunk of bytes in the seed input with another chunk in
the same file.
Relevant Code
The relevant code is in mutation.py :
def havoc_mutation(conf, seed):
# this is a dummy implementation, it just randomly flips some bytes
# TODO: implement the havoc mutation similar to AFL with open(seed.path,
'rb') as f: data = bytearray(f.read()) data_len = len(data)
Note that the relevant code is just for you to start looking into the problem, you can
modify any code as you like.
Hints
Every time when you have generated the content of the test input, you should always write
it to the file specified by conf['current_input'] to properly pass the test input to the
target program, like how it is done in the default implementation.
You can try out more advanced features to get bonus marks.
If you can implement the dictionary-based mutation feature properly, you can
earn 3 more marks.
Dictionaries are like keywords for structured test inputs.
You will need to modify the configuration handling of Mini Lop to specify a file
as the dictionary.
Then you can read from that file the dictionary tokens.
And then you can create mutation operators by inserting the dictionary tokens
or replacing bytes with the tokens.
Different targets should use different dictionaries.
Sample dictionaries used by AFL are here.
Self-Validation
This feature can improve both efficiency and effectiveness.
You can validate this similar to how you valid feature 1.
You may also expect the fuzzer to be able to find more crashes on mjs if you have
implemented this feature correctly.
Feature 6: The Splice Mutation Operator (10 marks)
Problem
The default implementation of Mini Lop is not truly following the genetic algorithm design (to be
taught in week 9). It doesn't do crossovers across different seeds.
Task
Your task is to implement the splice mutator. The splice mutation operator works like this:
Given a seed, select another seed in the queue (must not be the same one). - For each of the
two seeds, randomly splice it into 2 halves. Use the first half of the first seed and the second
half of the second seed to generate a new test input. - Apply havoc mutator to the newly
generated test input. - Yield the test input for execution.
Relevant Code
The relevant code is in mutation.py . You should implement a new mutator. Hints
After implementing this mutator, you should have two mutators in mutation.py .
You can randomly select which mutator to use in main.py here. You can also
try to implement an advanced mutation operator selection strategy (the
Epsilon-Greedy Strategy, lecture 16) to earn 3 bonus marks.
Each mutation operator is like an arm of the bandit machine.
The reward could be the number of new seeds detected with a mutation
operator.
You can use whatever value for epsilon. Just don't make it too large (>0.5) or too
small (<0.01).
Self-Validation
Similar to Feature 5.
Feature 7: Option-Sensitive Mutator for cJSON (Bonus Feature, 4
marks)
Problem
The source code of the target for cJSON is here:
https://github.com/DaveGamble/cJSON/blob/master/fuzzing/cjsonreadfuzzer.c
It's a specially crafted fuzz driver that uses the first 4 bytes of the test input to determine the
options used for parsing. In our previous mutators, we just mutate the entire file as a whole.
However, as we will learn in week 8 (combinatorial testing), the combinations of options could
be systematically tested.
Task
Your task is to implement a special mutator to mutate the 4 option-bytes for favoured seeds.
This mutator should just target the first 4 bytes of the input. It should be able to perform 2way
combinatorial testing (Lecutre-14).
Relevant Code
The relevant code is in mutation.py . You should implement a new mutator. Hints
This mutator should be applied on the favoured seeds.
You need to implement this mutation operator such that it is compatible the mutation
operator selection strategy in Feature 6.
You need to add an option in the configuration to turn on/off this feature so that it's only
enabled for the cJSON target.
Self-Validation
Similar to Feature 5 and 6.
Submission Guidelines
For each required feature and bonus feature in this work, students should prepare separate
markdown files documenting the implementation details and code modifications.
The filenames and structure for each item should be as follows:
1. Complete Code Implementation
mini-lop Folder: Include the complete mini-lop folder with all code implementations.
If you use new libraries (other than toml and sysv-ipc ), please also include a
requirements.txt file in the folder.
You can generate this file with pip freeze > requirements.txt
2. Feature Implementation Description Files
Feature Explanation Files :
Each feature should be documented in its own markdown file that includes:
- The relevant function-level code modifications - You can list the functions and
files you modified/created
- Explanation of the implementation - A brief explanation of 1-2 paragraphs is
enough if you have implemented the feature with code - If you haven't implemented
the feature with code, you need to write very detailed descriptions if you want to
earn partial marks (up to 50% of the feature).
Please use the following filenames:
Feature 1 (Analyse Code Coverage): Filename:
feature1_implementation.md
Feature 2 (Save the Crash Input): Filename:
feature2_implementation.md
Feature 3 (Seed Prioritisation Strategy): Filename:
feature3_implementation.md
Feature 4 (Power Scheduling Strategy): Filename:
feature4_implementation.md
Feature 5 (The Havoc Mutation Operator): Filename:
feature5_implementation.md
Feature 6 (The Splice Mutation Operator): Filename:
feature6_implementation.md
3. Bonus Feature Documentation (Optional
forBonus Marks)
bonus_implementations Folder (optional) :
Create a bonus_implementations folder to store markdown documentation for
each bonus feature implemented. Each file should contain explanations and
implementation details for the bonus feature. Use the following filenames:
Bonus 1 (New cJSON Bug): Filename: cjson_bug_discovery.md
If new bugs are found in cJSON, place the corresponding seed files in the
cJSON_bugs/ subfolder within bonus_implementations and
document the full command to trigger the bugs in
cjson_bug_discovery.md .
Bonus 2 (Dictionary-Based Mutation): Filename:
dictionary_mutation.md
Bonus 3 (Epsilon-Greedy Strategy): Filename:
epsilon_greedy_strategy.md
Bonus 4 (Option-Sensitive Mutator for cJSON): Filename:
option_sensitive_mutator.md
Summary of Submission
1. mini-lop/ (complete code folder)
2. feature1_implementation.md
3. feature2_implementation.md
4. feature3_implementation.md
5. feature4_implementation.md
6. feature5_implementation.md
7. feature6_implementation.md
8. bonus_implementations/ (optional)
cjson_bug_discovery.md
cJSON_bugs/
dictionary_mutation.md
epsilon_greedy_strategy.md
option_sensitive_mutator.md
Final Submission
ass2.zip (containing all the above files and folders)
Important Note
Only the ZIP archive ass2.zip should be submitted. Ensure all required files are correctly
named and included in the ZIP archive before submission.