This is an interpreter for the esoteric programming language Brainfuck written entirely in x86_64 Assembly!
bf-interpreter/
├── src/ # Source files (.s)
├── build/ # Build artifacts
├── Makefile # Build configuration
└── README.md # This file
- Direct syscalls
- Dynamic memory expansion (up to 2MB by default)
- Flexible command-line interface
- Jump table dispatch
- Robust error handling
- Flags support
BF has only 8 symbols in its syntax:
.: used to print the value of the current cell,: used to read a byte from user and store it into the current cell+: used to increases the value of the current cell-: used to decreases the value of the current cell<: used to move one cell to left>: used to move one cell to right[: used to open a loop]: used to close a loop
BF is based in cells, 8-bit sized memory space, to store values
Let's break the code and go through each block and feature:
.section .data: define global initialized variables.section .text: define global uninitialized variables
-
_start: the start function -
It gets
argcfrom the top of the stack memory and then the arguments -
This variable represents the count of arguments passed to the program in its execution
-
argv[i]represents theiargument passed in -
Either
argcandargv[i]has 8 bytes
Important
The arguments are stored in stack memory.
RSP represents the limit of the current stack frame.
If we tried to get this variables after a label call, it wouldn't work because RBP and RSP would point to another stack frame.
- Now, we load the memory address into
RDIregister; invalid_char(any other symbol, except the 8 provided by BF)- So, if the code has any other symbol but the 8 predefined, that'll bypassed
-
.fill_loop: used to filljmp_tablewith the symbol handlers -
Starts filling all spaces with
invalid_char -
Increments 8 bits in
jmp_tablefor each iteration -
Calls loop that will be executed
RCXtimes (256)- In it, we store each handler in
RBXand move its pointer into the correct offset of thejmp_table
- In it, we store each handler in
leaq cmd_plus(%rip), %rbx
movq %rbx, 0x2B*8(%rax) # 0x2B (+) * 8 -> 0x158 + jmp_table base addressImportant
jmp_table (pointed by R14) will represent the ASCII table. That's why it has 256 spaces
We must multiply the symbol by 8 to determine the start correct value
e.g.: 0x00 * 8 = 0x00 -> will start at 0x00
- Now, we can call the pointer by symbol
-
parse_args_loop: loop to parse all arguments passed in -
In each iteration, it decreases the value of
argc(subtracted by 1 to not count program's name)- if this value reach zero, the loop will be finished
-
RDIstores the value of current argument- Now, its first byte is compared with
-- if it ain't present there, jump to a callback to treat this value as file name
- The value is increased to get the flag
- The code has 4 flags:
f: used to pass the file name (can be passed without a flag as well);c: used to pass inline code, without a file;m: used to pass the maximum value of cells;h: used to get help.
- Now, its first byte is compared with
-
use_inline: called label for when inline code is in use -
In here, a
mmapis created (just like in file use) -
The next step is copy the symbols into the
mmapcopy_inline_code_to_mmap: expectsRSI(base of the inline code) andRDI(base of themmap)- in each iteration, the value stored in
RSIis passed toRDI. At the final, both are increased
- in each iteration, the value stored in
- In return, it will jump to
create_cell_mmap
-
If
has_fileglobal variable is1andhas_inlineis0, the code will try to open the file (its name is stored infilenameglobal variable) withload_filelabel -
load_file- It calls the syscall 2 (SYS_open) to open file by its name
- The file descriptor is stored into
R12register
- The file descriptor is stored into
- Now, we call the syscall number 5 (SYS_fstat) only to get the file size with ease and store it into
R13register - Now, we create a new
mmapto point to that file descriptor (external file)
- It calls the syscall 2 (SYS_open) to open file by its name
-
create_cell_mmap: creates a new mmap to store all program cells- By default, it maximum size is 2MB, but it can be increased by passing
-m VALUEat execution
- By default, it maximum size is 2MB, but it can be increased by passing
-
create_bracket_mmap: creates a mmap to store all brackets index- It has 8 times the code
mmaplength because we will useqwordsto store about 2**64 indexes- So, if there's a
[attape_base_code + 0x4000Findex, we can access it usingbracket_mmap_base + 0x4000F
- So, if there's a
- It has 8 times the code
Loop logic:
-
The interpreter preprocess all brackets before interpret the code
-
preprocess_bracketsR12: used as indexR15: used to store the current value ofRSPRSI: used to store the base of themmapcode
-
.preprocess_loop:EAX: gets the current symbol and fill remaining bits with zero
-
.open_bracket:- Only pushes
R12
- Only pushes
-
.close_bracket:-
Verifies if
RSPis the same asR15(storedRSP)- If so, no bracket were open
-
Pops the last bracket and moves it into
RAX -
Moves the value of the opening bracket to the current one (closing) in
brackt mmap -
Moves the value of the closining bracket (current one) to the opening one in
bracket mmap -
Bidirectional logic:
[=]and]=[
-
interpret_loop: label to interpret BF codeR15: will receive the current index of codeRDX: will receive the base of codemmap- the sum between them will result in the current symbol of code
Note
Note that if we multiply the value of the current symbol by 8, we will get the start offset of its pointer handler stored in jmp_table
- now, jump to the label by calling an indirect register (`*%rbp`)
continue_loop:- Increases the index value and updated its global variable
- Compares it with the size
- if lower, go back to loop
- if zero (equal), call a label to end the program
Symbol labels
-
cmd_plusandcmd_minus:- Sums or subtracts, respectively,
symbol_index_acumulatorto/from the value of the current cell- This value represents the acumulator for the same symbol
- e.g.:
++++++:+ 6
- e.g.:
- It is gotten when
interpret_loopjump to a label to calculate it- This label just counts the length of the sequence with same symbols
- This value represents the acumulator for the same symbol
- This is much better than just increases/decreases for every symbol
- Sums or subtracts, respectively,
-
cmd_dot:- Calls syscall number 1 (SYS_write) to print the value of the current cell
-
cmd_right:- Compares if the current cell + mmap base is greater or equal than the size of the
mmapitself- If so, the limit was reached and it must be increased
- Sums
symbol_index_acumulatoras well
- Compares if the current cell + mmap base is greater or equal than the size of the
Warning
The cell mmap starts with 4KB and can be increased up to 2MB by default
-
cmd_left:- Almost the same as
cmd_right, but subtractssymbol_index_acumulatorfrom cell index value
- Almost the same as
-
cmd_comma:- Calls the syscall number 0 (SYS_read) to read a byte from user and store it in the current cell value
-
cmd_osqbr(open square bracket):- Verifies if the current cell value is zero
- If it isn't, it will just jump to
continue_loop - If so, it must be jump to its final
- If it isn't, it will just jump to
- Verifies if the current cell value is zero
-
.jmp_to_final_of_loop:- Gets the index of the closing bracket using bracket
mmap+ current index (bidirectional value) and put this value into code index - Jump to
continue_loop, where its value will be increased (next symbol)
- Gets the index of the closing bracket using bracket
-
cmd_csqbr(close square bracket):-
Only gets the value of the opening bracket and puts it into index of code
- Always returns
-
Jumps to
interpret_loop -
It makes a loop that will be stoped only if the current cell is zero
-
expand_right:- It gets the middle of the
mmap(right part) and sums to it - Calls the syscall number 25 (SYS_mremap) to resize the cell
mmapwith its sum
- It gets the middle of the
Note
The mmap will be increased to right
-
Updates the base and the size of the global variables
-
expand_left:-
The same resize length
-
In here, SYS_mremap would not work because we need to expand to left.
- But there's not possible resizing it (only expands to right)
-
A new
mmapis created with the new size -
A label is called to copy the values from old to the new
mmapusing a shift expand variable (left part) -
Make some updates
- index = index + shift expand
-
Initially, the loop logic was different. It used a cached system as well, but not too optimized as the current one.
Let's see it
Instead of use a global mmap for cache brackets, before it, it used a cache in runtime.
When the current symbol was a [ or a ], the logic was almost the same, with some differences:
-
cmd_osqbr:- The index of the current bracket was pushed in stack memory
- It verified if the current cell value was zero
- If so, it jumped to a label to find the closing correct bracket (will be explained)
-
cmd_csqbr:- Poped the value from stack and added it to code index
- Added the current index into a global variable called
last_closed_bracket(the cache) - Returned to loop
- It had the same behavior that the current logic
-
.jmp_to_final_of_loop:- Verified if
last_closed_bracketwas zero- If so, jumped to a label to find the correct closing bracket using a loop
- If it is not, used the value stored in
last_closed_bracketto pass to the current code index
- Return to interpret loop
- Verified if
In the beginning, it worked with low level of loops, but not with large files.
Build
makeRun
make runDebug
make debugClean
make clean./bf -f program.bf # Run file
./bf -c "+++[>++." # Inline code
./bf -m 4096 -f program.bf # Custom memory limit
./bf -h # HelpTip
You can find some example code files in here
usage_example.mp4
mandelbrot.mp4
That's it!