A Gentle Introduction to eBPF https://www.infoq.com/articles/gentle-linux-ebpf-introduc...
Last chance to save $50 for QCon Plus (May 17-28). Book your spot before the early bird ends on
May 1st.
A Gentle Introduction to eBPF
Key Takeaways
eBPF is a mechanism for Linux applications to execute code in Linux kernel space. eBPF
has already been used to create programs for networking, debugging, tracing, firewalls, and
more.
eBPF can run sandboxed programs in the Linux kernel without changing kernel source
code or loading kernel modules.
Several components, such as the verifier, the JIT compiler, helper functions, and maps, work
together to run eBPF programs.
Teleport, an identity-aware access proxy, is an example of an open source project using
eBPF. It can be used to collect events from SSH sessions such as network connections,
filesystem changes, etc.
In this article, we will review what eBPF is, what it does, and how it works. Then, we will
explain how to execute an eBPF program and provide an example of eBPF in action. Finally, we
will conclude with recommendations for next steps.
eBPF lets programmers execute custom bytecode within the kernel without having to change the
kernel or load kernel modules. Exciting? Maybe not yet.
What is eBPF?
Linux divides its memory into two distinct areas: kernel space and user space. Kernel space is
where the core of the operating system resides. It has full and unrestricted access to all hardware
— memory, storage, CPU, etc. Due to the privileged nature of kernel access, kernel space is
protected and allows only the most trusted code to run, which includes the kernel itself and
various device drivers.
1 of 12 5/4/21, 08:38
space application must talk to the kernel space network card driver via a kernel API referred to as
“system calls”.
While the system call interface is sufficient in most cases, developers may need more flexibility
to add support for new hardware, implement new filesystems, or even custom system calls. For
this to be possible, there must be a way for programmers to extend the base kernel without
adding directly to the kernel source code. Linux Kernel Modules (LKMs) serve this function.
Unlike system calls, whereby requests traverse from user space to kernel space, LKMs are loaded
directly into the kernel. Perhaps the most valuable feature of LKMs is that they can be loaded at
runtime, removing the need to recompile the entire kernel and reboot the machine each time a
new kernel module is required.
Figure 1 - LKMs can be dynamically loaded and unloaded as part of kernel space (Source)
Kernel Services in the picture above, separating user space programs and preventing them from
messing with finely tuned hardware.
In other words, LKMs can make the kernel crash. Aside from the wide blast radius of security
vulnerabilities, modules also incur a large maintenance cost, because kernel version upgrades
can break them.
What does eBPF do?
eBPF is a more recent mechanism for writing code to be executed in the Linux kernel space that
has already been used to create programs for networking, debugging, tracing, firewalls, and
more.
Born out of a need for better Linux tracing tools, eBPF drew inspiration from dtrace, a dynamic
tracing tool available primarily for the Solaris and BSD operating systems. Unlike with dtrace,
Linux tracing could not get a global overview of running systems, since it was limited to specific
frameworks for system calls, library calls, and functions. Building on the Berkeley Packet Filter
(BPF), a tool for writing packet-filtering code using an in-kernel VM, a small group of engineers
began to extend the BPF backend to provide a set of features similar to dtrace's. eBPF was born.
eBPF was first released in limited capacity in 2014 with Linux 3.18, but making full use of it
requires Linux 4.4 or later.
Figure 2 - Simplified eBPF architecture
In Figure 2, we see a simplified visualization of eBPF architecture.
eBPF allows regular user space applications to package the logic to be executed within the Linux
kernel as bytecode. These packages are called eBPF programs, and they are typically produced with
a compiler toolchain such as BCC. eBPF programs are invoked by the kernel when certain events,
called hooks, happen. Examples of such hooks include system calls and network events.
Before being loaded into the kernel, an eBPF program must pass a set of checks. Verification
involves simulating the execution of the eBPF program in a virtual machine. Doing so allows the
verifier, with 10,000+ lines of code, to perform a series of checks. The verifier traverses the
potential paths the eBPF program may take when executed in the kernel, making sure the program
does indeed run to completion without looping forever, which would cause a kernel lockup. (Older
kernels reject loops outright; since Linux 5.3, loops the verifier can prove are bounded are
allowed.) Other checks, from valid register state and program size to out-of-bounds jumps, are
also carried out.
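The path-walking idea can be illustrated with a toy check in Python. This is a sketch under heavy assumptions: it uses a made-up instruction set (mov, jmp, jeq, exit) instead of real eBPF bytecode, follows the older, stricter rule of rejecting every loop, and uses an arbitrary size limit. The real verifier also tracks register types and value ranges, which this sketch does not attempt.

```python
# Toy sketch, NOT real eBPF: a made-up instruction set, one tuple per instruction.
#   ("mov", n)  - some work, then fall through to the next instruction
#   ("jmp", k)  - unconditional jump by a relative offset k
#   ("jeq", k)  - conditional jump: fall through OR jump by k (both paths checked)
#   ("exit", 0) - end of program
MAX_INSNS = 512  # arbitrary size limit for this sketch


class VerifierError(Exception):
    pass


def verify(prog):
    """Explore every path through prog; reject loops, bad jumps and oversized programs."""
    if not 0 < len(prog) <= MAX_INSNS:
        raise VerifierError("program size out of range")

    def successors(pc):
        op, arg = prog[pc]
        if op == "exit":
            return []
        if op == "mov":
            return [pc + 1]
        if op == "jmp":
            return [pc + 1 + arg]
        if op == "jeq":
            return [pc + 1, pc + 1 + arg]
        raise VerifierError(f"unknown instruction {op!r} at {pc}")

    on_path, done = set(), set()

    def visit(pc):
        # Falling off the end or jumping outside the program is rejected.
        if not 0 <= pc < len(prog):
            raise VerifierError(f"jump out of bounds to {pc}")
        # Reaching an instruction already on the current path means a loop.
        if pc in on_path:
            raise VerifierError(f"loop detected at instruction {pc}")
        if pc in done:
            return  # paths from this instruction were already checked
        on_path.add(pc)
        for nxt in successors(pc):
            visit(nxt)
        on_path.remove(pc)
        done.add(pc)

    visit(0)
    return True


print(verify([("mov", 1), ("jeq", 1), ("mov", 2), ("exit", 0)]))  # True
```

The depth-first search marks instructions on the current path, so a jump back to one of them is exactly a loop, while the `done` set keeps already-checked branches from being re-walked.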
From the outset, eBPF sets itself apart from LKMs with important safety controls in place. Only
if all checks pass is the eBPF program loaded and compiled into the kernel, where it waits for the
right hook. Once triggered, the bytecode executes.
resources with minimal risk for the kernel.
How does eBPF work?
So far, I’ve reduced eBPF to its bare architecture, but there are more components working
together, each with its own layers of complexity.
Dissecting an eBPF program
Events and Hooking
As we have already covered, eBPF programs execute in an event-driven environment. They are
triggered by kernel hooks. The diversity of hook locations is one of the many aspects that makes
eBPF so useful. A quick sampling of these includes:
System Calls - Triggered when user space functions transfer execution to the kernel
Function Entry and Exit - Intercepts calls to pre-existing functions
Network Events - Executes when packets are received
Kprobes and uprobes - Attach to probes for kernel or user functions
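Conceptually, the kernel keeps a list of programs attached to each hook and runs them whenever that hook fires. The Python sketch below models only that event-driven dispatch; the `attach`/`fire` functions and the hook name in the example are hypothetical, and real eBPF programs receive a kernel-defined context rather than a Python dict.

```python
from collections import defaultdict

# Hypothetical stand-in for kernel hook points: hook name -> attached programs.
_attached = defaultdict(list)


def attach(hook, program):
    """Register a program (here, any Python callable) to run when `hook` fires."""
    _attached[hook].append(program)


def fire(hook, ctx):
    """Simulate the kernel reaching `hook`: run each attached program on ctx."""
    return [program(ctx) for program in _attached[hook]]


# Example: a "program" that records which file was opened.
attach("sys_enter_openat", lambda ctx: ("open", ctx["path"]))
print(fire("sys_enter_openat", {"path": "/etc/hosts"}))  # [('open', '/etc/hosts')]
```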
Helper Functions
When eBPF programs are triggered at their hook points, they can call helper functions. These
special functions are what make eBPF feature-rich. For example, helpers can perform a wide
variety of tasks:
Search, update, and delete key-value pairs in tables
Generate a pseudo-random number
Collect and flag tunnel metadata
Chain eBPF programs together, known as tail calls
Perform tasks with sockets, like binding, retrieving cookies, redirecting packets, etc.
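Of these, tail calls are perhaps the least intuitive: a successful tail call jumps into another program and never returns to the caller, while a failed one (for example, an empty slot) lets the caller carry on. The sketch below simulates those semantics in Python; the dict stands in for a program-array map (the kernel's BPF_MAP_TYPE_PROG_ARRAY), and the function names and the chain limit of 32 are assumptions for illustration.

```python
class _TailJump(Exception):
    """Internal signal: control moves to another program and never comes back."""

    def __init__(self, prog):
        super().__init__()
        self.prog = prog


def run(prog_array, index, ctx, max_chain=32):
    """Run prog_array[index]; programs may chain into others via tail_call(i)."""
    calls = 0

    def tail_call(i):
        nonlocal calls
        target = prog_array.get(i)
        if target is None or calls >= max_chain:
            return -1  # failed tail call: the calling program simply continues
        calls += 1
        raise _TailJump(target)  # successful tail call: the caller never resumes

    prog = prog_array[index]
    while True:
        try:
            return prog(ctx, tail_call)
        except _TailJump as jump:
            prog = jump.prog


# Example: a parser that hands off to a protocol-specific handler in slot 1.
def parse(ctx, tail_call):
    ctx["parsed"] = True
    tail_call(1)
    return "no handler"  # only reached if the tail call failed


def handle(ctx, tail_call):
    return "handled"


print(run({0: parse, 1: handle}, 0, {}))  # handled
```

Raising an exception is simply a convenient way to model "control does not come back"; the dispatcher loop then continues with the target program.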
eBPF Maps
eBPF maps allow eBPF programs to keep state between invocations and to share data with the
user-space applications. An eBPF map is basically a key-value store, where values are generally
treated as binary blobs of arbitrary data.
They are created using the `bpf()` syscall with the BPF_MAP_CREATE command and, as with
everything else in the Linux world, they are addressed via a file descriptor. Interacting with a
map happens through the same syscall, using the BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM, and
BPF_MAP_DELETE_ELEM commands.
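A map's lifecycle can be modeled in user space: it is created with fixed key and value sizes and a maximum number of entries, referred to by a file descriptor, and then read and written with lookup, update, and delete. The Python below is a hypothetical model only; the function names echo the BPF_MAP_* commands but do not call the real `bpf()` syscall, and error handling is simplified.

```python
import itertools

# Toy model of a hash map managed through create/lookup/update/delete commands.
# Plain Python: nothing here talks to the kernel.
_maps = {}
_fds = itertools.count(3)  # 0-2 are taken by stdin, stdout, stderr


def map_create(key_size, value_size, max_entries):
    """Like BPF_MAP_CREATE: fix the key/value sizes up front, return an fd."""
    fd = next(_fds)
    _maps[fd] = {"key_size": key_size, "value_size": value_size,
                 "max_entries": max_entries, "data": {}}
    return fd


def _checked(fd, key, value=None):
    m = _maps[fd]  # KeyError for an unknown fd
    if len(key) != m["key_size"]:
        raise ValueError("key has the wrong size")
    if value is not None and len(value) != m["value_size"]:
        raise ValueError("value has the wrong size")
    return m


def map_update(fd, key, value):
    """Like BPF_MAP_UPDATE_ELEM: insert or overwrite a binary blob."""
    m = _checked(fd, key, value)
    k = bytes(key)
    if k not in m["data"] and len(m["data"]) >= m["max_entries"]:
        raise OverflowError("map is full")
    m["data"][k] = bytes(value)


def map_lookup(fd, key):
    """Like BPF_MAP_LOOKUP_ELEM: return the value, or None if absent."""
    return _checked(fd, key)["data"].get(bytes(key))


def map_delete(fd, key):
    """Like BPF_MAP_DELETE_ELEM: remove an existing entry (KeyError if missing)."""
    del _checked(fd, key)["data"][bytes(key)]
```

Values are stored as opaque byte strings, mirroring the "binary blobs" description: interpreting them is left to whoever reads the map.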
Executing an eBPF Program
Building eBPF Programs
The kernel expects all eBPF programs to be loaded as bytecode, so we need a way to create the
bytecode using higher-level languages. The most popular toolchain for writing and debugging
eBPF programs is the BPF Compiler Collection (BCC), which is based on LLVM and Clang.
Just-In-Time (JIT) Compiler
After verification, eBPF bytecode is just-in-time (JIT) compiled into native machine code. eBPF
has a modern design: compared with classic BPF, it was upgraded to use eleven 64-bit registers
(r0 through r10) and a 64-bit instruction encoding. This maps eBPF closely onto hardware for the
x86_64, ARM, and arm64 architectures, amongst others. Fast compilation at runtime makes it
possible for eBPF to remain performant even though programs are first expressed as bytecode for
a VM.
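To make the 64-bit encoding concrete, the Python sketch below packs instructions in the layout of the Linux `struct bpf_insn` (an 8-bit opcode, two 4-bit register fields, a 16-bit signed offset, and a 32-bit signed immediate) and builds a minimal two-instruction program that returns 0. It assumes a little-endian host, which determines the order of the two register nibbles; the helper and constant names are illustrative.

```python
import struct

# struct bpf_insn, 8 bytes: u8 code | dst_reg:4, src_reg:4 | s16 off | s32 imm
INSN = struct.Struct("<BBhi")

OP_MOV64_IMM = 0xB7  # BPF_ALU64 | BPF_MOV | BPF_K: dst = imm
OP_EXIT = 0x95       # BPF_JMP | BPF_EXIT: return r0


def encode(code, dst=0, src=0, off=0, imm=0):
    if not (0 <= dst <= 10 and 0 <= src <= 10):
        raise ValueError("eBPF has eleven registers: r0..r10")
    # On little-endian hosts dst_reg occupies the low nibble of the second byte.
    return INSN.pack(code, (src << 4) | dst, off, imm)


def decode(raw):
    code, regs, off, imm = INSN.unpack(raw)
    return code, regs & 0x0F, regs >> 4, off, imm


# "r0 = 0; exit": a minimal program that returns 0
prog = encode(OP_MOV64_IMM, dst=0, imm=0) + encode(OP_EXIT)
print(prog.hex())  # b7000000000000009500000000000000
```

A JIT compiler reads fixed-size records like these and emits equivalent native instructions, which is why the register count and widths matter for how directly eBPF maps onto real CPUs.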
Figure 3 - eBPF Architecture
The important takeaway here is understanding that eBPF unlocks access to kernel level events
without the typical restrictions found when changing kernel code directly. Summarizing, eBPF
works by:
Compiling eBPF programs into bytecode
Verifying that programs will execute safely before they are loaded at the hook point
Attaching programs to hook points within the kernel that are triggered by specified events
Compiling bytecode into native machine code at runtime (JIT) for maximum efficiency
Calling helper functions to manipulate data when a program is triggered
Using maps (key-value stores) to share data between user space and kernel space and to keep
state.
eBPF in Action
Teleport is an open source multi-protocol identity-aware access proxy. It provides a convenient
and secure way of accessing SSH servers, Kubernetes clusters, databases, and other resources
behind NAT. Think of it as a cloud-native replacement for OpenSSH.
One of the project goals was to provide a detailed audit log of what actually happens during
SSH sessions. To achieve that, Teleport logs the following data:
The interactive session recording shows what a user typed in her terminal during an interactive
session. Let’s say she executed a bash script; the recording will show this. But the recording
will not show whether any file system changes took place, or whether the script downloaded or
uploaded any data to or from this machine.
That’s what the JSON event log is for, and Teleport uses eBPF to “spy” on a user’s actions during
interactive SSH sessions. Consider, for example, the command:
echo Y3VybCBodHRwOi8vd3d3LmV4YW1wbGUuY29tCg== | base64 --decode | sh
Even though the recording captures this command exactly as printed in the terminal, it tells us
little, because the user has obfuscated the command piped into sh by encoding it in base64. But by
looking into the JSON log, we learn the user was attempting to obfuscate a call to curl:
{
"event": "session.command",
"path": "/bin/sh",
"program": "sh",
"argv": [],
"login": "centos",
"user": "jsmith"
}
{
"event": "session.command",
"path": "/bin/base64",
"program": "base64",
"argv": [
"--decode"
],
"login": "centos",
"user": "jsmith"
}
"argv": [
"http://www.example.com"
],
"program": "curl",
"return_code": 0,
"login": "centos",
"user": "jsmith"
}
{
"event": "session.network",
"program": "curl",
"src_addr": "172.31.43.104",
"dst_addr": "93.184.216.34",
"dst_port": 80,
"login": "centos",
"user": "jsmith",
"version": 4
}
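As a cross-check, decoding the payload offline recovers the same program and argument that the log reports. This is only an illustration in Python; the event log did not need to decode anything, because the eBPF programs observed the processes that actually ran.

```python
import base64

payload = "Y3VybCBodHRwOi8vd3d3LmV4YW1wbGUuY29tCg=="
decoded = base64.b64decode(payload).decode("ascii")
print(repr(decoded))    # 'curl http://www.example.com\n'
print(decoded.split())  # ['curl', 'http://www.example.com']
```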
How did Teleport collect these events? By installing eBPF hooks at the beginning of the SSH
session. Specifically, it uses three BPF programs to get this data: execsnoop to capture the script
execution, opensnoop to capture files opened by the script, and tcpconnect to capture TCP
connections established during the session.
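Raw kernel events become audit records once they are tagged with the session they belong to. A minimal sketch of that enrichment step for a connect event follows; the input record's field names and the `to_network_event` function are hypothetical and not Teleport's actual code, while the output fields mirror the `session.network` object shown above.

```python
import json


def to_network_event(record, login, user):
    """Wrap a raw connect record (hypothetical fields) in SSH session context."""
    return {
        "event": "session.network",
        "program": record["comm"],
        "src_addr": record["saddr"],
        "dst_addr": record["daddr"],
        "dst_port": record["dport"],
        "login": login,
        "user": user,
        "version": record["ip_version"],
    }


raw = {"pid": 2315, "comm": "curl", "saddr": "172.31.43.104",
       "daddr": "93.184.216.34", "dport": 80, "ip_version": 4}
print(json.dumps(to_network_event(raw, login="centos", user="jsmith"), indent=2))
```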
Let’s focus on tcpconnect, which gives us the information in the final JSON object:
the connect() syscall, which initiates a connection on a socket. To trace these calls, tcpconnect
uses kprobes, a kernel mechanism for dynamically breaking into kernel routines. Kprobes collect
debugging and performance information non-disruptively and can be inserted on virtually any
instruction in the kernel.
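from bcc import BPF
# initialize BPF; bpf_text holds the eBPF program's C source, defined earlier in tcpconnect
b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect_entry")         # on function entry
b.attach_kretprobe(event="tcp_v4_connect", fn_name="trace_connect_v4_return")  # on function return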
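Attaching at both entry and return lets the tool see the socket argument (available on entry) together with the result (available on return). The Python simulation below sketches that pairing under assumptions: dicts stand in for a BPF hash map keyed by thread ID and for the buffer that user space reads, and only successful calls (return value 0) are reported. The names loosely echo tcpconnect's, but this is not its code.

```python
# Simulation only: dicts stand in for a BPF hash map and the output buffer.
currsock = {}  # thread id -> socket details saved at function entry
events = []    # records that user space would read


def trace_connect_entry(tid, sock):
    """Entry probe: the socket argument is available, the result is not yet."""
    currsock[tid] = sock


def trace_connect_return(tid, ret, comm):
    """Return probe: pair the saved socket with the function's return value."""
    sock = currsock.pop(tid, None)
    if sock is None:
        return  # entry was not seen; nothing to report
    if ret != 0:
        return  # connect attempt failed early; skip it
    events.append({"tid": tid, "comm": comm, **sock})


trace_connect_entry(2315, {"saddr": "172.31.43.104", "daddr": "93.184.216.34", "dport": 80})
trace_connect_return(2315, 0, "curl")
print(events)
```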
When the probes fire along this code path, tcpconnect starts printing information, as in the
example output below.
# ./tcpconnect
PID COMM SADDR DADDR DPORT
-----------------------------------------------------
2315 curl 172.31.43.104 93.184.216.34 80
All this data is collected inside the kernel by the eBPF program. In fact, when we look at the C
code that tcpconnect’s Python script loads, we can see it filling in the fields printed above and
using the bpf_get_current_comm() helper function to record the process name:
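...
struct ipv4_data_t data4 = {.pid = pid, .ip = ipver};
data4.saddr = skp->__sk_common.skc_rcv_saddr;           // source IPv4 address (SADDR)
data4.daddr = skp->__sk_common.skc_daddr;               // destination IPv4 address (DADDR)
data4.dport = ntohs(dport);                             // destination port, host byte order (DPORT)
bpf_get_current_comm(&data4.task, sizeof(data4.task));  // current process name (COMM)
...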
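On the user space side, the raw fields still need formatting: the addresses arrive as 32-bit integers holding bytes in network order, and the command name arrives as a fixed-size, NUL-padded buffer. A minimal sketch of turning one record into a row like the output above follows; `format_row` is a hypothetical helper, and packing with `"<I"` assumes a little-endian host such as x86_64.

```python
import socket
import struct


def format_row(pid, comm, saddr, daddr, dport):
    """Render one IPv4 connect record as a PID/COMM/SADDR/DADDR/DPORT row."""
    name = comm.split(b"\0", 1)[0].decode("utf-8", "replace")  # strip NUL padding
    # The u32 holds the address bytes in network order; repack them as-is
    # (assuming a little-endian host) and let inet_ntoa render dotted quads.
    src = socket.inet_ntoa(struct.pack("<I", saddr))
    dst = socket.inet_ntoa(struct.pack("<I", daddr))
    return f"{pid:<6} {name:<12} {src:<16} {dst:<16} {dport}"


print(format_row(2315, b"curl".ljust(16, b"\0"), 0x682B1FAC, 0x22D8B85D, 80))
```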
Where to go from here
Read more about using BPF To Transform SSH Sessions into Structured Events
BCC - “BCC is a toolkit for creating efficient kernel tracing and manipulation programs,
and includes several useful tools and examples […] BCC makes BPF programs easier to
write, with kernel instrumentation in C (and includes a C wrapper around LLVM), and
front-ends in Python and lua. It is suited for many tasks, including performance analysis
and network traffic control.” BCC also provides an API for other programs to use.
bpftrace - “BPFtrace is a high-level tracing language [that] uses LLVM as a backend to
compile scripts to BPF-bytecode and makes use of BCC for interacting with the Linux BPF
system, as well as existing Linux tracing capabilities: kernel dynamic tracing (kprobes),
user-level dynamic tracing (uprobes), and tracepoints.”
Generic libraries for Go, C/C++, and Rust
Probably the most exhaustive collection of eBPF resources is Quentin Monnet’s
blog, Whirl Offload.
If you’ve made it to this point, my hope is you’ve got at least a baseline understanding of what
eBPF is, why it’s important, and the basics of how it works. In this article, we have briefly
covered the following points:
eBPF is a revolutionary technology because it lets programmers execute custom bytecode
within the kernel without having to change the kernel or load kernel modules.
eBPF is event-driven, i.e. each eBPF program is an event handler. These events are called
“hooks”.
eBPF programs interact with user space programs via eBPF maps, which are key-value stores.
You can see eBPF in action by playing with the audit log in Teleport, an open source
alternative to OpenSSH.
About the Author
and video content. In his free time, Virag enjoys rock climbing, video games, and
walking his dog.
Please see https://www.infoq.com for the latest version of this information.