BPF CO-RE
(Compile Once – Run Everywhere)
Andrii Nakryiko
Developing BPF application (today)
Development server
bpf.c:
    #include <linux/bpf.h>
    #include <linux/filter.h>
    int prog(struct __sk_buff* skb)
    {
        if (skb->len < X) {
            return 1;
        }
        ...
    }
DrivingApp.cpp (embeds bpf.c):
    #include <bcc/BPF.h>
    std::string BPF_PROGRAM =
    #include "path/to/bpf.c"
    namespace facebook {
    ...
    }
compile: DrivingApp.cpp + embedded bpf.c -> App package
App package: DrivingApp, libbcc, bpf.c, LLVM/Clang
deploy: App package -> Data center
Developing BPF application (today)
Production server
App: passes bpf.c to Clang at run time
bpf.c:
    #include <linux/bpf.h>
    #include <linux/filter.h>
    int prog(...) {
        ...
    }
System headers: linux/bpf.h, linux/filter.h, linux/sched.h, linux/fs.h, ...
compile: bpf.c + System headers -> Clang -> bpf.o
verify: bpf.o -> Kernel
Developing BPF application (today)
Problem:
“On the fly” compilation
“On the fly” BPF compilation
Why?
• Accessing kernel structs (e.g., task_struct or sk_buff)
• Memory layout changes between versions/configurations
• BPF code needs to be compiled w/ fixed offsets/sizes
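To make the layout problem concrete, here is a minimal user-space sketch. The two structs are hypothetical stand-ins for the same kernel struct as built by two kernel versions (they are not real kernel definitions), and read_u64_at() is an illustrative helper that reads at a fixed byte offset, the way compiled BPF code addresses a field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a struct in an older kernel build. */
struct task_v1 {
    uint64_t state;
    uint64_t read_bytes;
};

/* Hypothetical layout of the same struct in a newer build. */
struct task_v2 {
    uint64_t state;
    uint64_t flags;      /* field added by the newer build */
    uint64_t read_bytes; /* same field, now 8 bytes further in */
};

/* Read a u64 at a fixed byte offset from base. */
static uint64_t read_u64_at(const void *base, size_t offset)
{
    uint64_t value;

    assert(base != NULL);
    memcpy(&value, (const char *)base + offset, sizeof(value));
    return value;
}
```

With these definitions, offsetof(struct task_v1, read_bytes) is 8 and offsetof(struct task_v2, read_bytes) is 16, so code compiled with offset 8 reads flags instead of read_bytes on the newer layout.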
“On the fly” BPF compilation
Problems
1. Every prod machine needs kernel headers
2. LLVM/Clang is big and heavy
3. Testing is a pain
“On the fly” BPF compilation
Problems
Every prod machine needs kernel headers
• kernel-devel package required
• kernel-devel is missing internal headers
• custom one-off kernels are a pain
• kernel-devel can get out of sync
“On the fly” BPF compilation
Problems
LLVM/Clang is big and heavy
• libbcc.so > 100MB
• compilation is a heavy-weight process
• can use lots of memory and CPU
• on busy machine can tip over prod workload
“On the fly” BPF compilation
Problems
Testing is a pain
• variety of kernel versions/configurations
• “works on my machine” means nothing
• Problem is detected only at run time
Can we compile once?
Then run same binary everywhere?
BPF CO-RE
(Compile Once – Run Everywhere)
Goals
• No kernel headers
• No “on the fly” compilation
• Upfront validation against prod kernels
BPF CO-RE flow
Compile (Development server)
Kernel BTF -> bpftool -> vmlinux.h
bpf.c:
    #include <vmlinux.h>
    #include <bpf_core.h>
    int prog(struct __sk_buff* skb)
    {
        ...
    }
compile: bpf.c + vmlinux.h -> Clang -> bpf.o w/ relocs
package: DrivingApp + libbpf + bpf.o -> App package
deploy: App package -> Data center
BPF CO-RE flow
Test (Development server)
validate: bpf.o w/ relocs -> bpftool, checked against the BTF of each kernel:
• Kernel 4.16 BTF
• Kernel 4.18 BTF
• Kernel 4.20 BTF
• Kernel 5.0 BTF
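The validation step can be pictured as checking that every field the program needs exists in each target kernel's type information. The sketch below is a user-space illustration only: struct kernel_fields and count_unresolved() are hypothetical names, and a flat list of field names stands in for real BTF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one kernel's BTF: the field names it exposes. */
struct kernel_fields {
    const char *version;
    const char *const *fields;
    size_t n_fields;
};

/* Count how many required fields this kernel does not provide. */
static size_t count_unresolved(const char *const *required, size_t n_required,
                               const struct kernel_fields *kernel)
{
    size_t missing = 0;

    assert(required != NULL && kernel != NULL);
    for (size_t i = 0; i < n_required; i++) {
        int found = 0;

        for (size_t j = 0; j < kernel->n_fields; j++) {
            if (strcmp(required[i], kernel->fields[j]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            missing++;
    }
    return missing;
}
```

Running this once per kernel in the test matrix and failing the build on any non-zero result would surface an incompatibility before deployment rather than at run time.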
BPF CO-RE flow
Run (Production server)
App -> libbpf
relocate: bpf.o w/ relocs + Kernel BTF -> libbpf
load/verify: libbpf -> Kernel
BPF CO-RE
Overview
• Self-describing kernel (BTF)
• Clang w/ emitted relocations
• Libbpf as relocating loader
• Tooling for testing
BPF CO-RE
Self-describing kernel
• Deduplicated BTF information
• compact (no need to strip it out: 2MB vs 177MB of DWARF)
• describes all kernel types (size, layout, etc)
• always in sync w/ kernel
• lossless BTF to C conversion
• Available today:
• CONFIG_DEBUG_INFO_BTF=y (needs pahole >= v1.13)
BPF CO-RE Challenges
• Struct layout changes
• Version- / config-specific fields (logic in general)
• #define macros
• Unrelocatable sizeof()
Field offset relocation
C source:
    #include <linux/sched.h>
    #include <linux/types.h>

    int on_event(void* ctx) {
        struct task_struct *task;
        u64 read_bytes;
        task = (void *)bpf_get_current_task();
        bpf_probe_read(
            &read_bytes,
            sizeof(u64),
            &task->ioac.read_bytes);
        return 0;
    }
Compiled BPF:
    0: (85) call bpf_get_current_task
    1: (07) r0 += 1952
    2: (bf) r1 = r10
    3: (07) r1 += -8
    4: (b7) r2 = 8
    5: (bf) r3 = r0
    6: (85) call bpf_probe_read
    7: (b7) r0 = 0
    8: (95) exit
Field reloc:
- insn: #1
- type: struct task_struct
- accessor: 30:4
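The relocation record above tells the loader which instruction carries a field offset and which field it refers to. A minimal user-space sketch of that patching step follows. It is not libbpf's implementation: struct insn, struct field_reloc, struct field_info, and apply_field_relocs() are hypothetical, and fields are looked up by name rather than by a member-index accessor such as 30:4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified instruction: only the parts this sketch needs. */
struct insn {
    uint8_t code; /* opcode, e.g. 0x07 for "r0 += imm" */
    int32_t imm;  /* immediate operand; holds the field offset */
};

/* One field relocation: which instruction to patch, which field it means. */
struct field_reloc {
    size_t insn_idx;
    const char *type_name;
    const char *field_path;
};

/* Hypothetical stand-in for the target kernel's BTF. */
struct field_info {
    const char *type_name;
    const char *field_path;
    int32_t offset; /* byte offset on the target kernel */
};

/* Patch each relocated instruction; return 0 on success, -1 on failure. */
static int apply_field_relocs(struct insn *prog, size_t prog_len,
                              const struct field_reloc *relocs, size_t n_relocs,
                              const struct field_info *target, size_t n_target)
{
    assert(prog != NULL && relocs != NULL && target != NULL);
    for (size_t i = 0; i < n_relocs; i++) {
        const struct field_reloc *r = &relocs[i];
        int found = 0;

        if (r->insn_idx >= prog_len)
            return -1;
        for (size_t j = 0; j < n_target; j++) {
            if (strcmp(r->type_name, target[j].type_name) == 0 &&
                strcmp(r->field_path, target[j].field_path) == 0) {
                prog[r->insn_idx].imm = target[j].offset;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1; /* field missing on this kernel */
    }
    return 0;
}
```

In this sketch an unresolved field is a hard error; the program is simply not loaded on a kernel that lacks the field.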
BPF CO-RE Challenges
• Struct layout changes
• Kernel version- / config-specific logic
• #define macros
• Unrelocatable sizeof()
Extern relocation
C source:
    #include <linux/sched.h>
    #include <linux/types.h>
    /* relies on /proc/config.gz */
    extern bool CONFIG_TASK_IO_ACCOUNTING;

    int on_event(void* ctx) {
        struct task_struct *task;
        u64 read_bytes;
        task = (void *)bpf_get_current_task();
        if (CONFIG_TASK_IO_ACCOUNTING) {
            return bpf_probe_read(
                &read_bytes,
                sizeof(u64),
                &task->ioac.read_bytes);
        }
        return 0;
    }
Compiled BPF:
    0: (85) call bpf_get_current_task
    1: (b7) r1 = XXX
    2: (15) if r1 == 0x0 goto pc+6
    3: (07) r0 += 1952
    4: (bf) r1 = r10
    5: (07) r1 += -8
    6: (b7) r2 = 8
    7: (bf) r3 = r0
    8: (85) call bpf_probe_read
    9: (b7) r0 = 0
    10: (95) exit
Extern reloc:
- insn: #1
- name: CONFIG_TASK_IO_ACCOUNTING
- type: bool
Field reloc:
- insn: #3
- type: struct task_struct
- accessor: 30:4
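Resolving the extern amounts to looking the option up in the running kernel's configuration and writing the result into the XXX immediate. The sketch below covers only the lookup, in user space. kconfig_enabled() is a hypothetical helper, and it assumes the configuration has already been decompressed into plain NAME=value lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true when config_text contains a line that is exactly "<name>=y". */
static bool kconfig_enabled(const char *config_text, const char *name)
{
    size_t name_len;
    const char *line;

    assert(config_text != NULL && name != NULL);
    name_len = strlen(name);
    line = config_text;
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Match the whole line so CONFIG_FOO does not match CONFIG_FOO_BAR. */
        if (len == name_len + 2 &&
            strncmp(line, name, name_len) == 0 &&
            line[name_len] == '=' && line[name_len + 1] == 'y')
            return true;
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return false;
}
```

A loader built this way would patch instruction #1 with 1 when the option is enabled and 0 otherwise, so the branch at instruction #2 skips the field access on kernels built without the option.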
Uncommon/experimental fields
struct task_struct___custom {
    u64 experimental;
};
int on_event(void* ctx) {
    struct task_struct *task;
    struct task_struct___custom *exp_task;
    u64 value = 0;
    task = (void *)bpf_get_current_task();
    exp_task = (struct task_struct___custom *)task;
    bpf_probe_read(&value, sizeof(u64), &exp_task->experimental);
    return 0;
}
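For the local struct task_struct___custom to be matched against the kernel's struct task_struct, the loader has to ignore the ___custom part of the name. The sketch below assumes the convention that everything from a triple underscore onward is a local-only suffix. essential_name_len() and same_essential_name() are illustrative names, not a library API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of name without a trailing "___suffix"; full length if none. */
static size_t essential_name_len(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len < 5) /* need at least "a___b" */
        return len;
    /* Scan backwards for exactly three underscores with text on both sides. */
    for (size_t i = len - 4; i >= 1; i--) {
        if (name[i] == '_' && name[i + 1] == '_' && name[i + 2] == '_' &&
            name[i - 1] != '_' && name[i + 3] != '_')
            return i;
    }
    return len;
}

/* Compare two type names while ignoring any "___suffix" on either one. */
static bool same_essential_name(const char *a, const char *b)
{
    size_t a_len = essential_name_len(a);
    size_t b_len = essential_name_len(b);

    return a_len == b_len && strncmp(a, b, a_len) == 0;
}
```

Under this assumption the program can declare only the fields it needs in the suffixed struct, and the field offset is still resolved against the kernel's full type.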
BPF CO-RE Challenges
• Struct layout changes
• Kernel version- / config-specific logic
• #define macros
• Unrelocatable sizeof()
#define macros
• Constants, flags, etc…
• DWARF doesn’t record #defines, so neither does BTF
• Copy/paste whatever you need?
• bpf_core.h can provide commonly-needed stuff
BPF CO-RE Challenges
• Struct layout changes
• Kernel version- / config-specific logic
• #define macros
• Unrelocatable sizeof()
Unrelocatable sizeof()
struct task_struct *task;
struct task_io_accounting io_acc;
task = (void *)bpf_get_current_task();
// sizeof(struct task_io_accounting) is fixed at compile time: not relocatable
bpf_probe_read(&io_acc, sizeof(struct task_io_accounting), &task->ioac);
// accessing fields on the stack is faster than
// bpf_probe_read()’ing them individually
io_acc.read_bytes;
io_acc.write_bytes;
io_acc.rchar;
io_acc.wchar;
Unrelocatable sizeof()
struct task_struct *task;
struct task_io_accounting io_acc;
task = (void *)bpf_get_current_task();
io_acc = __builtin_bpf_read_field(&task, ioac);
Abstracts bitfield access?..
Maybe relocatable?
Questions?