Tags: sevan/dwarves
DWARF loader:

- Handle DWARF5 DW_OP_addrx properly

  Part of the effort to support the subset of DWARF5 that is generated when
  building the kernel.

- Handle subprogram ret type with abstract_origin properly

  Adds a second pass to resolve the abstract origin DWARF description of
  functions, to help the BTF encoder get the right return type.

- Check .notes section for LTO build info

  When LTO is used, currently only with clang, we need to do extra steps to
  handle references from one object (compile unit, aka CU) to another, a way
  for DWARF to avoid duplicating information.

- Check .debug_abbrev for cross-CU references

  When the kernel build process doesn't add an ELF note to vmlinux indicating
  that LTO was used, we need to look at DWARF's .debug_abbrev ELF section to
  figure out whether cross-CU references are present, since they require a
  more expensive way to resolve types and thus to encode BTF.

- Permit merging all DWARF CUs for clang LTO built binary

  Allow not throwing away previously processed, supposedly self-contained
  compile units (CUs), as they have type descriptions that will be used in
  later CUs.

- Permit a flexible HASHTAGS__BITS

  So that we can use a more expensive algorithm when we need to keep
  previously processed compile units that will then be referenced by later
  ones to resolve types.

- Use a better hashing function, from libbpf

  Enabling patch to combine compile units when using LTO.

BTF encoder:

- Add --btf_gen_all flag

  A new command line option to ask for the generation of all BTF encodings,
  so that we can stop adding new command line options to enable new encodings
  in the kernel Makefile.

- Match ftrace addresses within ELF functions

  To cope with differences in how DWARF and ftrace describe function
  boundaries.

- Funnel ELF error reporting through a macro

  To use libelf's elf_error() function, improving error messages.

- Sanitize non-regular int base type

  Cope with the non-regular int base types that clang emits with DWARF5;
  tricky stuff, see yhs's full explanation in the relevant cset.

- Add support for the floating-point types

  S/390 has floats'n'doubles in its arch specific linux headers, cope with
  that.

Pretty printer:

- Honour conf_fprintf.hex when printing enumerations

  If the user specifies --hex in the command line, honour it when printing
  enumerations.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
v1.20:

BTF encoder:

- Improve ELF error reporting using elf_errmsg(elf_errno()).

- Improve objcopy error handling.

- Fix handling of the 'restrict' qualifier, that was being treated as a
  'const'.

- Support SHN_XINDEX in st_shndx symbol indexes, to handle ELF objects with
  more than 65534 sections, which happens, for instance, with kernels built
  with 'KCFLAGS="-ffunction-sections -fdata-sections"'. Other cases may
  include FG-ASLR and LTO.

- Cope with functions without a name, as seen sometimes when building kernel
  images with some versions of clang, which used to cause a SEGFAULT.

- Fix BTF variable generation for kernel modules, not skipping variables at
  offset zero.

- Fix address size to match what is in the ELF file being processed, to fix
  using a 64-bit pahole binary to generate BTF for a 32-bit vmlinux image.

- Use kernel module ftrace addresses when finding which functions to encode,
  which increases the number of functions encoded.

libbpf:

- Allow use of packaged version, for distros wanting to dynamically link with
  the system's libbpf package instead of using the libbpf git submodule
  shipped in pahole's source code.

DWARF loader:

- Support DW_AT_data_bit_offset

  This appeared in DWARF4 but is supported only in gcc's -gdwarf-5, support
  it in a way that makes the output be the same for both cases.

  $ gcc -gdwarf-5 -c examples/dwarf5/bf.c
  $ pahole bf.o
  struct pea {
          long int                   a:1;                  /*     0: 0  8 */
          long int                   b:1;                  /*     0: 1  8 */
          long int                   c:1;                  /*     0: 2  8 */

          /* XXX 29 bits hole, try to pack */
          /* Bitfield combined with next fields */

          int                        after_bitfield;       /*     4     4 */

          /* size: 8, cachelines: 1, members: 4 */
          /* sum members: 4 */
          /* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 29 bits */
          /* last cacheline: 8 bytes */
  };

- Support DW_FORM_implicit_const in attr_numeric() and attr_offset()

- Support DW_TAG_call_site, it's the standardized rename of the previously
  supported DW_TAG_GNU_call_site.

build:

- Fix compilation on 32-bit architectures.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
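To make the SHN_XINDEX item in the v1.20 notes above more concrete: the ELF format stores a symbol's section index in a 16-bit st_shndx field, and when the index does not fit, that field holds the SHN_XINDEX marker (0xffff) while the real index lives in a parallel SHT_SYMTAB_SHNDX table with one 32-bit entry per symbol. The following is a minimal sketch of that lookup, not pahole's actual code; the struct, the function and the SKETCH_ prefix are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker value the ELF specification assigns to SHN_XINDEX. */
#define SKETCH_SHN_XINDEX 0xffffu

/* Hypothetical, trimmed-down symbol: only the field this sketch needs. */
struct sketch_sym {
	uint16_t st_shndx; /* 16-bit section index, or the SHN_XINDEX marker */
};

/*
 * Return the section index for symbol number sym_idx. 'xindex' stands for
 * the contents of the SHT_SYMTAB_SHNDX section (one 32-bit entry per
 * symbol), or NULL when the object has no such section.
 */
static uint32_t sketch_sym_section(const struct sketch_sym *syms,
				   const uint32_t *xindex, size_t sym_idx)
{
	uint16_t shndx = syms[sym_idx].st_shndx;

	if (shndx != SKETCH_SHN_XINDEX)
		return shndx;

	/* The marker is meaningless without the extended index table. */
	assert(xindex != NULL);
	return xindex[sym_idx];
}
```

A caller would pass the symbol table and the extended index table read from the same ELF object; real code using libelf would obtain both through the library rather than through raw arrays as assumed here.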
v1.19: - Support split BTF, where a main BTF file, vmlinux, can be used to find types and then a kernel module, for instance, can have just what is unique to it. For instance, looking for a type in the main vmlinux BTF info: $ pahole wmi_notify_handler pahole: type 'wmi_notify_handler' not found $ If we look at the 'wmi' module BTF info that is in: $ ls -la /sys/kernel/btf/wmi -r--r--r--. 1 root root 2866 Nov 18 13:35 /sys/kernel/btf/wmi $ $ pahole /sys/kernel/btf/wmi -C wmi_notify_handler typedef void (*wmi_notify_handler)(u32, void *); $ '--btf_base=/sys/kernel/btf/vmlinux' was automatically added in this last example, an option that was also introduced in this version where types used in the wmi.ko module but present in vmlinux can be found so that there is no duplicity of types. - Update libbpf to get the split BTF support and use some of its functions to load BTF and speed up DWARF loading and BTF encoding. - Support cross-compiled ELF binaries with different endianness - Support showing typedefs for anonymous types, like structs, unions and enums, see the "Align enumerators" entry below for an example, another: $ pahole rwlock_t typedef struct { arch_rwlock_t raw_lock; /* 0 8 */ /* size: 8, cachelines: 1, members: 1 */ /* last cacheline: 8 bytes */ } rwlock_t; $ - Align enumerators: $ pahole ZSTD_strategy typedef enum { ZSTD_fast = 0, ZSTD_dfast = 1, ZSTD_greedy = 2, ZSTD_lazy = 3, ZSTD_lazy2 = 4, ZSTD_btlazy2 = 5, ZSTD_btopt = 6, ZSTD_btopt2 = 7, } ZSTD_strategy; $ - Workaround bugs in the generation of DWARF records for functions in some gcc versions that were causing breakage in the encoding of BTF: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060 "Missing DW_AT_declaration=1 in dwarf data" - Ignore zero-sized ELF symbols instead of erroring out. - Handle union forward declaration properly in the BTF loader. 
- Introduce --numeric_version for use in scripts and Makefiles: $ pahole --version v1.19 $ pahole --numeric_version 119 $ To avoid things like this in the kernel's scripts/link-vmlinux.sh: pahole_ver=$(${PAHOLE} --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/') - Try sole pfunct argument as a function name, just like pahole with type names: $ pfunct tcp_v4_rcv int tcp_v4_rcv(struct sk_buff * skb); $ - Speed up pfunct using some of the load techniques used in pahole. - Discard CUs after BTF encoding as they're not used anymore, greatly reducing memory usage and speeding up vmlinux BTF encoding. - Revamp how per-CPU variables are encoded in BTF. - Include BTF info for static functions. - Use BTF's string APIs for strings management, greatly improving performance over the tsearch(). - Increase size of DWARF lookup hash table, shaving off about 1 second out of about 20 seconds total for Linux BTF dedup. - Stop BTF encoding when errors are found in some DWARF CU. - Implement --packed, to show just packed structures, for instance, here are the top 5 packed data structures in the Linux kernel: $ pahole --sizes --packed | sort -k2 -nr | head -5 e820_table 64004 0 boot_params 4096 0 efi_variable 2084 0 snd_soc_tplg_pcm 912 0 ntb_info_regs 800 0 $ And here is one of them: $ pahole efi_variable struct efi_variable { efi_char16_t VariableName[512]; /* 0 1024 */ /* --- cacheline 16 boundary (1024 bytes) --- */ efi_guid_t VendorGuid; /* 1024 16 */ long unsigned int DataSize; /* 1040 8 */ __u8 Data[1024]; /* 1048 1024 */ /* --- cacheline 32 boundary (2048 bytes) was 24 bytes ago --- */ efi_status_t Status; /* 2072 8 */ __u32 Attributes; /* 2080 4 */ /* size: 2084, cachelines: 33, members: 6 */ /* last cacheline: 36 bytes */ } __attribute__((__packed__)); $ - Fix bug in distros such as OpenSUSE:15.2 where DW_AT_alignment isn't defined. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
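As an illustration of the --numeric_version item in the v1.19 notes above, here is a minimal sketch of turning a 'vMAJOR.MINOR' string into a number that scripts can compare. It is an assumption about the mapping, not pahole's implementation: the function name is hypothetical, and it computes major * 100 + minor, which matches the 'v1.19' to '119' example but differs from the sed expression above for single-digit minor versions.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns major * 100 + minor for a string shaped
 * like "vMAJOR.MINOR", or -1 when the string does not parse.
 */
static int sketch_numeric_version(const char *version)
{
	unsigned int major, minor;

	if (sscanf(version, "v%u.%u", &major, &minor) != 2)
		return -1;

	return (int)(major * 100 + minor);
}
```

With a plain integer, a Makefile or shell script can use an ordinary numeric comparison instead of parsing the version string itself.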
v1.18:

- Use type information to pretty print raw data from stdin, all documented in
  the man pages, further information in the csets.

  TLDRish: this almost completely deciphers a perf.data file:

  $ pahole ~/bin/perf --header=perf_file_header \
           -C 'perf_file_attr(range=attrs),perf_event_header(range=data,sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data

  What the above command line does:

  This will state that a 'struct perf_file_header' is available in BTF or
  DWARF in the ~/bin/perf file and that at the start of stdin it should be
  used to decode sizeof(struct perf_file_header) bytes, pretty printing it
  according to its members.

  Furthermore, that header can be referenced later in the command line, for
  instance 'range=data' means that the 'data' member of
  'struct perf_file_header' is used as the range:

  $ pahole ~/bin/perf --header=perf_file_header < perf.data
  {
          .magic = 3622385352885552464,
          .size = 104,
          .attr_size = 136,
          .attrs = {
                  .offset = 296,
                  .size = 136,
          },
          .data = {
                  .offset = 432,
                  .size = 14688,
          },
          .event_types = {
                  .offset = 0,
                  .size = 0,
          },
          .adds_features = { 376537084, 0, 0, 0 },
  },
  $

  That 'range' member is expected to have 'offset' and 'size' fields, so that
  it can go on decoding a number of 'struct perf_event_header' entries.

  That 'sizeof' in turn expects that in 'struct perf_event_header' there is a
  'size' field stating how long that particular record is, one can also use
  'sizeof=some_other_member_name'.

  This supports variable sized records. The 'type' argument in turn expects
  that 'struct perf_event_header' has a member named 'type' (again,
  'type=something_else' may be used).

  Finally, the value in the 'type' field is used to look up an entry in the
  set formed by the two enumerations specified in the 'type_enum=' argument.
  If we look at these enums we'll see that their entries have names that,
  when lowercased, point to structs containing the layout for the variable
  sized record, which allows pahole to cast and produce the right pretty
  printed output.

  I.e. using the kernel BTF info we get:

  $ pahole perf_event_type
  enum perf_event_type {
          PERF_RECORD_MMAP = 1,
          PERF_RECORD_LOST = 2,
          PERF_RECORD_COMM = 3,
          PERF_RECORD_EXIT = 4,
          PERF_RECORD_THROTTLE = 5,
          PERF_RECORD_UNTHROTTLE = 6,
          PERF_RECORD_FORK = 7,
          PERF_RECORD_READ = 8,
          PERF_RECORD_SAMPLE = 9,
          PERF_RECORD_MMAP2 = 10,
          PERF_RECORD_AUX = 11,
          PERF_RECORD_ITRACE_START = 12,
          PERF_RECORD_LOST_SAMPLES = 13,
          PERF_RECORD_SWITCH = 14,
          PERF_RECORD_SWITCH_CPU_WIDE = 15,
          PERF_RECORD_NAMESPACES = 16,
          PERF_RECORD_KSYMBOL = 17,
          PERF_RECORD_BPF_EVENT = 18,
          PERF_RECORD_CGROUP = 19,
          PERF_RECORD_TEXT_POKE = 20,
          PERF_RECORD_MAX = 21,
  };
  $

  That is the same as in ~/bin/perf, and, if we get one of these and ask for
  that struct:

  $ pahole -C perf_record_mmap ~/bin/perf
  struct perf_record_mmap {
          struct perf_event_header   header;               /*     0     8 */
          __u32                      pid;                  /*     8     4 */
          __u32                      tid;                  /*    12     4 */
          __u64                      start;                /*    16     8 */
          __u64                      len;                  /*    24     8 */
          __u64                      pgoff;                /*    32     8 */
          char                       filename[4096];       /*    40  4096 */

          /* size: 4136, cachelines: 65, members: 7 */
          /* last cacheline: 40 bytes */
  };
  $

  Many other options were introduced to work with this, including --count,
  --skip, etc, look at the man page for details.

- Store percpu variables in vmlinux BTF. This can be disabled when debugging
  kernel features being developed to use it.

- pahole now should be segfault free when handling gdb test suite DWARF
  files, including ADA, FORTRAN, rust and dwp compressed files, the latter
  being just flatly refused; support for those got left for v1.19.

- Bail out on partial units for now, avoiding segfaults and providing a
  warning to the user, hopefully will be addressed in v1.19.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
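The 'range', 'sizeof' and 'type' arguments described in the v1.18 notes above boil down to walking a byte range of variable sized records, each one starting with a header that says how long the record is. The following is a minimal sketch of that walk, under stated assumptions: it is not pahole's code, the names are hypothetical, and the header struct simply mirrors the type/misc/size layout of the kernel's 'struct perf_event_header'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same member layout as the kernel's struct perf_event_header. */
struct sketch_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size; /* total record length, header included */
};

/*
 * Walk the records stored in buf[offset .. offset + size), i.e. the
 * 'range', and count how many carry the given type. Returns -1 when a
 * record is truncated or claims an impossible size.
 */
static int sketch_count_records(const unsigned char *buf, size_t offset,
				size_t size, uint32_t type)
{
	size_t pos = offset, end = offset + size;
	int count = 0;

	assert(buf != NULL);

	while (pos < end) {
		struct sketch_event_header hdr;

		if (end - pos < sizeof(hdr))
			return -1;

		/* Copy the header out to avoid unaligned accesses. */
		memcpy(&hdr, buf + pos, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > end - pos)
			return -1;

		if (hdr.type == type)
			count++;

		/* The 'sizeof' member says where the next record starts. */
		pos += hdr.size;
	}

	return count;
}
```

In the perf.data example, 'offset' and 'size' would come from the '.data' member of the decoded file header (432 and 14688), and the 'type' value of each record would then select the struct used to pretty print it.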
v1.17 changes: tl;dr: BTF loader: - Support raw BTF as available in /sys/kernel/btf/vmlinux. pahole: - When the sole argument passed isn't a file, take it as a class name: $ pahole sk_buff - Do not require a class name to operate without a file name. $ pahole # is equivalent to: $ pahole vmlinux - Make --find_pointers_to consider unions: $ pahole --find_pointers_to ehci_qh - Make --contains and --find_pointers_to honour --unions $ pahole --unions --contains inet_sock - Add support for finding pointers to void: $ pahole --find_pointers_to void - Make --contains and --find_pointers_to to work with base types: $ pahole --find_pointers_to 'short unsigned int' - Make --contains look for more than just unions, structs: $ pahole --contains raw_spinlock_t - Consider unions when looking for classes containing some class: $ pahole --contains tpacket_req - Introduce --unions to consider just unions: $ pahole --unions --sizes $ pahole --unions --prefix tcp $ pahole --unions --nr_members - Fix -m/--nr_methods - Number of functions operating on a type pointer $ pahole --nr_methods man-pages: - Add section about --hex + -E to locate offsets deep into sub structs. - Add more information about BTF. - Add some examples. 
---------------------------------- I want the details: btf loader: - Support raw BTF as available in /sys/kernel/btf/vmlinux Be it automatically when no -F option is passed and /sys/kernel/btf/vmlinux is available, or when /sys/kernel/btf/vmlinux is passed as the filename to the tool, i.e.: $ pahole -C list_head struct list_head { struct list_head * next; /* 0 8 */ struct list_head * prev; /* 8 8 */ /* size: 16, cachelines: 1, members: 2 */ /* last cacheline: 16 bytes */ }; $ strace -e openat pahole -C list_head |& grep /sys/kernel/btf/ openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3 $ $ pahole -C list_head /sys/kernel/btf/vmlinux struct list_head { struct list_head * next; /* 0 8 */ struct list_head * prev; /* 8 8 */ /* size: 16, cachelines: 1, members: 2 */ /* last cacheline: 16 bytes */ }; $ If one wants to grab the matching vmlinux to use its DWARF info instead, which is useful to compare the results with what we have from BTF, for instance, its just a matter of using '-F dwarf'. 
This in turn shows something that at first came as a surprise, but then has a simple explanation: For very common data structures, that will probably appear in all of the DWARF CUs (Compilation Units), like 'struct list_head', using '-F dwarf' is faster: $ perf stat -e cycles pahole -F btf -C list_head > /dev/null Performance counter stats for 'pahole -F btf -C list_head': 45,722,518 cycles:u 0.023717300 seconds time elapsed 0.016474000 seconds user 0.007212000 seconds sys $ perf stat -e cycles pahole -F dwarf -C list_head > /dev/null Performance counter stats for 'pahole -F dwarf -C list_head': 14,170,321 cycles:u 0.006668904 seconds time elapsed 0.005562000 seconds user 0.001109000 seconds sys $ But for something that is more specific to a subsystem, the DWARF loader will have to process way more stuff till it gets to that struct: $ perf stat -e cycles pahole -F dwarf -C tcp_sock > /dev/null Performance counter stats for 'pahole -F dwarf -C tcp_sock': 31,579,795,238 cycles:u 8.332272930 seconds time elapsed 8.032124000 seconds user 0.286537000 seconds sys $ While using the BTF loader the time should be constant, as it loads everything from /sys/kernel/btf/vmlinux: $ perf stat -e cycles pahole -F btf -C tcp_sock > /dev/null Performance counter stats for 'pahole -F btf -C tcp_sock': 48,823,488 cycles:u 0.024102760 seconds time elapsed 0.012035000 seconds user 0.012046000 seconds sys $ Above I used '-F btf' just to show that it can be used, but its not really needed, i.e. those are equivalent: $ strace -e openat pahole -F btf -C list_head |& grep /sys/kernel/btf/vmlinux openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3 $ strace -e openat pahole -C list_head |& grep /sys/kernel/btf/vmlinux openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3 $ The btf_raw__load() function that ends up being grafted into the preexisting btf_elf routines was based on libbpf's btf_load_raw(). pahole: - When the sole argument passed isn't a file, take it as a class name. 
With that it becomes as compact as it gets for kernel data structures, just state the name of the struct and it'll try to find that as a file, not being a file it'll use /sys/kernel/btf/vmlinux and the argument as a list of structs, i.e.: $ pahole skb_ext,list_head struct list_head { struct list_head * next; /* 0 8 */ struct list_head * prev; /* 8 8 */ /* size: 16, cachelines: 1, members: 2 */ /* last cacheline: 16 bytes */ }; struct skb_ext { refcount_t refcnt; /* 0 4 */ u8 offset[3]; /* 4 3 */ u8 chunks; /* 7 1 */ char data[]; /* 8 0 */ /* size: 8, cachelines: 1, members: 4 */ /* last cacheline: 8 bytes */ }; $ pahole hlist_node struct hlist_node { struct hlist_node * next; /* 0 8 */ struct hlist_node * * pprev; /* 8 8 */ /* size: 16, cachelines: 1, members: 2 */ /* last cacheline: 16 bytes */ }; $ Of course -C continues to work: $ pahole -C inode | tail __u32 i_fsnotify_mask; /* 556 4 */ struct fsnotify_mark_connector * i_fsnotify_marks; /* 560 8 */ struct fscrypt_info * i_crypt_info; /* 568 8 */ /* --- cacheline 9 boundary (576 bytes) --- */ struct fsverity_info * i_verity_info; /* 576 8 */ void * i_private; /* 584 8 */ /* size: 592, cachelines: 10, members: 53 */ /* last cacheline: 16 bytes */ }; $ - Add support for finding pointers to void, e.g.: $ pahole --find_pointers_to void --prefix tcp tcp_md5sig_pool: scratch $ pahole tcp_md5sig_pool struct tcp_md5sig_pool { struct ahash_request * md5_req; /* 0 8 */ void * scratch; /* 8 8 */ /* size: 16, cachelines: 1, members: 2 */ /* last cacheline: 16 bytes */ }; $ - Make --contains and --find_pointers_to work with base types I.e. with plain 'int', 'long', 'short int', etc: $ pahole --find_pointers_to 'short unsigned int' uv_hub_info_s: socket_to_node uv_hub_info_s: socket_to_pnode uv_hub_info_s: pnode_to_socket vc_data: vc_screenbuf vc_data: vc_translate filter_pred: ops ext4_sb_info: s_mb_offsets
$ pahole ext4_sb_info | grep 'short unsigned int' short unsigned int s_mount_state; /* 160 2 */ short unsigned int s_pad; /* 162 2 */ short unsigned int * s_mb_offsets; /* 664 8 */ $ pahole --contains 'short unsigned int' apm_info desc_ptr thread_struct mpc_table mpc_intsrc fsnotify_mark_connector <SNIP> sock_fprog blk_mq_hw_ctx skb_shared_info $ - Make --contains look for more than just unions, structs, look for typedefs, enums and types that descend from 'struct type': So now we can do more interesting queries, let's see, what are the data structures that embed a raw spinlock in the linux kernel? $ pahole --contains raw_spinlock_t task_struct rw_semaphore hrtimer_cpu_base prev_cputime percpu_counter ratelimit_state perf_event_context task_delay_info <SNIP> lpm_trie bpf_queue_stack $ Look at the csets comments to see more examples. - Make --contains and --find_pointers_to honour --unions I.e. when looking for unions or structs that contain/embed a given type, or for unions/structs that have pointers to it. E.g: $ pahole --contains inet_sock sctp_sock inet_connection_sock raw_sock udp_sock raw6_sock $ pahole --unions --contains inet_sock $ We have structs embedding 'struct inet_sock', but no unions doing that. 
- Make --find_pointers_to consider unions I.e.: $ pahole --find_pointers_to ehci_qh ehci_hcd: qh_scan_next ehci_hcd: async ehci_hcd: dummy $ Wasn't considering: $ pahole -C ehci_shadow union ehci_shadow { struct ehci_qh * qh; /* 0 8 */ struct ehci_itd * itd; /* 0 8 */ struct ehci_sitd * sitd; /* 0 8 */ struct ehci_fstn * fstn; /* 0 8 */ __le32 * hw_next; /* 0 8 */ void * ptr; /* 0 8 */ }; $ Fix it: $ pahole --find_pointers_to ehci_qh ehci_hcd: qh_scan_next ehci_hcd: async ehci_hcd: dummy ehci_shadow: qh $ - Consider unions when looking for classes containing some class: I.e.: $ pahole --contains tpacket_req tpacket_req_u $ Wasn't working, but should be considered with --contains/-i: $ pahole -C tpacket_req_u union tpacket_req_u { struct tpacket_req req; /* 0 16 */ struct tpacket_req3 req3; /* 0 28 */ }; $ - Introduce --unions to consider just unions Most filters can be used together with it, for instance to see the biggest unions in the kernel: $ pahole --unions --sizes | sort -k2 -nr | head thread_union 16384 0 swap_header 4096 0 fpregs_state 4096 0 autofs_v5_packet_union 304 0 autofs_packet_union 272 0 pptp_ctrl_union 208 0 acpi_parse_object 200 0 acpi_descriptor 200 0 bpf_attr 120 0 phy_configure_opts 112 0 $ Or just some unions that have some specific prefix: $ pahole --unions --prefix tcp union tcp_md5_addr { struct in_addr a4; /* 0 4 */ struct in6_addr a6; /* 0 16 */ }; union tcp_word_hdr { struct tcphdr hdr; /* 0 20 */ __be32 words[5]; /* 0 20 */ }; union tcp_cc_info { struct tcpvegas_info vegas; /* 0 16 */ struct tcp_dctcp_info dctcp; /* 0 16 */ struct tcp_bbr_info bbr; /* 0 20 */ }; $ What are the biggest unions in terms of number of members? 
$ pahole --unions --nr_members | sort -k2 -nr | head security_list_options 218 aml_resource 36 acpi_resource_data 29 acpi_operand_object 26 iwreq_data 18 sctp_params 15 ib_flow_spec 14 ethtool_flow_union 14 pptp_ctrl_union 13 bpf_attr 12 $ If you want to script most of the queries can change the separator: $ pahole --unions --nr_members -t, | sort -t, -k2 -nr | head security_list_options,218 aml_resource,36 acpi_resource_data,29 acpi_operand_object,26 iwreq_data,18 sctp_params,15 ib_flow_spec,14 ethtool_flow_union,14 pptp_ctrl_union,13 bpf_attr,12 $ - Fix -m/--nr_methods - Number of functions operating on a type pointer We had to use the same hack as in pfunct, as implemented in ccf3eeb ("btf_loader: Add support for BTF_KIND_FUNC"), will hide that 'struct ftype' (aka function prototype) indirection behind the parameter iterator (function__for_each_parameter). For now, here is the top 10 Linux kernel data structures in terms of number of functions receiving as one of its parameters a pointer to it, using /sys/kernel/btf/vmlinux to look at all the vmlinux types and functions (the ones visible in kallsyms, but with the parameters and its types): $ pahole -m | sort -k2 -nr | head device 955 sock 568 sk_buff 541 task_struct 437 inode 421 pci_dev 390 page 351 net_device 347 file 315 net 312 $ $ pahole --help |& grep -- -m -m, --nr_methods show number of methods $ - Do not require a class name to operate without a file name Since we default to operating on the running kernel data structures, we should make the default to, with no options passed, to pretty print all the running kernel data structures, or do what was asked in terms of number of members, size of structs, etc, i.e.: # pahole --help |& head Usage: pahole [OPTION...] 
FILE -a, --anon_include include anonymous classes -A, --nested_anon_include include nested (inside other structs) anonymous classes -B, --bit_holes=NR_HOLES Show only structs at least NR_HOLES bit holes -c, --cacheline_size=SIZE set cacheline size to SIZE --classes_as_structs Use 'struct' when printing classes -C, --class_name=CLASS_NAME Show just this class -d, --recursive recursive mode, affects several other flags # Continues working as before, but if you do: pahole It will work just as if you did: pahole vmlinux and that vmlinux file is the running kernel vmlinux. And since the default now is to read BTF info, then it will do all its operations on /sys/kernel/btf/vmlinux, when present, e.g. to find out what are the fattest data structures in the running kernel: # pahole -s | sort -k2 -nr | head cmp_data 290904 1 dec_data 274520 1 cpu_entry_area 217088 0 pglist_data 172928 4 saved_cmdlines_buffer 131104 1 debug_store_buffers 131072 0 hid_parser 110848 1 hid_local 110608 0 zonelist 81936 0 e820_table 64004 0 # How many data structures in the running kernel vmlinux area embed 'struct list_head'? # pahole -i list_head | wc -l 260 # Let's see some of those? # pahole -C fsnotify_event struct fsnotify_event { struct list_head list; /* 0 16 */ struct inode * inode; /* 16 8 */ /* size: 24, cachelines: 1, members: 2 */ /* last cacheline: 24 bytes */ }; # pahole -C audit_chunk struct audit_chunk { struct list_head hash; /* 0 16 */ long unsigned int key; /* 16 8 */ struct fsnotify_mark * mark; /* 24 8 */ struct list_head trees; /* 32 16 */ int count; /* 48 4 */ /* XXX 4 bytes hole, try to pack */ atomic_long_t refs; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct callback_head head; /* 64 16 */ struct node owners[]; /* 80 0 */ /* size: 80, cachelines: 2, members: 8 */ /* sum members: 76, holes: 1, sum holes: 4 */ /* last cacheline: 16 bytes */ }; #
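The 'size', 'cachelines' and 'last cacheline' figures in the struct summaries shown in the v1.17 notes above follow simple arithmetic, sketched below. This is an illustration rather than pahole's code: the function names are hypothetical, and the 64-byte cacheline is taken from the 'cacheline 1 boundary (64 bytes)' markers in the output, while pahole itself lets you change it with -c/--cacheline_size.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cachelines a type of the given size spans (rounded up). */
static size_t sketch_cachelines(size_t size, size_t cacheline_size)
{
	assert(cacheline_size > 0);
	return (size + cacheline_size - 1) / cacheline_size;
}

/*
 * Bytes used in the last, partially filled cacheline. A result of 0 means
 * the type ends exactly on a cacheline boundary.
 */
static size_t sketch_last_cacheline(size_t size, size_t cacheline_size)
{
	assert(cacheline_size > 0);
	return size % cacheline_size;
}
```

For 'struct inode' (size 592) this gives 10 cachelines with 16 bytes in the last one, and for 'struct audit_chunk' (size 80) it gives 2 cachelines with 16 bytes in the last one, matching the summaries printed above.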
v1.16 changes: BTF encoder: Andrii Nakryiko <andriin@fb.com>: - Preserve and encode exported functions as BTF_KIND_FUNC. Add encoding of DWARF's DW_TAG_subprogram into BTF's BTF_KIND_FUNC (plus corresponding BTF_KIND_FUNC_PROTO). Only exported functions are converted for now. This allows capturing all the exported kernel functions, the same subset that's exposed through /proc/kallsyms. BTF loader: Arnaldo Carvalho de Melo <acme@redhat.com> - Add support for BTF_KIND_FUNC Some changes to the fprintf routines were needed, as BTF has as the function type just a BTF_KIND_FUNC_PROTO, while DWARF has as the type for a function its return value type. With a function->btf flag this was overcome and all the other goodies in pfunct are present. Pretty printer: Arnaldo Carvalho de Melo: - Account inline type __aligned__ member types for spacing: union { refcount_t rcu_users; /* 2568 4 */ struct callback_head rcu __attribute__((__aligned__(8))); /* 2568 16 */ - } __attribute__((__aligned__(8))); /* 2568 16 */ + } __attribute__((__aligned__(8))); /* 2568 16 */ struct pipe_inode_info * splice_pipe; /* 2584 8 */ - Fix alignment of class members that are structs/enums/unions E.g. look at that 'completion' member in this struct: struct cpu_stop_done { atomic_t nr_todo; /* 0 4 */ int ret; /* 4 4 */ - struct completion completion; /* 8 32 */ + struct completion completion; /* 8 32 */ /* size: 40, cachelines: 1, members: 3 */ /* last cacheline: 40 bytes */ - Fixup handling classes with no members, solving a NULL deref. Gareth Lloyd <gareth.lloyd@uk.ibm.com>: - Avoid infinite loop trying to determine type with static data member of its own type. RPM spec file. Jiri Olsa <jolsa@redhat.com> Add dwarves dependency on libdwarves1. pfunct: Arnaldo Carvalho de Melo <acme@redhat.com> - type->type == 0 is void, fix --compile for that We were using the fall back for that, i.e. 'return 0;' was being emitted for a function returning void, noticed when using BTF as the format. 
pdwtags: - Print DW_TAG_subroutine_type as well So that we can see at least via pdwtags those tags, be it from DWARF or BTF. core: Arnaldo Carvalho de Melo <acme@redhat.com> Fix ptr_table__add_with_id() handling of pt->nr_entries, covering how BTF variables IDs are encoded. pglobal: Arnaldo Carvalho de Melo <acme@redhat.com>: - Allow passing the format path specifier, to use with BTF I.e. now we can, just like with pahole, use: pglobal -F btf --variable foo.o To get the global variables. Tree wide: Arnaldo Carvalho de Melo <acme@redhat.com>: - Fixup issues pointed out by various coverity reports. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Here is a summary of changes for the 1.13 version of pahole and its friends: - BTF - Use of the recently introduced BTF deduplication algorithm present in the Linux kernel's libbpf library, which allows for all the types in a multi compile unit binary such as vmlinux to be compactly stored, without duplicates. E.g.: from roughly: $ readelf -SW ../build/v5.1-rc4+/vmlinux | grep .debug_info.*PROGBITS [63] .debug_info PROGBITS 0000000000000000 1d80be0 c3c18b9 00 0 0 1 $ 195 MiB to: $ time pahole --btf_encode ../build/v5.1-rc4+/vmlinux real 0m19.168s user 0m17.707s # On a Lenovo t480s (i7-8650U) SSD sys 0m1.337s $ $ readelf -SW ../build/v5.1-rc4+/vmlinux | grep .BTF.*PROGBITS [78] .BTF PROGBITS 0000000000000000 27b49f61 1e23c3 00 0 0 1 $ ~2 MiB - Introduce a 'btfdiff' utility that prints the output from DWARF and from BTF, comparing the pretty printed outputs, running it on various linux kernel images, such as an allyesconfig for ppc64. Running it on the above 5.1-rc4+ vmlinux: $ btfdiff ../build/v5.1-rc4+/vmlinux $ No differences from the types generated from the DWARF ELF sections to the ones generated from the BTF ELF section. - Add a BTF loader, i.e. 'pahole -F btf' allows pretty printing of structs and unions in the same fashion as with DWARF info, and since BTF is way more compact, using it is much faster than using DWARF. 
$ cat ../build/v5.1-rc4+/vmlinux > /dev/null $ perf stat -e cycles pahole -F btf ../build/v5.1-rc4+/vmlinux > /dev/null Performance counter stats for 'pahole -F btf ../build/v5.1-rc4+/vmlinux': 229,712,692 cycles:u 0.063379597 seconds time elapsed 0.056265000 seconds user 0.006911000 seconds sys $ perf stat -e cycles pahole -F dwarf ../build/v5.1-rc4+/vmlinux > /dev/null Performance counter stats for 'pahole -F dwarf ../build/v5.1-rc4+/vmlinux': 49,579,679,466 cycles:u 13.063487352 seconds time elapsed 12.612512000 seconds user 0.426226000 seconds sys $ - Better union support: - Allow unions to be specified in pahole in the same fashion as structs $ pahole -C thread_union ../build/v5.1-rc4+/net/ipv4/tcp.o union thread_union { struct task_struct task __attribute__((__aligned__(64))); /* 0 11008 */ long unsigned int stack[2048]; /* 0 16384 */ }; $ - Infer __attribute__((__packed__)) when structs have no alignment holes and violate basic types (integer, longs, short integer) natural alignment requirements. Several heuristics are used to infer the __packed__ attribute, see the changeset log for descriptions. $ pahole -F btf -C boot_e820_entry ../build/v5.1-rc4+/vmlinux struct boot_e820_entry { __u64 addr; /* 0 8 */ __u64 size; /* 8 8 */ __u32 type; /* 16 4 */ /* size: 20, cachelines: 1, members: 3 */ /* last cacheline: 20 bytes */ } __attribute__((__packed__)); $ $ pahole -F btf -C lzma_header ../build/v5.1-rc4+/vmlinux struct lzma_header { uint8_t pos; /* 0 1 */ uint32_t dict_size; /* 1 4 */ uint64_t dst_size; /* 5 8 */ /* size: 13, cachelines: 1, members: 3 */ /* last cacheline: 13 bytes */ } __attribute__((__packed__)); - Support DWARF5's DW_AT_alignment, which, together with the __packed__ attribute inference algorithms produce output that, when compiled, should produce structures with layouts that match the original source code. 
See it in action with 'struct task_struct', which will also show some of the new information at the struct summary, at the end of the struct: $ pahole -C task_struct ../build/v5.1-rc4+/vmlinux | tail -19 /* --- cacheline 103 boundary (6592 bytes) --- */ struct vm_struct * stack_vm_area; /* 6592 8 */ refcount_t stack_refcount; /* 6600 4 */ /* XXX 4 bytes hole, try to pack */ void * security; /* 6608 8 */ /* XXX 40 bytes hole, try to pack */ /* --- cacheline 104 boundary (6656 bytes) --- */ struct thread_struct thread __attribute__((__aligned__(64))); /* 6656 4352 */ /* size: 11008, cachelines: 172, members: 207 */ /* sum members: 10902, holes: 16, sum holes: 98 */ /* sum bitfield members: 10 bits, bit holes: 2, sum bit holes: 54 bits */ /* paddings: 3, sum paddings: 14 */ /* forced alignments: 6, forced holes: 1, sum forced holes: 40 */ } __attribute__((__aligned__(64))); $ - Add a '--compile' option to 'pfunct' that produces compileable output for the function prototypes in an object file. There are still some bugs but the vast majority of the kernel single compilation unit files the ones produced from a single .c file are working, see the new 'fullcircle' utility that uses this feature. 
Example of it in action: $ pfunct --compile=static_key_false ../build/v5.1-rc4+/net/ipv4/tcp.o typedef _Bool bool; typedef struct { int counter; /* 0 4 */ /* size: 4, cachelines: 1, members: 1 */ /* last cacheline: 4 bytes */ } atomic_t; struct jump_entry; struct static_key_mod; struct static_key { atomic_t enabled; /* 0 4 */ /* XXX 4 bytes hole, try to pack */ union { long unsigned int type; /* 8 8 */ struct jump_entry * entries; /* 8 8 */ struct static_key_mod * next; /* 8 8 */ }; /* 8 8 */ /* size: 16, cachelines: 1, members: 2 */ /* sum members: 12, holes: 1, sum holes: 4 */ /* last cacheline: 16 bytes */ }; bool static_key_false(struct static_key * key) { return *(bool *)1; } $ The generation of compilable code from the type information and its use in the new tool 'fullcircle' helps validate all the parts of this codebase, finding bugs that were lurking forever, go read the csets to find all sorts of curious C language features that are rarely seen, like unnamed zero sized bitfields and the way people have been using them over the years in a codebase like the linux kernel. Certainly there are several other features, changes and fixes that I forgot to mention! Now lemme release this version so that we can use it more extensively together with a recent patch merged for 5.2: [PATCH bpf-next] kbuild: add ability to generate BTF type info for vmlinux With it BTF will always be available for all the types of the kernel, which will open a Pandora's box of cool new features that are in the works, and, for people already using pahole, will greatly speed up its usage. Please try to alias it to use btf, i.e. alias pahole='pahole -F btf' Please report any problems you may find with this new version or with the BTF loader or any errors in the layout generated/pretty printed. 
Thanks to the fine BTF guys at Facebook for the patches and help in testing,
fixing bugs and getting this out of the door. The stats for this release are:

  Changesets: 157

     113 Arnaldo Carvalho de Melo   Red Hat
      32 Andrii Nakryiko            Facebook
      10 Yonghong Song              Facebook
       1 Martin Lau                 Facebook
       1 Domenico Andreoli

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
v1.12

August 2018

- Add a BTF encoder (Martin KaFai Lau)

  BTF (BPF Type Format) is the metadata format that describes the data types
  of a BPF program/map. Hence, it basically focuses on the C programming
  language, which modern BPF primarily uses. The first use case is to provide
  a generic pretty print capability for a BPF map. BTF has its roots in CTF
  (Compact C-Type format).

- Add documentation on how to use the BTF encoder (Arnaldo Carvalho de Melo)

  It uses the Linux 'perf' tools integration with BPF/llvm/clang to show how
  to generate an object file that then gets its DWARF info used to create a
  .BTF ELF section with this new BTF format. That augmented eBPF ELF object
  file is then loaded while 'perf ftrace -g *bpf*' is used to show the kernel
  BTF validation process.

- Initial support for DW_TAG_partial_unit (Arnaldo Carvalho de Melo)

  This is done just by treating these sections as DW_TAG_compile_unit, which
  is enough for the structs that don't contain cross-section type references
  to be correctly loaded and pretty-printed with pahole. This doesn't affect
  the kernel or modules, where such DWARF compression techniques are not used
  so far.

- Print cacheline boundaries in multiple union members (Arnaldo Carvalho de Melo)

  We were showing them just in the first inner union member, as if the union
  was a struct. Now we restart the cacheline boundaries when moving on to
  print the next inner struct.
  As an example, look at 'struct audit_context', where the only cacheline
  boundary printed for the following unnamed union was the first one, for the
  'socketcall' struct member. Now that cacheline boundary appears in each of
  the union's inner structs:

    struct audit_context {
            <SNIP>
            union {
                    struct {
                            int nargs;                      /*   824     4 */

                            /* XXX 4 bytes hole, try to pack */

                            /* --- cacheline 13 boundary (832 bytes) --- */
                            long int args[6];               /*   832    48 */
                    } socketcall;                           /*   824    56 */
                    struct {
                            kuid_t uid;                     /*   824     4 */
                            kgid_t gid;                     /*   828     4 */
                            /* --- cacheline 13 boundary (832 bytes) --- */
                            umode_t mode;                   /*   832     2 */

                            /* XXX 2 bytes hole, try to pack */

                            u32 osid;                       /*   836     4 */
                            int has_perm;                   /*   840     4 */
                            uid_t perm_uid;                 /*   844     4 */
                            gid_t perm_gid;                 /*   848     4 */
                            umode_t perm_mode;              /*   852     2 */

                            /* XXX 2 bytes hole, try to pack */

                            long unsigned int qbytes;       /*   856     8 */
                    } ipc;                                  /*   824    40 */
                    struct {
                            mqd_t mqdes;                    /*   824     4 */

                            /* XXX 4 bytes hole, try to pack */

                            /* --- cacheline 13 boundary (832 bytes) --- */
                            struct mq_attr mqstat;          /*   832    64 */
                    } mq_getsetattr;                        /*   824    72 */
                    struct {
                            mqd_t mqdes;                    /*   824     4 */
                            int sigev_signo;                /*   828     4 */
                    } mq_notify;                            /*   824     8 */
                    struct {
                            mqd_t mqdes;                    /*   824     4 */

                            /* XXX 4 bytes hole, try to pack */

                            /* --- cacheline 13 boundary (832 bytes) --- */
                            size_t msg_len;                 /*   832     8 */
                            unsigned int msg_prio;          /*   840     4 */

                            /* XXX 4 bytes hole, try to pack */

                            struct timespec64 abs_timeout;  /*   848    16 */
                    } mq_sendrecv;                          /*   824    40 */
                    struct {
                            int oflag;                      /*   824     4 */
                            umode_t mode;                   /*   828     2 */

                            /* XXX 2 bytes hole, try to pack */

                            /* --- cacheline 13 boundary (832 bytes) --- */
                            struct mq_attr attr;            /*   832    64 */
                    } mq_open;                              /*   824    72 */
                    struct {
                            pid_t pid;                      /*   824     4 */
                            struct audit_cap_data cap;      /*   828    32 */
                    } capset;                               /*   824    36 */
                    struct {
                            int fd;                         /*   824     4 */
                            int flags;                      /*   828     4 */
                    } mmap;                                 /*   824     8 */
                    struct {
                            int argc;                       /*   824     4 */
                    } execve;                               /*   824     4 */
                    struct {
                            char * name;                    /*   824     8 */
                    } module;                               /*   824     8 */
            };                                              /*   824    72 */
            /* --- cacheline 14 boundary (896 bytes) --- */
            int fds[2];                                     /*   896     8 */
            struct audit_proctitle proctitle;               /*   904    16 */

            /* size: 920, cachelines: 15, members: 46 */
            /* sum members: 912, holes: 2, sum holes: 8 */
            /* last cacheline: 24 bytes */
    };

- Show where a struct was used, e.g.:

    $ pahole -I vmlinux
    <SNIP>
    /* Used at: /home/acme/git/perf/init/main.c */
    /* <1f4a5> /home/acme/git/perf/arch/x86/include/asm/orc_types.h:85 */
    struct orc_entry {
            s16 sp_offset;                      /*     0     2 */
            s16 bp_offset;                      /*     2     2 */
    <SNIP>

- Show offsets at union members (Arnaldo Carvalho de Melo, suggested by
  Matthew Wilcox)

  In complex structs with multiple complex unions, figuring out the offset of
  a given union member is difficult, as one needs to find the enclosing union
  and go to the end of it to see the offset. With this change, for instance,
  the Linux kernel's 'struct page' now shows as:

    struct page {
            long unsigned int flags;                                    /*     0     8 */
            union {
                    struct address_space * mapping;                     /*     8     8 */
                    void * s_mem;                                       /*     8     8 */
                    atomic_t compound_mapcount;                         /*     8     4 */
            };                                                          /*     8     8 */
            union {
                    long unsigned int index;                            /*    16     8 */
                    void * freelist;                                    /*    16     8 */
            };                                                          /*    16     8 */
            union {
                    long unsigned int counters;                         /*    24     8 */
                    struct {
                            union {
                                    atomic_t _mapcount;                 /*    24     4 */
                                    unsigned int active;                /*    24     4 */
                                    struct {
                                            unsigned int inuse:16;      /*    24:16   4 */
                                            unsigned int objects:15;    /*    24: 1   4 */
                                            unsigned int frozen:1;      /*    24: 0   4 */
                                    };                                  /*    24     4 */
                                    int units;                          /*    24     4 */
                            };                                          /*    24     4 */
                            atomic_t _refcount;                         /*    28     4 */
                    };                                                  /*    24     8 */
            };                                                          /*    24     8 */
            union {
                    struct list_head lru;                               /*    32    16 */
                    struct dev_pagemap * pgmap;                         /*    32     8 */
                    struct {
                            struct page * next;                         /*    32     8 */
                            int pages;                                  /*    40     4 */
                            int pobjects;                               /*    44     4 */
                    };                                                  /*    32    16 */
                    struct callback_head callback_head;                 /*    32    16 */
                    struct {
                            long unsigned int compound_head;            /*    32     8 */
                            unsigned int compound_dtor;                 /*    40     4 */
                            unsigned int compound_order;                /*    44     4 */
                    };                                                  /*    32    16 */
                    struct {
                            long unsigned int __pad;                    /*    32     8 */
                            pgtable_t pmd_huge_pte;                     /*    40     8 */
                    };                                                  /*    32    16 */
            };                                                          /*    32    16 */
            union {
                    long unsigned int private;                          /*    48     8 */
                    spinlock_t ptl;                                     /*    48     4 */
                    struct kmem_cache * slab_cache;                     /*    48     8 */
            };                                                          /*    48     8 */
            struct mem_cgroup * mem_cgroup;                             /*    56     8 */

            /* size: 64, cachelines: 1, members: 7 */
    };

- Search and use running kernel vmlinux when no file is passed (Arnaldo
  Carvalho de Melo)

  Now it is possible to use it just as:

    $ pahole -C sk_buff_head
    struct sk_buff_head {
            struct sk_buff * next;              /*     0     8 */
            struct sk_buff * prev;              /*     8     8 */
            __u32 qlen;                         /*    16     4 */
            spinlock_t lock;                    /*    20     4 */

            /* size: 24, cachelines: 1, members: 4 */
            /* last cacheline: 24 bytes */
    };
    $

  This will look at /sys/kernel/notes, find the running kernel build-id, and
  then search the usual locations (vmlinux, /lib/modules/`uname -r`/build/vmlinux,
  the debuginfo package paths, etc) to find the matching vmlinux with the
  DWARF info to use. Build-ids are now ubiquitous, so this shortens the
  command line for the most commonly used binary.

- Document 'pahole --hex' in the man page (Arnaldo Carvalho de Melo)

  This option shows offsets and sizes in hexadecimal, helping to correlate
  with reports using that notation. E.g.:

    $ pahole --hex -C sk_buff_head
    struct sk_buff_head {
            struct sk_buff * next;              /*     0   0x8 */
            struct sk_buff * prev;              /*   0x8   0x8 */
            __u32 qlen;                         /*  0x10   0x4 */
            spinlock_t lock;                    /*  0x14   0x4 */

            /* size: 24, cachelines: 1, members: 4 */
            /* last cacheline: 24 bytes */
    };
    $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
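
For readers curious about the build-id lookup mentioned in the "Search and use
running kernel vmlinux" item above, here is a rough sketch in Python of the
first step only: pulling the GNU build-id out of a blob of ELF notes such as
/sys/kernel/notes. This is not pahole's code. The function names are
hypothetical, and the sketch assumes the notes use the common ELF note layout
(three 32-bit words for name size, descriptor size and type, followed by the
name and descriptor, each padded to 4 bytes) in native byte order. Searching
the candidate vmlinux locations is left out.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build-ids


def _align4(n):
    """Round n up to the next multiple of 4 (assumed note padding)."""
    return (n + 3) & ~3


def find_build_id(notes):
    """Return the GNU build-id in `notes` as a hex string, or None."""
    off = 0
    while off + 12 <= len(notes):
        # Note header: namesz, descsz, type (native byte order, 4 bytes each).
        namesz, descsz, ntype = struct.unpack_from("=III", notes, off)
        off += 12
        name = notes[off:off + namesz]
        off += _align4(namesz)
        desc = notes[off:off + descsz]
        off += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


def running_kernel_build_id(path="/sys/kernel/notes"):
    """Read the running kernel's notes and extract its build-id, if any."""
    with open(path, "rb") as f:
        return find_build_id(f.read())


if __name__ == "__main__":
    try:
        print(running_kernel_build_id())
    except OSError as err:
        # Not on Linux, or the file is not readable in this environment.
        print("could not read kernel notes:", err)
```

The resulting hex string is what one would compare against the build-id
stored in each candidate vmlinux to find the matching file.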