Record kernel samples by DaniPopes · Pull Request #736 · mstange/samply

DaniPopes · 2025-12-21T00:35:22Z

Kernel samples are currently neither recorded nor decoded during `samply record`. This PR consists mainly of two commits:

18611d3: enables PERF_SAMPLE_CALLCHAIN and decodes kernel stack frames from the callchain in the sample events
4ef1638: registers the kernel and its modules during profiler setup, just like perf does, so that Converter is notified of the kernel symbols
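
To illustrate the first commit: a PERF_SAMPLE_CALLCHAIN payload is a flat list of u64 values in which the kernel inserts context markers (PERF_CONTEXT_KERNEL, PERF_CONTEXT_USER) ahead of the frames that belong to each context. The following is a minimal sketch of splitting such a list, not the code in this PR. The helper name `split_callchain` is hypothetical, and the marker values are taken from `include/uapi/linux/perf_event.h`.

```rust
// Context markers the kernel places into a perf callchain
// (values as defined in include/uapi/linux/perf_event.h).
const PERF_CONTEXT_KERNEL: u64 = -128i64 as u64;
const PERF_CONTEXT_USER: u64 = -512i64 as u64;
// Every value at or above this is a context marker, not an address.
const PERF_CONTEXT_MAX: u64 = -4095i64 as u64;

/// Hypothetical helper: splits a raw callchain into (kernel, user) frames.
/// Frames that follow an unrecognized marker (hypervisor, guest) are dropped.
pub fn split_callchain(callchain: &[u64]) -> (Vec<u64>, Vec<u64>) {
    enum Mode {
        Unknown,
        Kernel,
        User,
    }
    let mut mode = Mode::Unknown;
    let mut kernel = Vec::new();
    let mut user = Vec::new();
    for &entry in callchain {
        if entry >= PERF_CONTEXT_MAX {
            // A marker switches the context for the frames that follow it.
            mode = match entry {
                PERF_CONTEXT_KERNEL => Mode::Kernel,
                PERF_CONTEXT_USER => Mode::User,
                _ => Mode::Unknown,
            };
            continue;
        }
        match mode {
            Mode::Kernel => kernel.push(entry),
            Mode::User => user.push(entry),
            Mode::Unknown => {}
        }
    }
    (kernel, user)
}
```

With the user part of the callchain excluded, the second vector would simply stay empty.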

~~Based on #734.~~

ishitatsuyuki · 2025-12-21T12:04:54Z

+                    pid: -1,
+                    tid: 0,
+                    address: module.base_address,
+                    length: module.size,


A surprising coincidence, but I have been working on something similar locally. I've found an issue with using the extents from /proc/modules. Details below:

When CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC is enabled, .text and .data go into separate memory regions. However, /proc/modules reports the start of .text as the base address but the module_total_size (code) as the size. Combining these two values therefore gives you overlapping module ranges, and in samply a module whose range overlaps an earlier one evicts it, causing symbolization to fail.

This probably needs to be taken upstream but until then I think we might need to clamp the module extents so that they don't overlap.

Sorry, I got confused. On x64, CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC is not enabled, so that is not the cause. The correct explanation seems to be that data symbols precede .text, so using the total size is still wrong.
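
The clamping idea could look roughly like the sketch below: sort the modules by base address and cap each module's size at the distance to the next module's base. This is only an illustration of the suggestion, not the change that landed in b8822c6. The `ModuleRange` struct and `clamp_module_ranges` function are hypothetical names.

```rust
/// Hypothetical representation of one /proc/modules entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModuleRange {
    pub name: String,
    pub base_address: u64,
    pub size: u64,
}

/// Sorts modules by base address and shrinks any module that would
/// otherwise extend past the start of the next one.
pub fn clamp_module_ranges(mut modules: Vec<ModuleRange>) -> Vec<ModuleRange> {
    modules.sort_by_key(|m| m.base_address);
    for i in 0..modules.len().saturating_sub(1) {
        // Safe subtraction: the list is sorted, so next base >= this base.
        let max_size = modules[i + 1].base_address - modules[i].base_address;
        if modules[i].size > max_size {
            modules[i].size = max_size;
        }
    }
    modules
}
```

The last module keeps its reported size because there is nothing after it to clamp against.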

I pushed b8822c6, is this what you had in mind?

Thanks, I'll test when I'm back at desk.

Okay, the kernel module bases seem to resolve correctly, but symbolization for modules does not work. You probably need to parse module entries from kallsyms as well. Do you plan to do it? Otherwise I can see if I can pick up some of my local changes on top.

https://share.firefox.dev/4pPtAId

I quickly created a module symbol parser in 9eaf3fa. It seems to symbolize correctly on my end.
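
For readers unfamiliar with the format: each line of /proc/kallsyms holds a hex address, a one-character symbol type, and a symbol name, and module symbols carry a trailing `[module_name]` field. The sketch below shows one way to parse such a line. It is not the parser from 9eaf3fa; `KallsymsEntry` and `parse_kallsyms_line` are hypothetical names.

```rust
/// Hypothetical parsed form of one /proc/kallsyms line.
#[derive(Debug, PartialEq, Eq)]
pub struct KallsymsEntry<'a> {
    pub address: u64,
    pub kind: char,
    pub name: &'a str,
    /// `Some` for module symbols, `None` for symbols of the kernel image.
    pub module: Option<&'a str>,
}

/// Parses a line such as "ffffffffc0a3d000 t my_func\t[my_mod]".
/// Returns `None` for lines that do not match the expected shape.
pub fn parse_kallsyms_line(line: &str) -> Option<KallsymsEntry<'_>> {
    let mut parts = line.split_whitespace();
    let address = u64::from_str_radix(parts.next()?, 16).ok()?;
    let kind = parts.next()?.chars().next()?;
    let name = parts.next()?;
    // The optional fourth field is the module name in square brackets.
    let module = parts
        .next()
        .and_then(|m| m.strip_prefix('[')?.strip_suffix(']'));
    Some(KallsymsEntry {
        address,
        kind,
        name,
        module,
    })
}
```

Grouping the entries by `module` would then give a per-module symbol table to pair with the module base addresses.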

Nice, thanks, cherry-picked it

Mimics `perf` so the kernel and modules' symbols are registered in `Converter`.

mstange · 2025-12-22T02:57:25Z

I'm curious how you're running samply in a way that gives you useful kernel symbols here. Did you change your /proc/sys/kernel/kptr_restrict setting? Or are you running samply as root?

ishitatsuyuki · 2025-12-22T03:08:05Z

On my end, I have both kptr_restrict and perf_event_paranoid relaxed. That allows reading kallsyms without root.

But in any case root should work.

https://github.com/torvalds/linux/blob/master/kernel/ksyms_common.c
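
When the kernel decides not to reveal pointer values to the reader, /proc/kallsyms is still readable but every address is printed as zero. A profiler can detect this and warn instead of silently producing unsymbolized kernel frames. The following is a small sketch of such a check; the function name `kallsyms_addresses_hidden` is hypothetical and not part of samply.

```rust
/// Returns true if the given /proc/kallsyms contents contain at least one
/// parseable entry and every parseable address is zero, which is what a
/// reader without sufficient privileges sees.
pub fn kallsyms_addresses_hidden(contents: &str) -> bool {
    let mut saw_entry = false;
    for line in contents.lines() {
        let Some(addr_str) = line.split_whitespace().next() else {
            continue;
        };
        let Ok(addr) = u64::from_str_radix(addr_str, 16) else {
            continue;
        };
        saw_entry = true;
        if addr != 0 {
            // One real address is enough to know values are being shown.
            return false;
        }
    }
    saw_entry
}
```

The check scans all entries rather than just the first one, because a few symbols can legitimately sit at address zero even when addresses are visible.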

mstange · 2026-03-13T17:01:14Z

@@ -311,6 +311,10 @@ impl PerfBuilder {
            attr.sample_type |= PERF_SAMPLE_STACK_USER;


I think we should also set the EXCLUDE_CALLCHAIN_USER flag bit, because we only want one source of user stack, either the stack bytes (with user-space stack walking) or the call chain (from kernel-space stack walking).

That makes sense.

Separately, I had an idea that we could use both the user-space stack we get from the kernel and merge it with our own obtained from DWARF to improve accuracy, but I couldn't find an easy way to do this.

Err, your current patch is setting PERF_ATTR_FLAG_EXCLUDE_CALLCHAIN_KERNEL, but I was asking for EXCLUDE_CALLCHAIN_USER. So this isn't really achieving the thing I asked for: having only one source of user stack, either the stack bytes or the user call chain.
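
To make the distinction concrete, here is a sketch of the intended attribute setup. It is not the code in `PerfBuilder`: the `AttrBits` struct, the `UserStackSource` enum, and `configure` are hypothetical stand-ins, and the constants are local copies whose bit positions follow my reading of `perf_event_attr` in `include/uapi/linux/perf_event.h`.

```rust
// sample_type bits (enum perf_event_sample_format).
const PERF_SAMPLE_CALLCHAIN: u64 = 1 << 5;
const PERF_SAMPLE_REGS_USER: u64 = 1 << 12;
const PERF_SAMPLE_STACK_USER: u64 = 1 << 13;
// Positions in the perf_event_attr flag bitfield (assumed layout).
// The kernel variant is listed only to show it is a different bit.
#[allow(dead_code)]
const EXCLUDE_CALLCHAIN_KERNEL: u64 = 1 << 21;
const EXCLUDE_CALLCHAIN_USER: u64 = 1 << 22;

/// Hypothetical: where the user-space part of the stack should come from.
pub enum UserStackSource {
    /// Raw stack bytes plus registers, unwound in user space.
    StackBytes,
    /// The frame-pointer walk done by the kernel at sample time.
    KernelCallchain,
}

/// Hypothetical stand-in for the relevant perf_event_attr fields.
pub struct AttrBits {
    pub sample_type: u64,
    pub flags: u64,
}

pub fn configure(source: UserStackSource) -> AttrBits {
    // The callchain is always requested so that kernel frames are recorded.
    let mut attr = AttrBits {
        sample_type: PERF_SAMPLE_CALLCHAIN,
        flags: 0,
    };
    match source {
        UserStackSource::StackBytes => {
            attr.sample_type |= PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER;
            // User frames come from our own unwinding, so drop them from
            // the callchain. Kernel frames must stay, hence the kernel
            // exclusion bit is never set.
            attr.flags |= EXCLUDE_CALLCHAIN_USER;
        }
        UserStackSource::KernelCallchain => {}
    }
    attr
}
```

Either way there is exactly one source of user frames per sample, while the kernel part of the callchain is kept in both configurations.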

Always request PERF_SAMPLE_CALLCHAIN with user frames so the kernel walks the frame pointer chain at sample time. When DWARF unwinding is truncated, splice the remaining user FP frames from the kernel's callchain into the merged stack, giving complete stacks for binaries compiled with frame pointers.

mstange · 2026-04-15T18:04:35Z

+            // Find where to splice: match the last real DWARF frame's address in the
+            // FP callchain, then append everything deeper from the FP walk.
+            let last_dwarf_addr = self.dwarf[..dwarf_len].last().map(|f| f.address());
+            let splice_idx = last_dwarf_addr
+                .and_then(|addr| fp_stack.iter().position(|f| f.address() == addr))
+                .map(|i| i + 1)
+                .unwrap_or(fp_stack.len());
+            if splice_idx < fp_stack.len() {
+                self.merged.extend_from_slice(&fp_stack[splice_idx..]);
+            }


I'm not sure about this. With recursion, you may end up with merged stacks that have extra or missing recursive parts. It would be nice to match the two stacks up using the per-frame stack-pointer value. Unfortunately, the callchain from the kernel doesn't give us the stack pointers, as far as I know.
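
The recursion concern can be reproduced with a simplified version of the splice logic quoted above, using plain u64 addresses instead of the PR's frame type. This is an illustrative sketch only; the free function `splice` is a hypothetical reduction of the reviewed code. Because `position` returns the first occurrence of the address, a recursive function whose frames share one return address can be matched at the wrong depth.

```rust
/// Simplified model of the reviewed splice: both stacks are leaf-first
/// lists of frame addresses. `dwarf` is the (possibly truncated) unwound
/// stack, `fp_stack` is the full frame-pointer walk from the kernel.
pub fn splice(dwarf: &[u64], fp_stack: &[u64]) -> Vec<u64> {
    let mut merged = dwarf.to_vec();
    let splice_idx = dwarf
        .last()
        // First occurrence only: ambiguous when the address repeats.
        .and_then(|addr| fp_stack.iter().position(|f| f == addr))
        .map(|i| i + 1)
        .unwrap_or(fp_stack.len());
    if splice_idx < fp_stack.len() {
        merged.extend_from_slice(&fp_stack[splice_idx..]);
    }
    merged
}
```

Take a true stack of leaf `A`, three recursive frames `R`, then `MAIN`. If DWARF unwinding stops after `[A, R]`, the merge happens to be correct. If it stops after `[A, R, R]`, the match still lands on the first `R` in the FP stack, and the merged result contains four `R` frames instead of three.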

mstange

The rest looks good. I just think we shouldn't mess with two user stacks here. Can you change this PR so that the user stack comes either from the callchain info or from our DWARF unwinding? Then you don't need the SampleStack struct either. And then we can discuss the user stack stitching in a separate PR.

ishitatsuyuki reviewed Dec 21, 2025


DaniPopes added 2 commits December 22, 2025 03:47

Enable callchain to get kernel frames

9ab2980

Emit synthetic mmap events for kernel and modules

e892659

Mimics `perf` so the kernel and modules' symbols are registered in `Converter`.

DaniPopes force-pushed the sample-kernel branch from f25cd32 to 133b718 Compare December 22, 2025 02:49

DaniPopes marked this pull request as ready for review December 22, 2025 02:49

Fix Windows build

c9de5c9

DaniPopes force-pushed the sample-kernel branch from 133b718 to c9de5c9 Compare December 22, 2025 02:55

DaniPopes force-pushed the sample-kernel branch from e01b5c6 to e31872e Compare December 22, 2025 03:16

Clamp kernel module ranges

b8822c6

DaniPopes force-pushed the sample-kernel branch from e31872e to b8822c6 Compare December 22, 2025 03:18

ishitatsuyuki and others added 6 commits December 22, 2025 08:26

Parse module symbols

978bd6a

Remove unused error

98f01a0

Adjust debug assertion

82b5609

Log KernelSymbolsError

5146516

Don't panic on CodeId parse error

ff3070a

Fix empty kernel module build IDs

a403b5c

mstange reviewed Mar 13, 2026


DaniPopes added 3 commits March 13, 2026 22:33

Set EXCLUDE_CALLCHAIN_USER

61afc14

Merge branch 'main' into sample-kernel

aa652a8

DaniPopes mentioned this pull request Mar 27, 2026

Fall back to FP callchain when DWARF unwinding truncates #768

Open

DaniPopes added 3 commits March 27, 2026 06:20

Merge branch 'main' into sample-kernel

6949478

fmt

a43191c

Merge branch 'main' into sample-kernel

9f3d469

mstange reviewed Apr 15, 2026


DaniPopes added 2 commits April 17, 2026 01:20

Remove SampleStack

ccd790e

Merge branch 'main' into sample-kernel

9a43a73

Merge branch 'main' into sample-kernel

3615b41

