Allow disassembling and examining functions and globals #4000

Open
BryanKadzban wants to merge 3 commits into probe-rs:master from BryanKadzban:improve-disassembly

@BryanKadzban

Contributor

At ELF parse time, pull everything from .symtab to get the ::hXXXXX crate disambiguator suffixes, then pull everything from DWARF to get the correct generic arguments (symtab often drops info from the generic types or consts). Match the two up by address. Include inlined functions from DWARF as well, but only as a pointer to their containing concrete function; this way they can at least be requested, and the user can be told they're inlined.

Fold the DWARF ranges for inlined functions into their concrete parents. When we're done looking at DWARF, any inlined functions whose parents still have no ranges, and any concrete functions with no ranges themselves, get dropped, as they've been optimized out of the binary but left in the debug info.
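The fold-then-prune step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `FunctionEntry`, its fields, and `fold_and_prune` are hypothetical names rather than the types used in the PR, and every inlined instance is assumed to point directly at a concrete (non-inlined) parent.

```rust
use std::collections::{HashMap, HashSet};
use std::ops::Range;

/// Hypothetical stand-in for a function record gathered from DWARF.
#[derive(Debug, Clone, PartialEq)]
struct FunctionEntry {
    name: String,
    /// `Some(id)` if this is an inlined instance contained in concrete function `id`.
    inlined_into: Option<usize>,
    ranges: Vec<Range<u64>>,
}

/// Fold inlined ranges into their concrete parents, then drop every entry
/// whose concrete function ended up with no code ranges at all.
fn fold_and_prune(mut funcs: HashMap<usize, FunctionEntry>) -> HashMap<usize, FunctionEntry> {
    // Collect (parent, ranges) pairs first so the map is not mutated while iterating.
    let folded: Vec<(usize, Vec<Range<u64>>)> = funcs
        .values()
        .filter_map(|f| f.inlined_into.map(|parent| (parent, f.ranges.clone())))
        .collect();
    for (parent, ranges) in folded {
        if let Some(p) = funcs.get_mut(&parent) {
            p.ranges.extend(ranges);
        }
    }

    // Concrete functions that still have at least one range are "live".
    let live: HashSet<usize> = funcs
        .iter()
        .filter(|(_, f)| f.inlined_into.is_none() && !f.ranges.is_empty())
        .map(|(id, _)| *id)
        .collect();

    // Keep live concrete functions and inlined instances whose parent is live.
    funcs.retain(|id, f| match f.inlined_into {
        None => live.contains(id),
        Some(parent) => live.contains(&parent),
    });
    funcs
}
```

A concrete function with no ranges of its own survives if one of its inlined children contributes a range; entries left without any code are treated as optimized out.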

At runtime, query the table of symbols by splitting the query and each symbol into a sequence of identifiers (alphanumeric-or-underscore), separated by anything else. If the sequence of identifiers matches, using full-string matching at each identifier, then the symbol is a candidate.
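A minimal sketch of this matching rule, assuming the query's identifiers only need to appear in order (not contiguously) within the symbol's identifiers, which is what allows whole namespace levels to be omitted. The function names `identifiers`, `matches`, and `candidates` are made up for illustration and are not the PR's API.

```rust
/// Split a name into identifier tokens: maximal runs of alphanumerics or `_`.
fn identifiers(s: &str) -> Vec<&str> {
    s.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .collect()
}

/// True if every query identifier appears in the symbol, in order,
/// each one compared as a whole token (no substring matching).
fn matches(query: &str, symbol: &str) -> bool {
    let mut sym = identifiers(symbol).into_iter();
    // `any` advances the shared iterator, so later query tokens can only
    // match symbol tokens that come after the previous match.
    identifiers(query).into_iter().all(|q| sym.any(|s| s == q))
}

/// Return all symbols that match the query.
fn candidates<'a>(query: &str, symbols: &[&'a str]) -> Vec<&'a str> {
    symbols.iter().copied().filter(|s| matches(query, s)).collect()
}
```

With the symbols `app::main` and `app::uart::main`, the query `main` yields both candidates, while `uart main` narrows the result to the second one.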

If there's more than one candidate, dump all of them and ask for a more fully specified query. Inlined symbols point to the container and don't allow disassembly or dumping.

Tested this on the RP2040_full_unwind.elf file in the repo, using the dummy probe code that always returns 0 for memory bytes. Disassembling main works, as does dumping one of the appropriately disambiguated vtable symbols. (As does dumping main, in fact.)

@BryanKadzban

Contributor Author

(It took nearly a month to get my head around DWARF ... but I think this is a lot better for it :) )

@bugadani self-requested a review May 22, 2026 22:06
@bugadani

Contributor

Sorry, I forgot about this. Give me a bit to review and play around with the PR; I'll try to get it merged this weekend.

At ELF parse time, drop all the globals into a vector that we can look
at later.  Split up the names on anything that isn't part of a rust
identifier, so we can do a sub-sequence match later more easily.  Do
this for both the symtab section (using rustc-demangle) and the DWARF
data (using the existing DIE iteration loop in UnitInfo).  De-duplicate
the entries from the two sources and drop the resulting list into the
DebugInfo struct so we can use it later.

Add an indication to the GdbNuf for whether an explicit count was
requested.  When the requested symbol is resolved, if an explicit count
was requested, use it (and the symbol's start address); if not, use the
symbol's stored Range<>s.

Add a method to DebugInfo to take in a (string) symbol query, which is
tokens (alphanumeric-or-underscore) separated by everything else. Tokens
match by sequence, full-string-matching each one, against the sequence
of tokens in each symbol's name.  So it works to omit whole levels of
the namespace hierarchy if they don't help narrow down to a single
symbol.

When dumping, if that query returns multiple entries, display all of
them and let the user add further restrictions to disambiguate.

Inlined function instances' names are disambiguated by the first address
of their actual code.  They're also tagged by the actual function that
calls (and contains) them, since they can't be disassembled directly but
their containing function can.

Global variables are also stored, but with only a single Range<> since
the compiler doesn't split them.

Both functions and variables can be either disassembled (/i) or dumped
(/x or equivalent).
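The explicit-count rule from the commit message can be sketched like this. Only `count_was_default` appears in the PR's code; the `Nuf` struct, its other fields, and `ranges_to_read` are hypothetical simplifications used for illustration.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for the parsed GDB "/nuf" format.
struct Nuf {
    count: usize,
    unit_size: usize,
    /// True when the user did not type an explicit count.
    count_was_default: bool,
}

/// Decide which address ranges to read for a resolved symbol.
fn ranges_to_read(nuf: &Nuf, symbol_ranges: &[Range<u64>]) -> Vec<Range<u64>> {
    if nuf.count_was_default {
        // No explicit count: use the ranges stored for the symbol.
        symbol_ranges.to_vec()
    } else {
        // Explicit count: start at the symbol's first address and read
        // exactly `count * unit_size` bytes.
        match symbol_ranges.first() {
            Some(first) => {
                let len = (nuf.count * nuf.unit_size) as u64;
                vec![first.start..first.start + len]
            }
            None => Vec::new(),
        }
    }
}
```

Returning ranges rather than a single byte count keeps the start addresses available to the caller, which matters when a function's code is split across several ranges.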
@bugadani force-pushed the improve-disassembly branch from 846e0b3 to ebd4b09 May 23, 2026 08:24
Comment on lines +64 to +67
        f.write_str("z_can't happen: could not find a name attribute")?;
    }
    SymbolName::LinkNotFound(offset) => {
        write!(f, "z_symbol name link not found at offset {offset:?}")?;

Contributor

What are the z_ prefixes here?

Contributor Author

They make the errors sort to the end of the symbol list when this Display impl is used to generate the symbols for a query. Now that I'm no longer seeing those errors, I can drop the prefix if you'd prefer.

// first, so we de-duplicate, then drop them into a sorted vector at the
// end.
for symbol in object.symbols() {
    let name = rustc_demangle::demangle(symbol.name()?).to_string();

Contributor

Not every program is written in Rust, and we should respect that.

Comment on lines +190 to +206
fn canonicalize_ranges(ranges: &mut Vec<Range<u64>>) {
    ranges.sort_unstable_by_key(|r| r.start);
    let mut merged_ranges = Vec::with_capacity(ranges.len());
    let mut range_iter = ranges.iter_mut();
    if let Some(prev) = range_iter.next() {
        for r in range_iter {
            if r.start <= prev.end {
                prev.end = prev.end.max(r.end);
            } else {
                merged_ranges.push(r.clone());
                prev.clone_from(r);
            }
        }
    }
    ranges.clear();
    ranges.extend_from_slice(&merged_ranges);
}

Contributor

This looks broken and will lose data. You expand a range, but store non-expanded ranges in merged_ranges. Then, you drop everything from ranges before re-adding some.

Contributor Author

Oh, right. The intent was to have prev refer to the entry in merged_ranges but when I made it clone_from, that broke it.

When I get back I'll fix this.
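One possible shape for the fix, offered as a sketch rather than the change that will actually land: keep `prev` as a reference to the last entry already stored in the output vector, so that expanding it updates the stored range. The merge condition (`r.start <= prev.end`, which also merges touching ranges) is taken from the snippet under review.

```rust
use std::ops::Range;

/// Sort ranges by start address and merge any that overlap or touch.
fn canonicalize_ranges(ranges: &mut Vec<Range<u64>>) {
    ranges.sort_unstable_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::with_capacity(ranges.len());
    for r in ranges.drain(..) {
        if let Some(prev) = merged.last_mut() {
            if r.start <= prev.end {
                // Grow the range that is already stored in `merged`.
                prev.end = prev.end.max(r.end);
                continue;
            }
        }
        // First range, or a gap before this one: start a new merged range.
        merged.push(r);
    }
    *ranges = merged;
}
```

Because the first range is pushed into `merged` like any other, nothing is lost when the input is drained and replaced.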

// A mapping from child die to parent die.
parents: HashMap<UnitOffset, UnitOffset>,
// A mapping from die to its full name.
full_names: HashMap<UnitOffset, String>,

Contributor

Is there a point in carrying this around? Is this ever used after processing_unit returns?

Contributor Author

Probably not, will look for sure in a couple days when we're done traveling.

} else {
let mut memory_result = vec![0u8; gdb_nuf.get_size()];
match target_core.core.read_8(address, &mut memory_result) {
let bytes: usize = if gdb_nuf.count_was_default {

Contributor

Is it possible to end up with multiple ranges here?

Contributor Author

Yes, but without addresses on the outputs I figured it was best to only display the first. Is there another way to go?

}

for symbol in &self.symbols {
    if symbol.name.to_string() == query {

Contributor

No biggie, just a bit wasteful to convert the name to string here, and a few lines down as well. Can we avoid it somehow for the comparison?
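One way to avoid the allocation, sketched under the assumption that the name type only exposes a `Display` impl: stream the formatted output into a small `fmt::Write` adapter that compares each chunk against the query and bails out on the first mismatch. `PrefixMatcher` and `display_eq` are hypothetical names, not part of probe-rs.

```rust
use std::fmt::{self, Display, Write};

/// Consumes formatted output and checks it against `remaining`, chunk by chunk.
struct PrefixMatcher<'a> {
    remaining: &'a str,
}

impl Write for PrefixMatcher<'_> {
    fn write_str(&mut self, chunk: &str) -> fmt::Result {
        match self.remaining.strip_prefix(chunk) {
            Some(rest) => {
                self.remaining = rest;
                Ok(())
            }
            // Mismatch: returning an error stops formatting early.
            None => Err(fmt::Error),
        }
    }
}

/// True if `value` formats to exactly `expected`, without building a `String`.
fn display_eq(value: &impl Display, expected: &str) -> bool {
    let mut matcher = PrefixMatcher { remaining: expected };
    write!(matcher, "{value}").is_ok() && matcher.remaining.is_empty()
}
```

The comparison in the loop could then read `display_eq(&symbol.name, query)`; whether that is worth the extra type compared with caching the formatted name once is a judgment call.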

    current_name.push_str(&parent_name);
}
if !parent_name.is_empty() && !name_opt.is_empty() {
    current_name.push_str("::");

Contributor

This again looks like Rust-specific syntax. We have the option for language-specific handling somewhere in this codebase.

@BryanKadzban BryanKadzban left a comment

Contributor Author

We're traveling for a while; I'll get full context again when we're home. It should only be a couple more days.
