Allow disassembling and examining functions and globals #4000
BryanKadzban wants to merge 3 commits into
(It took nearly a month to get my head around DWARF ... but I think this is a lot better for it :) )
Sorry I forgot about this, give me a bit to review and play around with the PR, I'll try to get it merged this weekend. |
At ELF parse time, drop all the globals into a vector that we can look at later. Split up the names on anything that isn't part of a Rust identifier, so we can do a sub-sequence match later more easily. Do this for both the symtab section (using rustc-demangle) and the DWARF data (using the existing DIE iteration loop in UnitInfo). De-duplicate the entries from the two sources and drop the resulting list into the DebugInfo struct so we can use it later.
Add an indication to the GdbNuf for whether an explicit count was requested. When the requested symbol is resolved, if an explicit count was requested, use it (and the symbol's start address); if not, use the symbol's stored Range<>s.
Add a method to DebugInfo to take in a (string) symbol query, which is a sequence of tokens (alphanumeric-or-underscore) separated by everything else. Tokens match by sequence, full-string-matching each one, against the sequence of tokens in each symbol's name. So it works to omit whole levels of the namespace hierarchy if they don't help narrow down to a single symbol. When dumping, if that query returns multiple entries, display all of them and let the user add further restrictions to disambiguate.
Inlined function instances' names are disambiguated by the first address of their actual code. They're also tagged by the actual function that calls (and contains) them, since they can't be disassembled directly but their containing function can. Global variables are also stored, but with only a single Range<> since the compiler doesn't split them. Both functions and variables can be either disassembled (/i) or dumped (/x or equivalent).
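The token matching described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `tokenize` and `matches_query` are hypothetical names, and treating any Unicode alphanumeric as part of a token is an assumption.

```rust
/// Split a symbol name or a query into identifier tokens: maximal runs of
/// alphanumeric characters or underscores. Everything else is a separator.
/// (Hypothetical helper; the PR's real tokenizer may differ.)
fn tokenize(s: &str) -> Vec<&str> {
    s.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .collect()
}

/// True if every query token appears, in order, among the symbol's tokens.
/// Each token must match in full, but whole levels of the symbol's path may
/// be skipped. An empty query matches every symbol.
fn matches_query(symbol: &str, query: &str) -> bool {
    let mut symbol_tokens = tokenize(symbol).into_iter();
    tokenize(query)
        .into_iter()
        // `any` advances the shared iterator, so later query tokens can only
        // match symbol tokens that come after the previous match.
        .all(|q| symbol_tokens.any(|s| s == q))
}
```

With this sketch, a query such as `fmt::write_str` would match `core::fmt::Write::write_str`, while `mai` would not match `my_crate::main` because tokens are compared in full.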
846e0b3 to
ebd4b09
Compare
        f.write_str("z_can't happen: could not find a name attribute")?;
    }
    SymbolName::LinkNotFound(offset) => {
        write!(f, "z_symbol name link not found at offset {offset:?}")?;
What are the z_ prefixes here?
To get the errors to show at the end of the symbol list when this Display impl is used to generate the symbols for a query. Now that I'm not seeing those happen anymore I can drop the prefix if you'd prefer
// first, so we de-duplicate, then drop them into a sorted vector at the
// end.
for symbol in object.symbols() {
    let name = rustc_demangle::demangle(symbol.name()?).to_string();
Not every program is written in Rust, and we should respect that.
fn canonicalize_ranges(ranges: &mut Vec<Range<u64>>) {
    ranges.sort_unstable_by_key(|r| r.start);
    let mut merged_ranges = Vec::with_capacity(ranges.len());
    let mut range_iter = ranges.iter_mut();
    if let Some(prev) = range_iter.next() {
        for r in range_iter {
            if r.start <= prev.end {
                prev.end = prev.end.max(r.end);
            } else {
                merged_ranges.push(r.clone());
                prev.clone_from(r);
            }
        }
    }
    ranges.clear();
    ranges.extend_from_slice(&merged_ranges);
}
This looks broken and will lose data. You expand a range, but store non-expanded ranges in merged_ranges. Then, you drop everything from ranges before re-adding some.
Oh, right. The intent was to have prev refer to the entry in merged_ranges but when I made it clone_from, that broke it.
When I get back I'll fix this.
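For reference, one possible shape for the fix is sketched below. It is not the code that eventually landed; it only shows the stated intent, where the "previous" range is always the last entry of the merged vector, so an expansion is never lost. Treating touching ranges (`r.start == prev.end`) as mergeable follows the `<=` comparison in the snippet above.

```rust
use std::ops::Range;

/// Sort the ranges by start address and merge any that overlap or touch.
fn canonicalize_ranges(ranges: &mut Vec<Range<u64>>) {
    ranges.sort_unstable_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::with_capacity(ranges.len());
    for r in ranges.drain(..) {
        // The range being grown lives inside `merged`, so extending it
        // updates the stored entry directly.
        if let Some(prev) = merged.last_mut() {
            if r.start <= prev.end {
                prev.end = prev.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    *ranges = merged;
}
```

Because the first range is pushed like any other, it is no longer dropped when nothing merges into it.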
// A mapping from child die to parent die.
parents: HashMap<UnitOffset, UnitOffset>,
// A mapping from die to its full name.
full_names: HashMap<UnitOffset, String>,
Is there a point in carrying this around? Is this ever used after processing_unit returns?
Probably not, will look for sure in a couple days when we're done traveling.
} else {
    let mut memory_result = vec![0u8; gdb_nuf.get_size()];
    match target_core.core.read_8(address, &mut memory_result) {
    let bytes: usize = if gdb_nuf.count_was_default {
Is it possible to end up with multiple ranges here?
Yes, but without addresses on the outputs I figured it was best to only display the first. Is there another way to go?
}

for symbol in &self.symbols {
    if symbol.name.to_string() == query {
No biggie, just a bit wasteful to convert the name to string here, and a few lines down as well. Can we avoid it somehow for the comparison?
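One way to avoid the allocation, sketched here under assumptions: since the name type implements `Display`, its output can be streamed into a small `fmt::Write` sink that compares against the query chunk by chunk. `EqCheck` and `display_eq` are hypothetical names, not part of this PR or of probe-rs.

```rust
use std::fmt::{self, Display, Write};

/// A `fmt::Write` sink that strips each written chunk off the front of
/// `remaining`, and fails as soon as a chunk does not match.
struct EqCheck<'a> {
    remaining: &'a str,
}

impl Write for EqCheck<'_> {
    fn write_str(&mut self, chunk: &str) -> fmt::Result {
        match self.remaining.strip_prefix(chunk) {
            Some(rest) => {
                self.remaining = rest;
                Ok(())
            }
            // Mismatch: abort formatting early.
            None => Err(fmt::Error),
        }
    }
}

/// True if `value`'s `Display` output is exactly `expected`, without
/// building an intermediate `String`.
fn display_eq<T: Display>(value: &T, expected: &str) -> bool {
    let mut check = EqCheck { remaining: expected };
    write!(check, "{value}").is_ok() && check.remaining.is_empty()
}
```

The comparison above would then read something like `display_eq(&symbol.name, query)`. Whether this is worth it over caching the formatted name once per symbol is a separate design question.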
    current_name.push_str(&parent_name);
}
if !parent_name.is_empty() && !name_opt.is_empty() {
    current_name.push_str("::");
This again seems like Rust-specific syntax. We have the option for language-specific handling somewhere in this codebase.
BryanKadzban
left a comment
We're traveling for a while, will get full context again when we're home. Should only be a couple more days.
At ELF parse time, pull everything from .symtab to get the ::hXXXXX crate disambiguator suffixes, then pull everything from DWARF to get the correct generic arguments (symtab often drops info from the generic types or consts). Match the two up by address. Include inlined functions from DWARF as well, but only as a pointer to their containing concrete function; this way they can at least be requested, and the user can be told they're inlined.
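A rough sketch of matching the two sources by address is shown below. The types and the naming policy are assumptions for illustration: `MergedSymbol`, `hash_suffix`, and `merge_by_address` are hypothetical, and combining the DWARF path with the symtab's trailing `::h<hex>` suffix is just one plausible way to keep both pieces of information.

```rust
use std::collections::BTreeMap;

/// What the two sources know about the symbol at one address.
#[derive(Debug, Default, PartialEq)]
struct MergedSymbol {
    symtab_name: Option<String>,
    dwarf_name: Option<String>,
}

/// The trailing `::h<hex digits>` of a demangled name, if present.
fn hash_suffix(name: &str) -> Option<&str> {
    let idx = name.rfind("::h")?;
    let hex = &name[idx + 3..];
    (!hex.is_empty() && hex.bytes().all(|b| b.is_ascii_hexdigit())).then(|| &name[idx..])
}

impl MergedSymbol {
    /// Assumed policy: prefer the DWARF name (it keeps generic arguments)
    /// and append the symtab hash suffix when there is one.
    fn display_name(&self) -> Option<String> {
        let suffix = self.symtab_name.as_deref().and_then(hash_suffix);
        match (&self.dwarf_name, suffix) {
            (Some(d), Some(s)) => Some(format!("{d}{s}")),
            (Some(d), None) => Some(d.clone()),
            (None, _) => self.symtab_name.clone(),
        }
    }
}

/// Join `(address, name)` pairs from both sources on the address.
/// A `BTreeMap` keeps the result sorted by address.
fn merge_by_address(
    symtab: &[(u64, &str)],
    dwarf: &[(u64, &str)],
) -> BTreeMap<u64, MergedSymbol> {
    let mut merged: BTreeMap<u64, MergedSymbol> = BTreeMap::new();
    for &(addr, name) in symtab {
        merged.entry(addr).or_default().symtab_name = Some(name.to_string());
    }
    for &(addr, name) in dwarf {
        merged.entry(addr).or_default().dwarf_name = Some(name.to_string());
    }
    merged
}
```

A symbol seen in only one source simply keeps that source's name in this sketch.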
Fold the DWARF ranges for inlined functions into their concrete parents. When we're done looking at DWARF, any inlined functions whose parents still have no ranges, and any concrete functions with no ranges themselves, get dropped, as they've been optimized out of the binary but left in the debug info.
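The fold-then-prune step could look roughly like this. It is a simplified sketch, not the PR's data model: the `Symbol` struct, its `id` and `inlined_into` fields, and `fold_and_prune` are all hypothetical, and inlined instances are assumed to refer to their containing function by id.

```rust
use std::collections::HashMap;
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
struct Symbol {
    id: u32,
    name: String,
    /// `Some(id)` marks an inlined instance contained in concrete function `id`.
    inlined_into: Option<u32>,
    ranges: Vec<Range<u64>>,
}

fn fold_and_prune(symbols: Vec<Symbol>) -> Vec<Symbol> {
    // Start with each concrete symbol's own ranges.
    let mut ranges_by_id: HashMap<u32, Vec<Range<u64>>> = symbols
        .iter()
        .filter(|s| s.inlined_into.is_none())
        .map(|s| (s.id, s.ranges.clone()))
        .collect();

    // Fold every inlined instance's ranges into its concrete parent.
    for s in &symbols {
        if let Some(parent_id) = s.inlined_into {
            if let Some(parent_ranges) = ranges_by_id.get_mut(&parent_id) {
                parent_ranges.extend(s.ranges.iter().cloned());
            }
        }
    }

    // Drop anything whose concrete owner ended up with no ranges at all.
    symbols
        .into_iter()
        .filter_map(|mut s| {
            let owner = s.inlined_into.unwrap_or(s.id);
            let owner_ranges = ranges_by_id.get(&owner).filter(|r| !r.is_empty())?;
            if s.inlined_into.is_none() {
                s.ranges = owner_ranges.clone();
            }
            Some(s)
        })
        .collect()
}
```

In this sketch an inlined instance whose parent is missing from the list is dropped as well, which is an assumption rather than something the description states.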
At runtime, query the table of symbols by splitting the query and each symbol into a sequence of identifiers (alphanumeric-or-underscore), separated by anything else. If the sequence of identifiers matches, using full-string matching at each identifier, then the symbol is a candidate.
If there's more than one candidate, dump all of them and ask for a more fully specified query. Inlined symbols point to the container and don't allow disassembly or dumping.
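The candidate handling above might be modelled along these lines. This is only a sketch: `Symbol`, `QueryError`, and `resolve` are hypothetical names, and the candidate list is assumed to have already been filtered by the identifier-sequence match.

```rust
#[derive(Debug, PartialEq)]
struct Symbol {
    name: String,
    /// Name of the containing concrete function, for inlined instances.
    inlined_into: Option<String>,
}

#[derive(Debug, PartialEq)]
enum QueryError {
    /// Nothing matched the query.
    NoMatch,
    /// Several symbols matched; list them so the user can narrow the query.
    Ambiguous(Vec<String>),
    /// The only match is inlined and cannot be disassembled or dumped itself.
    Inlined { name: String, container: String },
}

/// Turn the list of matching candidates into either one usable symbol or a
/// message-worthy error.
fn resolve<'a>(candidates: &[&'a Symbol]) -> Result<&'a Symbol, QueryError> {
    match candidates {
        [] => Err(QueryError::NoMatch),
        [only] => match &only.inlined_into {
            Some(container) => Err(QueryError::Inlined {
                name: only.name.clone(),
                container: container.clone(),
            }),
            None => Ok(*only),
        },
        many => Err(QueryError::Ambiguous(
            many.iter().map(|s| s.name.clone()).collect(),
        )),
    }
}
```

Returning the container's name in the `Inlined` case lets the caller tell the user which function to request instead.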
Tested this on the RP2040_full_unwind.elf file in the repo, using the dummy probe code that always returns 0 for memory bytes. Disassembling main works, as does dumping one of the appropriately disambiguated vtable symbols. (As does dumping main, in fact.)