Skip to content

fix flamegraph#524

Draft
turbocrime wants to merge 4 commits into
gungraun:mainfrom
turbocrime:fix/flamegraph
Draft

fix flamegraph#524
turbocrime wants to merge 4 commits into
gungraun:mainfrom
turbocrime:fix/flamegraph

Conversation

@turbocrime
Copy link
Copy Markdown

@turbocrime turbocrime commented Feb 7, 2026

Addresses #523

Summary

The current flamegraph output stacks all functions linearly by descending
inclusive cost — a design described as a quick
cost overview using inferno as a rendering backend.

This works as a sorted bar chart, but the flamegraph y-axis (call depth)
carries no meaning — sibling calls appear nested rather than branching. Since
callgrind output already contains caller-callee edges (cfn=, calls=,
per-call-site costs), we can do better.

This PR restores caller-callee edge tracking (present in 190dfd6, removed in
1f2e0bd) and uses DFS from the sentinel/root to emit folded stacks that
reflect actual call-graph structure.

Before / After

Given parent of total cost 90 Ir from calling inline_me (10 Ir), work_a (60 Ir), work_b (20 Ir):

fn parent() {
    inline_me(); // 10 Ir
    work_a(); // 60 Ir
    work_b(); // 20 Ir
}

Before (linear staircase):

above_parent 12345
above_parent;parent 90
above_parent;parent;work_a 60
above_parent;parent;work_a;work_b 20

After (branching call tree):

parent; 10
parent;work_a 60
parent;work_b 20

Now, parent is annotated with its own cost in the output graph. Rendering the graph compiles its total cost, so that remains available when viewing the graph.

Full example: https://github.com/turbocrime/gungraun-flamegraph-example

Github disables the fancier features of the SVG (interactivity, etc) when embedded, so please run the repro to get the full idea.

fixed callgrind bench_request case total Ir flamegraph

What changed

  • hashmap_parser.rs: parse_with_edges() exposes caller-callee edges via
    callback during parsing
  • flamegraph_parser.rs: replaces BinaryHeap + .windows(2) with a
    CallGraph struct and DFS stack emission; sentinel is used as DFS root when
    present
  • New test fixtures for branching and recursive call graphs

Limitations

Inferno forces lexical sorting when rendering the graph.

@turbocrime turbocrime marked this pull request as draft February 7, 2026 09:22
continue;
}
}
let roots: Vec<&Id> = if let Some(sentinel_key) = &self.costs.sentinel_key {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using sentinel as dfs root as in 190dfd6


/// The unique `Id` identifying a function uniquely
#[derive(Debug, Hash, PartialEq, Eq, Clone, Serialize, Deserialize)]
#[derive(Debug, Hash, PartialEq, Eq, PartialOrd, Ord, Clone, Serialize, Deserialize)]
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartialOrd, Ord added, replacing HeapElem's Ord to maintain deterministic output expected by tests

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sentinel is now dfs root, so output is the reachable subtree instead of all roots filtered by cost ceiling

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old search produced one entry per function. dfs produces one entry per call-graph path

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new test case: parent calling multiple children

Comment on lines 135 to +154
impl CallgrindParser for HashMapParser {
type Output = CallgrindMap;

#[allow(clippy::too_many_lines)]
fn parse_single(&self, path: &Path) -> Result<(CallgrindProperties, Self::Output)> {
self.parse_with_edges(path, |_, _, _| {})
}
}

impl HashMapParser {
/// Like [`CallgrindParser::parse_single`] but invokes `on_call_edge(caller, callee, cost)`
/// for each caller→callee edge encountered.
#[allow(clippy::too_many_lines)]
pub fn parse_with_edges<F>(
&self,
path: &Path,
mut on_call_edge: F,
) -> Result<(CallgrindProperties, CallgrindMap)>
where
F: FnMut(&Id, &Id, &Metrics),
{
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only FlamegraphParser needs edges, so the parse_single signature is not modified, and it now delegates to parse_with_edges. i don't like this choice but i wanted to avoid larger diff.

@turbocrime turbocrime marked this pull request as ready for review February 7, 2026 11:45
@turbocrime turbocrime marked this pull request as draft February 7, 2026 22:45
@turbocrime
Copy link
Copy Markdown
Author

okay, i see total_flamegraph_map_from_parsed merges all per-thread/per-part
output files into a single map before generating stacks.

with the prior format this just produced imprecise bar widths; with the
call-graph approach, this might graph caller-callee relationships that don't
actually exist

there's no test case that specifically expresses this. the merging behavior is
about combining separate CallGraphs from different output files, and every
existing test asserts result.len() == 1, so it's explicitly not covered

data to handle multi-threaded/multi-part benchmarks should be available.
callgrind produces separate output files per thread/part, and the parser
extracts thread/part identity into CallgrindProperties, but the merging step
discards it.

so back to draft and i'm planning to follow up on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant