
feat: Add base to track byte reduction due to relaxation#1527

Merged
lapla-cogito merged 1 commit into wild-linker:main from lapla-cogito:relax_infra on Feb 14, 2026

Conversation

@lapla-cogito

lapla-cogito commented Feb 11, 2026

Member

part of #874

Some relaxations on architectures such as RISC-V and LoongArch actually reduce the size of the generated code. Similar relaxations exist on x86_64 and AArch64, but those implementations pad the shortened instructions with NOPs, so overall symbol sizes stay unchanged. The RISC-V and LoongArch relaxations, by contrast, genuinely shrink the symbols themselves, which means we must track how much each symbol’s size is reduced during the layout phase. A naive implementation would run the layout process twice (with the second pass accounting for size reductions in other symbols), which would clearly hurt performance.

Therefore, while this PR does not introduce any specific relaxation, it lays the groundwork for tracking size reductions caused by relaxations without requiring a second layout pass.

SectionRelaxDeltas maintains, for each section, a list of offsets at which a specific number of bytes is to be removed. The write phase uses this information to compute the adjusted output offset of each location individually. The relaxation_deleted field in Section records the total number of bytes removed from each section. From this, we can calculate each section's actual size for use in the layout phase.
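A minimal sketch of the idea, assuming a sorted, non-overlapping list of (input offset, bytes deleted) pairs per section. The names `Deletion`, `push`, `total_deleted` and `output_offset` are hypothetical and are not the PR's actual API; only the roles mirror what the description attributes to SectionRelaxDeltas and relaxation_deleted.

```rust
/// One relaxation deletion: `bytes` bytes removed starting at `input_offset`.
#[derive(Debug, Clone, Copy)]
struct Deletion {
    input_offset: u64,
    bytes: u64,
}

/// Hypothetical sketch of a per-section delta list. Entries are sorted by
/// input offset and their ranges never overlap.
#[derive(Debug, Default)]
struct SectionRelaxDeltas {
    deletions: Vec<Deletion>,
}

impl SectionRelaxDeltas {
    /// Records a deletion; callers must supply deletions in ascending order.
    fn push(&mut self, input_offset: u64, bytes: u64) {
        if let Some(last) = self.deletions.last() {
            assert!(
                input_offset >= last.input_offset + last.bytes,
                "deletions must be ascending and non-overlapping"
            );
        }
        self.deletions.push(Deletion { input_offset, bytes });
    }

    /// Total bytes removed from the section (the role relaxation_deleted plays
    /// in the PR); layout can then use `input_size - total_deleted()`.
    fn total_deleted(&self) -> u64 {
        self.deletions.iter().map(|d| d.bytes).sum()
    }

    /// Maps a surviving input offset to its output offset by subtracting every
    /// deletion that ends at or before it. A linear scan, kept simple for clarity.
    fn output_offset(&self, input_offset: u64) -> u64 {
        let deleted_before: u64 = self
            .deletions
            .iter()
            .take_while(|d| d.input_offset + d.bytes <= input_offset)
            .map(|d| d.bytes)
            .sum();
        input_offset - deleted_before
    }
}

fn main() {
    let mut deltas = SectionRelaxDeltas::default();
    deltas.push(0x10, 4); // e.g. a relaxed instruction sequence that lost 4 bytes
    deltas.push(0x40, 2);

    let input_size = 0x80u64;
    println!("actual size: {:#x}", input_size - deltas.total_deleted());
    for offset in [0x08u64, 0x20, 0x50] {
        println!("{offset:#x} -> {:#x}", deltas.output_offset(offset));
    }
}
```

Because each section's reduced size is known once its own deletions are recorded, addresses of later sections can be assigned from those reduced sizes in the same pass, which is the point of avoiding a second layout pass.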

Comment thread libwild/src/layout.rs Outdated
@lapla-cogito

Member Author

I don't think the CI failure is caused by this change: #1528

@lapla-cogito

lapla-cogito commented Feb 11, 2026

Member Author

Note: I tried something similar before in #1317, but it didn’t work out. This PR is another attempt (and, as you can see from the gap, it took some time to implement). Building on this, I'm currently implementing call relaxation for RISC-V locally, and it seems to be working quite well so far.

Comment thread libwild/src/layout.rs Outdated
lapla-cogito force-pushed the relax_infra branch 2 times, most recently from 6f32cae to 7d32bf6 on February 13, 2026 03:22

davidlattimore left a comment

Member


Unfortunately, the performance loss is larger:

Benchmark 1 (671 runs): /home/david/save/zed/run-with env-rand /home/d/wild-builds/2026-02-13.cg1 --no-fork
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           537ms ± 13.2ms     494ms …  565ms          1 ( 0%)        0%
  peak_rss           3.47GB ± 1.21MB    3.47GB … 3.47GB          4 ( 1%)        0%
  cpu_cycles         21.7G  ±  182M     21.0G  … 22.4G           8 ( 1%)        0%
  instructions       19.4G  ± 51.6M     19.3G  … 19.6G          18 ( 3%)        0%
  cache_references    473M  ± 2.89M      466M  …  485M          14 ( 2%)        0%
  cache_misses        124M  ±  627K      123M  …  126M           2 ( 0%)        0%
  branch_misses      38.3M  ±  183K     37.1M  … 38.8M          23 ( 3%)        0%
Benchmark 2 (639 runs): /home/david/save/zed/run-with env-rand /home/d/wild-builds/2026-02-13.cg1.relaxation-tracking --no-fork
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           564ms ± 14.6ms     516ms …  592ms          1 ( 0%)        💩+  5.0% ±  0.3%
  peak_rss           3.59GB ± 1.06MB    3.59GB … 3.59GB          8 ( 1%)        💩+  3.5% ±  0.0%
  cpu_cycles         22.2G  ±  172M     21.5G  … 22.8G          11 ( 2%)        💩+  2.1% ±  0.1%
  instructions       19.6G  ± 48.9M     19.5G  … 19.9G           7 ( 1%)        💩+  1.3% ±  0.0%
  cache_references    496M  ± 2.83M      489M  …  507M           8 ( 1%)        💩+  4.8% ±  0.1%
  cache_misses        128M  ±  629K      126M  …  130M           4 ( 1%)        💩+  2.7% ±  0.1%
  branch_misses      39.5M  ±  172K     38.6M  … 40.1M          16 ( 3%)        💩+  3.2% ±  0.1%

Memory consumption is also up. I assume this is because we're now storing an Option<Vec<..>> for every input section.

lapla-cogito force-pushed the relax_infra branch 2 times, most recently from 17eed65 to 2bae671 on February 14, 2026 03:08
@lapla-cogito

lapla-cogito commented Feb 14, 2026

Copy link
Copy Markdown
Member Author

I've implemented sparse maps, one per object file, to track byte reductions. This is clearly more memory-efficient than a Vec per section, but it also adds 24 bytes to each object file's struct, even on architectures that don't need byte-reduction tracking. I think this trade-off is acceptable.
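A minimal sketch of the sparse, per-object-file approach, assuming the map is keyed by section index. The comment does not say which map type the PR uses; `BTreeMap` and the names `ObjectRelaxDeltas`, `record` and `deleted_in` are assumptions for illustration. Only sections that actually had bytes deleted get an entry, so the common case of no relaxation costs one empty map per object rather than one slot per input section.

```rust
use std::collections::BTreeMap;

/// (input_offset, bytes_deleted) pairs for one section, sorted by offset.
/// Hypothetical stand-in for the per-section delta list.
type Deletions = Vec<(u64, u64)>;

/// Hypothetical per-object-file tracker: untouched sections have no entry.
#[derive(Debug, Default)]
struct ObjectRelaxDeltas {
    by_section: BTreeMap<usize, Deletions>,
}

impl ObjectRelaxDeltas {
    /// Records that `bytes` bytes are deleted at `input_offset` in `section_index`.
    fn record(&mut self, section_index: usize, input_offset: u64, bytes: u64) {
        self.by_section
            .entry(section_index)
            .or_default()
            .push((input_offset, bytes));
    }

    /// Total bytes deleted from a section; zero for sections with no entry.
    fn deleted_in(&self, section_index: usize) -> u64 {
        self.by_section
            .get(&section_index)
            .map_or(0, |d| d.iter().map(|&(_, bytes)| bytes).sum())
    }
}

fn main() {
    // An object with many sections, of which only section 3 was relaxed.
    let section_sizes = vec![0x40u64; 10_000];
    let mut deltas = ObjectRelaxDeltas::default();
    deltas.record(3, 0x10, 4);
    deltas.record(3, 0x30, 2);

    for (index, size) in section_sizes.iter().enumerate().take(5) {
        println!("section {index}: {:#x} bytes", size - deltas.deleted_in(index));
    }
    println!("sections with deletions: {}", deltas.by_section.len());
}
```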

lapla-cogito force-pushed the relax_infra branch 3 times, most recently from bfc2dda to 7a983ef on February 14, 2026 03:53
Comment thread libwild/src/elf_writer.rs Outdated

if let Some(hdr_out) = table_writer.take_eh_frame_hdr_entry() {
let frame_ptr = (section_address + offset_in_section) as i64
// When relaxation has deleted bytes fromq the target section, the

Member


typo - 'fromq'

Comment thread linker-utils/src/relaxation.rs Outdated
let n = self.deltas.len();
let mut lo = 0usize;
let mut hi = n;
while lo < hi {

Member


Would it work to use binary_search_by_key here? If not, could you add a comment to the code explaining why?
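For reference, a sketch of what a `binary_search_by_key` lookup could look like, assuming each entry stores the cumulative number of bytes deleted up to and including it. The `Delta` struct and `output_offset` function are hypothetical, not the PR's code. `slice::partition_point(|d| d.input_offset < input_offset)` would compute the same index.

```rust
/// Hypothetical delta entry: `cumulative` is the total number of bytes deleted
/// by this entry and every entry before it.
#[derive(Debug, Clone, Copy)]
struct Delta {
    input_offset: u64,
    cumulative: u64,
}

/// Maps a surviving input offset to an output offset via `binary_search_by_key`.
/// Keys are sorted and unique, so in both the `Ok(i)` and `Err(i)` cases `i`
/// is the number of deletions that start strictly before the query.
fn output_offset(deltas: &[Delta], input_offset: u64) -> u64 {
    let idx = match deltas.binary_search_by_key(&input_offset, |d| d.input_offset) {
        Ok(i) | Err(i) => i,
    };
    let deleted_before = if idx == 0 { 0 } else { deltas[idx - 1].cumulative };
    input_offset - deleted_before
}

fn main() {
    // 4 bytes deleted at 0x10, then 2 more at 0x40 (6 in total).
    let deltas = [
        Delta { input_offset: 0x10, cumulative: 4 },
        Delta { input_offset: 0x40, cumulative: 6 },
    ];
    for offset in [0x08u64, 0x10, 0x20, 0x50] {
        println!("{offset:#x} -> {:#x}", output_offset(&deltas, offset));
    }
}
```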

Member Author


Since output_pos increases strictly monotonically (the input offsets are strictly ascending and the deletion ranges don't overlap), we didn't actually need a binary search here in the first place; it was a leftover from my first approach's implementation.
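One way to exploit that ordering is a cursor that only ever moves forward through the deletion list, sketched below under the assumption that offsets are queried in non-decreasing order, as in a sequential write pass. `OffsetCursor` and its fields are hypothetical names, not the merged code.

```rust
/// Hypothetical cursor that maps input offsets to output offsets when the
/// queries arrive in non-decreasing order.
struct OffsetCursor<'a> {
    /// (input_offset, bytes_deleted), sorted and non-overlapping.
    deletions: &'a [(u64, u64)],
    /// Index of the first deletion not yet passed.
    next: usize,
    /// Bytes deleted by all deletions before `next`.
    deleted_so_far: u64,
    /// Previous query, used to check the ordering assumption in debug builds.
    last_query: u64,
}

impl<'a> OffsetCursor<'a> {
    fn new(deletions: &'a [(u64, u64)]) -> Self {
        Self { deletions, next: 0, deleted_so_far: 0, last_query: 0 }
    }

    /// Each deletion is passed at most once, so k queries over n deletions
    /// cost O(k + n) in total rather than O(k log n).
    fn output_offset(&mut self, input_offset: u64) -> u64 {
        debug_assert!(input_offset >= self.last_query, "queries must not go backwards");
        self.last_query = input_offset;
        while let Some(&(start, bytes)) = self.deletions.get(self.next) {
            if start + bytes > input_offset {
                break;
            }
            self.deleted_so_far += bytes;
            self.next += 1;
        }
        input_offset - self.deleted_so_far
    }
}

fn main() {
    let deletions = [(0x10u64, 4u64), (0x40, 2)];
    let mut cursor = OffsetCursor::new(&deletions);
    for offset in [0x08u64, 0x20, 0x50] {
        println!("{offset:#x} -> {:#x}", cursor.output_offset(offset));
    }
}
```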

@davidlattimore

Member

Performance looks good now. Thanks :)

lapla-cogito merged commit c9804de into wild-linker:main on Feb 14, 2026
20 checks passed
lapla-cogito deleted the relax_infra branch on February 14, 2026 09:20
@marxin

marxin commented Feb 16, 2026

Collaborator

> Therefore, while this PR does not introduce any specific relaxation, it lays the groundwork for tracking size reductions caused by relaxations without requiring a second layout pass.

Just wanted to mention that we can reduce the binary size, but we must obey alignment relocations such as R_RISCV_ALIGN.
@lapla-cogito, could you please create a PR that makes the tracking respect this alignment?
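A rough sketch of how tracking could respect an alignment relocation: once earlier deletions have moved a reserved NOP sequence, only the bytes needed to reach the next boundary must stay, and the rest could become one more deletion. This assumes the required alignment has already been derived from the relocation (that derivation is left out), and `align_padding_to_delete` is a hypothetical helper, not part of wild.

```rust
/// Hypothetical helper for honouring an alignment relocation while shrinking code.
///
/// `addr`: address where the reserved NOP padding now starts, after all
/// earlier deletions. `reserved`: NOP bytes reserved there by the assembler.
/// `align`: required alignment of the following instruction (a power of two).
///
/// Returns how many padding bytes may be deleted, or `None` if the reserved
/// padding can no longer reach the alignment boundary.
fn align_padding_to_delete(addr: u64, reserved: u64, align: u64) -> Option<u64> {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    let aligned = (addr + align - 1) & !(align - 1);
    let needed = aligned - addr;
    reserved.checked_sub(needed)
}

fn main() {
    // 6 bytes of NOPs reserved in front of an instruction that must be 8-byte aligned.
    for addr in [0x1002u64, 0x1004, 0x1008] {
        match align_padding_to_delete(addr, 6, 8) {
            Some(n) => println!("padding at {addr:#x}: delete {n} byte(s)"),
            None => println!("padding at {addr:#x}: cannot satisfy alignment"),
        }
    }
}
```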

@lapla-cogito

Member Author

Now that I've merged #1552, I'll start on the R_RISCV_ALIGN-related work soon. Note, however, that the current code still emits the alignment NOPs specified by R_RISCV_ALIGN at their original size, which should not cause alignment issues.
