Skip to content

feat: support SEGMENT_START function in Linker Script#1851

Merged
davidlattimore merged 5 commits into
wild-linker:mainfrom
Blazearth:feat-segment-start-support
May 2, 2026
Merged

feat: support SEGMENT_START function in Linker Script#1851
davidlattimore merged 5 commits into
wild-linker:mainfrom
Blazearth:feat-segment-start-support

Conversation

@Blazearth

Copy link
Copy Markdown
Contributor

Implements SEGMENT_START("name", default) in linker script parsing and resolution. Fixes linking of binaries that use __executable_start = SEGMENT_START("text", 0) in their linker scripts.

Closes #1098

@Blazearth Blazearth force-pushed the feat-segment-start-support branch 2 times, most recently from 0b22961 to edc82d7 Compare April 18, 2026 00:37

@lapla-cogito lapla-cogito left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please update the status of SEGMENT_START in LINKER_SCRIPT_SUPPORT.md.

Comment thread libwild/src/layout_rules.rs Outdated

// Try parsing as a full expression first.
let mut input = BStr::new(value);
if let Ok(expr) = crate::linker_script::parse_expression_pub(&mut input) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After parse_expression_pub() successfully parses a prefix of the value (e.g. SEGMENT_START), any trailing tokens in the same expression are silently ignored because the caller does not verify that the input was fully consumed.

Comment thread libwild/src/layout_rules.rs Outdated
}
Expression::Number(n) => Some(SymbolPlacement::DefsymAbsolute(n)),
// For other expressions fall through to the legacy path below.
_ => None,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using legacy paths results in double parsing by both parse_expression_pub() and parse_symbol_expression(). This is inherently inefficient, and if the two parsing results conflict, it creates a breeding ground for difficult-to-debug bugs. Ideally, we should consolidate this into a single parsing path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out , you're right that the fallback introduced double parsing and could lead to inconsistent interpretations.
I have removed the legacy parse_symbol_expression fallback entirely. Now uses only single parser.

Comment thread libwild/src/layout.rs Outdated
"text" => def.is_executable() && !def.is_writable(),
"data" | "bss" => def.is_writable() && !def.is_executable(),
"rodata" => !def.is_writable() && !def.is_executable(),
_ => false,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to emit a warning and then ignore the unknown segment name rather than silently ignoring it.

@Blazearth Blazearth force-pushed the feat-segment-start-support branch from f1f9c31 to 9ac2117 Compare April 19, 2026 11:01
lapla-cogito
lapla-cogito previously approved these changes Apr 20, 2026

@lapla-cogito lapla-cogito left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions about the test, but otherwise it seems good.

@@ -0,0 +1,13 @@
//#LinkerScript:linker-script-segment-start.ld
//#RunEnabled:false
//#DiffEnabled:false

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should DiffEnabled:false be necessary? The diff detection by linker-diff should be verified unless there's a valid reason.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DiffEnabled:false was originally added because the test was linked as a shared object, which introduced differences (e.g relocations/GOT) that made the diff noisy.

//#RunEnabled:false
//#DiffEnabled:false
//#Mode:dynamic
//#LinkArgs:-shared -z now

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you're linking this as a shared library? IIRC, SEGMENT_START is typically used for regular executables more often. If you link this test as a regular executable, you won't need RunEnabled:false either.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, linking as a shared object was also not ideal here, I took the easy path. I’ll rework this test to use a regular executable.

@lapla-cogito lapla-cogito dismissed their stale review April 20, 2026 03:56

Sorry, I accidentally set this to approve.

Comment thread libwild/src/layout_rules.rs Outdated
let default =
eval_constant_expr(default_expr).unwrap_or(0);
let name_str = std::str::from_utf8(name).map_err(|_| {
crate::error!("SEGMENT_START: segment name is not valid UTF-8")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably be inclined to just keep the name as &[u8]. That way we don't need to worry about UTF-8 validation. When we're comparing it with things, we can just compare it with b"text" etc. That's how names are handled throughout most of the rest of the linker.

Comment thread libwild/src/layout_rules.rs Outdated
/// Evaluate a linker script expression that must be a compile-time constant.
/// Returns `Err` for any expression that requires runtime context (symbols, location counter,
/// etc.).
fn eval_constant_expr(expr: &crate::linker_script::Expression<'_>) -> crate::error::Result<u64> {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this probably belongs in expression_eval.rs

@@ -147,10 +249,7 @@ impl<'data> LayoutRulesBuilder<'data> {
.with_hidden(provide.hidden),
);
} else if let linker_script::Command::SymbolDefinition { name, value } = cmd {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If value is a string that needs parsing, then we should probably parse it when we parse the rest of the linker script. That way if there's an error while parsing, we can report the line number in the linker script where the error occurred.

@@ -0,0 +1,8 @@
__executable_start = SEGMENT_START("text", 0);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be good to define a symbol for each of the supported segment types, then have some code (in _start) that checks that those values appear correct relative to variables in the respective segments.

@Blazearth Blazearth marked this pull request as draft April 20, 2026 10:55
@Blazearth Blazearth force-pushed the feat-segment-start-support branch 2 times, most recently from bd7efe7 to b143984 Compare April 20, 2026 17:58
@Blazearth

Copy link
Copy Markdown
Contributor Author
//#DiffIgnore:segment.LOAD.RWX.alignment
//#DiffIgnore:segment.LOAD.RX.alignment

Added these in the test since ld produces a single RWE LOAD segment when no PHDRS directive is present in the linker script. Wild always uses separate RO/RX/RW segments. Took me while to resolve the test.
I have addressed all the issues.

@Blazearth Blazearth marked this pull request as ready for review April 20, 2026 18:20
Comment thread libwild/src/layout.rs Outdated
Comment on lines +3181 to +3183
// rodata lives in a dedicated RO segment when one exists, but in a typical Linux
// executable it shares the RX segment with .text. Match any non-writable loadable
// segment so we find whichever one actually contains read-only data.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right to me. Maybe executables on Linux used to put RO data into the RX segment, but modern executables don't do this since it would make the binary slightly less secure. But in any case, what's important is what Wild does, which is emit a separate read-only non-executable segment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure i will revert it back.

Comment thread libwild/src/layout.rs Outdated
fn segment_name_matches(name: &[u8], def: impl crate::platform::ProgramSegmentDef) -> bool {
match name {
b"text" => def.is_executable() && !def.is_writable(),
b"data" | b"bss" => def.is_writable() && !def.is_executable(),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does GNU ld return when you request the start of BSS? Does it return the start of the combined data/bss segment or does it return where bss starts within that segment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing with GNU ld, SEGMENT_START("bss", ...) returns the start of the loadable writable segment, not the start of the .bss section within that segment, since .data and .bss are typically placed in the same LOAD segment.

runtime_init();

/* text_start must be <= _start (both in the text segment) */
if (ptr_to_int(&text_start) > ptr_to_int(&_start)) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the linker script explicitly says to put .text at 0x600000, do you think we should confirm that it's at that address?

Comment thread libwild/src/layout.rs Outdated
}

fn is_known_segment_name(name: &[u8]) -> bool {
matches!(name, b"text" | b"data" | b"bss" | b"rodata")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a shame to duplicate the list here and in the function above. What about adding an enum SegmentMatcher, then parsing that from the string. Bonus if we can parse when we parse the linker script so that that an unknown value gets reported with the correct line number.

I guess that might result in an unknown segment name being an error rather than a warning. That's possibly not a bad thing though.

@@ -0,0 +1,16 @@
ENTRY(_start)

text_start = SEGMENT_START("text", 0x600000);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value here is identical to the . = 0x600000 used for the .text section below. That makes it hard to determine which was actually used. I tried replacing the default values for all four symbols and it appears that GNU ld ended up using those default values for all of them - e.g. it didn't actually use the start of the text segment at all. It could be good to do some experimentation to determine when the default value gets used and when it doesn't then try to match that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly you forgot this comment? It looks like it's still the case that the linker script starts .text at the same address as the default value. I'd suggest Changing the default values for all four SEGMENT_START calls to distinct values - e.g. 0x10, 0x11, 0x12, 0x13.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I forgot about that. I hope its all fine now.

#include <stddef.h>

#include "../common/runtime.h"
#include "ptr_black_box.h"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this to "../common/ptr_black_box.h"? I'll be doing a PR soon to update the includes in the other files and also to change the test runner to not have "../common" in the include path.

@Blazearth Blazearth marked this pull request as draft April 21, 2026 13:57
@Blazearth

Copy link
Copy Markdown
Contributor Author

The default value here is identical to the . = 0x600000 used for the .text section below. That makes it hard to determine which was actually used. I tried replacing the default values for all four symbols and it appears that GNU ld ended up using those default values for all of them - e.g. it didn't actually use the start of the text segment at all. It could be good to do some experimentation to determine when the default value gets used and when it doesn't then try to match that.

Hey @davidlattimore , I ran a series of experiments to understand the real behavior before deciding how Wild should match it.

What the docs say

The official GNU ld docs state:

SEGMENT_START(segment, default) — If an explicit value has already been given for this segment with a command-line -T option then that value will be returned, otherwise the value will be default.

Case 1 — no -T, default differs from actual layout:

text_start = SEGMENT_START("text", 0x1);
. = 0x600000;  /* .text actually lands here */
Result: text_start = 0x1 (the default, not the actual segment address)

Case 2 — -Ttext=0x700000 override:

text_start = SEGMENT_START("text", 0x1);
. = 0x600000;
ld -Ttext=0x700000 ...
Result: text_start = 0x700000 (the CLI override wins, layout moves too)

Case 3 — no -T, default happens to match layout (our current test):

text_start = SEGMENT_START("text", 0x600000);
. = 0x600000;
Result: text_start = 0x600000 — ambiguous, can't tell if it used the default or the segment

Case 4 — no -T, default is different from layout:

text_start = SEGMENT_START("text", 0x500000);
. = 0x600000;
Result: text_start = 0x500000 — default again, not the actual segment base

In these cases, GNU ld appears to return the -T* override if provided, otherwise the default value — even when the actual segment lands elsewhere in the layout. This suggests SEGMENT_START does not derive its value from scanning PT_LOAD segments, at least in these scenarios.

What Wild currently does

Wild scans PT_LOAD segments by flag combinations (RX for text, RO for rodata, RW for data/bss) and returns the actual segment base address. This is more "layout-aware" but fundamentally different from GNU ld's behavior.

Now ,there are two options:

Option A — Match GNU ld strictly: SEGMENT_START only returns a value when the corresponding -T flag was passed on the command line (e.g. -Ttext, -Tdata, -Tbss). Otherwise return the default. Wild would need to store -T overrides and look them up here.

Option B — Keep Wild's current behavior: scan actual PT_LOAD segments by flags and return their base address. More useful in practice, but diverges from GNU ld when no -T is passed.

@davidlattimore

Copy link
Copy Markdown
Member

I think for things like this, we want to match GNU ld or lld's behaviour, especially if both GNU ld and lld have the same behaviour.

@Blazearth Blazearth force-pushed the feat-segment-start-support branch from ce9d62d to f263564 Compare April 25, 2026 17:40
@Blazearth

Copy link
Copy Markdown
Contributor Author

I have changed it to match the behaviour of GNU ld where it overrides the address with -T for bss, data and text. As for rodata it always returns the default.
The test now has two configs to verify both paths: one without any -T flags (all four symbols return their linker script defaults), and one with -Ttext/-Tdata/-Tbss overrides (verifying the override values are returned, while rodata still returns its default).

@Blazearth Blazearth marked this pull request as ready for review April 25, 2026 17:52

@davidlattimore davidlattimore left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried that it appears this implements parsing of -Ttext= etc and hooks it up to SEGMENT_START, but doesn't actually affect the actual start of the segments. That would mean that someone linking with -Ttext= would now not get an error, but would get a binary where the text segment didn't start where they said it needed to.

Ideally, we'd fully support -Ttext= etc before we'd support SEGMENT_START. In that regard, the best thing to do would be to separate -Ttext= support out into a separate PR, hook it up to layout so that it affects the address of the text segment and make sure that part of it is tested, submit that PR, then proceed with the rest of this PR.

But I also don't want to make you drag this PR out longer than you want to, so I'm open to other options. e.g. we could split the other way. i.e. move the parsing of -Ttext= etc to a later PR, including the part of the test you've got that ensures that works with SEGMENT_START. That would mean that this PR would support SEGMENT_START, but only where it returns the default value. The later PR could add support for -Ttext= etc, making it both affect the actual segment starts and the return value from SEGMENT_START.

Comment thread libwild/src/args/elf.rs
Comment on lines +1276 to +1278
// -Ttext=ADDR, -Tdata=ADDR, -Tbss=ADDR are segment start overrides,
// not linker script paths. Handle them here since they share the -T prefix.
// The prefix handler gives us the part after "-T", which may be:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for figuring this out. I hadn't realised this particular oddity in GNU linker parsing semantics.

@Blazearth

Copy link
Copy Markdown
Contributor Author

I'm a bit worried that it appears this implements parsing of -Ttext= etc and hooks it up to SEGMENT_START, but doesn't actually affect the actual start of the segments. That would mean that someone linking with -Ttext= would now not get an error, but would get a binary where the text segment didn't start where they said it needed to.

First of all, sorry for overlooking that — you're right that -Ttext= currently only affects the SEGMENT_START return value and doesn't actually move the segment in the output binary.

Ideally, we'd fully support -Ttext= etc before we'd support SEGMENT_START. In that regard, the best thing to do would be to separate -Ttext= support out into a separate PR, hook it up to layout so that it affects the address of the text segment and make sure that part of it is tested, submit that PR, then proceed with the rest of this PR.

I'd like to try and go with this option first — implement proper -Ttext/-Tdata/-Tbss support that actually affects the segment layout, then this PR's with-T-overrides test will be fully correct. I'll open a separate PR for that.

If I find it's too complex to get right, I'll fall back to Option B — strip the -Ttext parts from this PR so it only supports SEGMENT_START returning the default value, and leave the override support for a later PR.

@Blazearth Blazearth force-pushed the feat-segment-start-support branch 2 times, most recently from cbd41b4 to acbfb3c Compare May 1, 2026 23:00
@Blazearth Blazearth marked this pull request as ready for review May 1, 2026 23:20
@Blazearth

Copy link
Copy Markdown
Contributor Author

Now SEGMENT_START move the section at the correct place in output binary with -Ttext override from #1877

@Blazearth Blazearth force-pushed the feat-segment-start-support branch from acbfb3c to ffe5a62 Compare May 2, 2026 00:21

@davidlattimore davidlattimore left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this :)

@davidlattimore davidlattimore merged commit 70600b4 into wild-linker:main May 2, 2026
38 of 48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

linker-scripts: Support SEGMENT_START

3 participants