feat: support SEGMENT_START function in Linker Script by Blazearth · Pull Request #1851 · wild-linker/wild

Blazearth · 2026-04-18T00:24:38Z

Implements SEGMENT_START("name", default) in linker script parsing and resolution. Fixes linking of binaries that use __executable_start = SEGMENT_START("text", 0) in their linker scripts.

Closes #1098

lapla-cogito

Also, please update the status of SEGMENT_START in LINKER_SCRIPT_SUPPORT.md.

lapla-cogito · 2026-04-19T02:42:03Z

+
+    // Try parsing as a full expression first.
+    let mut input = BStr::new(value);
+    if let Ok(expr) = crate::linker_script::parse_expression_pub(&mut input) {


After parse_expression_pub() successfully parses a prefix of the value (e.g. SEGMENT_START), any trailing tokens in the same expression are silently ignored because the caller does not verify that the input was fully consumed.
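Here is a minimal sketch of the requested check. It parses one expression, skips trailing whitespace, and treats anything left over as an error. The toy grammar (a number, or SEGMENT_START("name", number)) and the names parse_expression and parse_full_expression are hypothetical stand-ins, not Wild's actual parser API.

```rust
/// Hypothetical, heavily simplified expression type.
#[derive(Debug, PartialEq)]
pub enum Expr<'a> {
    Number(u64),
    SegmentStart { name: &'a [u8], default: u64 },
}

fn skip_ws(input: &mut &[u8]) {
    let s: &[u8] = *input;
    let n = s.iter().take_while(|b| b.is_ascii_whitespace()).count();
    *input = &s[n..];
}

fn expect(input: &mut &[u8], token: &[u8]) -> Result<(), String> {
    skip_ws(input);
    let s: &[u8] = *input;
    match s.strip_prefix(token) {
        Some(rest) => {
            *input = rest;
            Ok(())
        }
        None => Err(format!("expected `{}`", String::from_utf8_lossy(token))),
    }
}

/// Parses a hex (`0x...`) or decimal number from the front of `input`.
fn parse_number(input: &mut &[u8]) -> Result<u64, String> {
    skip_ws(input);
    let s: &[u8] = *input;
    let (body, radix) = match s.strip_prefix(b"0x") {
        Some(rest) => (rest, 16),
        None => (s, 10),
    };
    let len = body.iter().take_while(|b| (**b as char).is_digit(radix)).count();
    if len == 0 {
        return Err("expected a number".to_string());
    }
    // The digits are ASCII, so this conversion cannot fail.
    let text = std::str::from_utf8(&body[..len]).unwrap();
    let value = u64::from_str_radix(text, radix).map_err(|e| e.to_string())?;
    *input = &body[len..];
    Ok(value)
}

/// Parses one expression from the front of `input`, leaving any remainder in place.
pub fn parse_expression<'a>(input: &mut &'a [u8]) -> Result<Expr<'a>, String> {
    skip_ws(input);
    let s: &'a [u8] = *input;
    if let Some(rest) = s.strip_prefix(b"SEGMENT_START") {
        *input = rest;
        expect(input, b"(")?;
        expect(input, b"\"")?;
        let s: &'a [u8] = *input;
        let end = s.iter().position(|&b| b == b'"').ok_or("unterminated string")?;
        let name = &s[..end];
        *input = &s[end + 1..];
        expect(input, b",")?;
        let default = parse_number(input)?;
        expect(input, b")")?;
        Ok(Expr::SegmentStart { name, default })
    } else {
        parse_number(input).map(Expr::Number)
    }
}

/// Parses a complete symbol value and rejects anything left over after the expression.
pub fn parse_full_expression(value: &[u8]) -> Result<Expr<'_>, String> {
    let mut input = value;
    let expr = parse_expression(&mut input)?;
    skip_ws(&mut input);
    if !input.is_empty() {
        return Err(format!(
            "unexpected trailing input `{}`",
            String::from_utf8_lossy(input)
        ));
    }
    Ok(expr)
}
```

With this split, parse_expression stays reusable as a building block, and only the top-level caller decides that leftovers are an error.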

lapla-cogito · 2026-04-19T02:42:06Z

+            }
+            Expression::Number(n) => Some(SymbolPlacement::DefsymAbsolute(n)),
+            // For other expressions fall through to the legacy path below.
+            _ => None,


Using legacy paths results in double parsing by both parse_expression_pub() and parse_symbol_expression(). This is inherently inefficient, and if the two parsing results conflict, it creates a breeding ground for difficult-to-debug bugs. Ideally, we should consolidate this into a single parsing path.

Thanks for pointing this out; you're right that the fallback introduced double parsing and could lead to inconsistent interpretations.
I have removed the legacy parse_symbol_expression fallback entirely, so it now uses only a single parser.
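One way to picture the single-parser shape is this sketch. The value is parsed once into a tree, and every variant is mapped to a placement in one exhaustive match, with no string-based fallback. Expr, Placement and resolve_placement are hypothetical simplifications, loosely modelled on the Expression and SymbolPlacement types in the diff. The wrapping arithmetic is an assumption of this sketch.

```rust
/// Hypothetical parse tree, produced exactly once per symbol value.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(u64),
    SegmentStart { name: Vec<u8>, default: u64 },
    Symbol(Vec<u8>),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical placement kinds.
#[derive(Debug, PartialEq)]
pub enum Placement {
    Absolute(u64),
    SymbolPlusOffset { symbol: Vec<u8>, offset: u64 },
}

/// Maps the already-parsed tree to a placement; unsupported shapes become
/// errors here instead of being re-parsed by a second code path.
pub fn resolve_placement(expr: &Expr) -> Result<Placement, String> {
    match expr {
        Expr::Number(n) => Ok(Placement::Absolute(*n)),
        // Only the "no override" case is modelled: SEGMENT_START yields its default.
        Expr::SegmentStart { default, .. } => Ok(Placement::Absolute(*default)),
        Expr::Symbol(name) => Ok(Placement::SymbolPlusOffset { symbol: name.clone(), offset: 0 }),
        Expr::Add(lhs, rhs) => match (resolve_placement(lhs)?, resolve_placement(rhs)?) {
            (Placement::Absolute(a), Placement::Absolute(b)) => {
                Ok(Placement::Absolute(a.wrapping_add(b)))
            }
            (Placement::SymbolPlusOffset { symbol, offset }, Placement::Absolute(b))
            | (Placement::Absolute(b), Placement::SymbolPlusOffset { symbol, offset }) => {
                Ok(Placement::SymbolPlusOffset { symbol, offset: offset.wrapping_add(b) })
            }
            (Placement::SymbolPlusOffset { .. }, Placement::SymbolPlusOffset { .. }) => {
                Err("cannot add two symbol-relative values".to_string())
            }
        },
    }
}
```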

lapla-cogito · 2026-04-19T02:42:14Z

+        "text" => def.is_executable() && !def.is_writable(),
+        "data" | "bss" => def.is_writable() && !def.is_executable(),
+        "rodata" => !def.is_writable() && !def.is_executable(),
+        _ => false,


It may be better to emit a warning for an unknown segment name rather than silently ignoring it.
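Here is a sketch of the warn-and-continue behaviour. It assumes a hypothetical SegmentFlags struct and collects warnings into a Vec rather than printing them, so the behaviour is easy to test. The flag combinations mirror the diff above. Returning None lets the caller fall back to the default value.

```rust
/// Hypothetical view of a loadable segment's permission flags.
#[derive(Clone, Copy, Debug)]
pub struct SegmentFlags {
    pub executable: bool,
    pub writable: bool,
}

/// `Some(matched)` for a known name; `None` (after recording a warning) for an unknown one.
pub fn segment_name_matches(
    name: &[u8],
    flags: SegmentFlags,
    warnings: &mut Vec<String>,
) -> Option<bool> {
    let matched = match name {
        b"text" => flags.executable && !flags.writable,
        b"data" | b"bss" => flags.writable && !flags.executable,
        b"rodata" => !flags.writable && !flags.executable,
        _ => {
            warnings.push(format!(
                "SEGMENT_START: unknown segment name `{}`, using the default value",
                String::from_utf8_lossy(name)
            ));
            return None;
        }
    };
    Some(matched)
}
```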

lapla-cogito

I have some questions about the test, but otherwise it seems good.

lapla-cogito · 2026-04-20T03:53:25Z

@@ -0,0 +1,13 @@
+//#LinkerScript:linker-script-segment-start.ld
+//#RunEnabled:false
+//#DiffEnabled:false


Why should DiffEnabled:false be necessary? The diff detection by linker-diff should be verified unless there's a valid reason.

DiffEnabled:false was originally added because the test was linked as a shared object, which introduced differences (e.g. relocations/GOT) that made the diff noisy.

lapla-cogito · 2026-04-20T03:53:27Z

+//#RunEnabled:false
+//#DiffEnabled:false
+//#Mode:dynamic
+//#LinkArgs:-shared -z now


Is there a reason you're linking this as a shared library? IIRC, SEGMENT_START is used more often in regular executables. If you link this test as a regular executable, you won't need RunEnabled:false either.

You are right; linking as a shared object was also not ideal here. I took the easy path. I'll rework this test to use a regular executable.

Sorry, I accidentally set this to approve.

davidlattimore · 2026-04-20T06:11:31Z

+                                                let default =
+                                                    eval_constant_expr(default_expr).unwrap_or(0);
+                                                let name_str = std::str::from_utf8(name).map_err(|_| {
+                                                    crate::error!("SEGMENT_START: segment name is not valid UTF-8")


I'd probably be inclined to just keep the name as &[u8]. That way we don't need to worry about UTF-8 validation. When we're comparing it with things, we can just compare it with b"text" etc. That's how names are handled throughout most of the rest of the linker.

davidlattimore · 2026-04-20T06:33:08Z

+/// Evaluate a linker script expression that must be a compile-time constant.
+/// Returns `Err` for any expression that requires runtime context (symbols, location counter,
+/// etc.).
+fn eval_constant_expr(expr: &crate::linker_script::Expression<'_>) -> crate::error::Result<u64> {


I feel like this probably belongs in expression_eval.rs
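A constant-only evaluator of the kind the doc comment describes can be a small recursive function. That is part of why it would sit naturally next to the other evaluation code. The Expr variants below are a hypothetical subset. Checked arithmetic, with overflow reported as an error, is a choice made for this sketch.

```rust
/// Hypothetical subset of linker-script expressions.
pub enum Expr<'a> {
    Number(u64),
    Symbol(&'a [u8]),
    LocationCounter,
    Add(Box<Expr<'a>>, Box<Expr<'a>>),
    Sub(Box<Expr<'a>>, Box<Expr<'a>>),
}

/// Evaluates an expression that must be a constant; anything needing layout
/// context (symbols, the location counter) is rejected.
pub fn eval_constant_expr(expr: &Expr<'_>) -> Result<u64, String> {
    match expr {
        Expr::Number(n) => Ok(*n),
        Expr::Symbol(name) => Err(format!(
            "`{}` depends on layout and is not a constant",
            String::from_utf8_lossy(name)
        )),
        Expr::LocationCounter => Err("`.` depends on layout and is not a constant".to_string()),
        Expr::Add(lhs, rhs) => eval_constant_expr(lhs)?
            .checked_add(eval_constant_expr(rhs)?)
            .ok_or_else(|| "overflow in constant expression".to_string()),
        Expr::Sub(lhs, rhs) => eval_constant_expr(lhs)?
            .checked_sub(eval_constant_expr(rhs)?)
            .ok_or_else(|| "underflow in constant expression".to_string()),
    }
}
```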

davidlattimore · 2026-04-20T06:36:30Z

@@ -147,10 +249,7 @@ impl<'data> LayoutRulesBuilder<'data> {
                        .with_hidden(provide.hidden),
                );
            } else if let linker_script::Command::SymbolDefinition { name, value } = cmd {


If value is a string that needs parsing, then we should probably parse it when we parse the rest of the linker script. That way if there's an error while parsing, we can report the line number in the linker script where the error occurred.
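This simplified sketch shows why parsing up front helps diagnostics. While the script is being read, the parser still knows the current line, so it can include it in the error message. The one-assignment-per-line format and parse_symbol_definitions are assumptions for illustration. A real linker-script tokenizer is not line-oriented.

```rust
/// A parsed `name = value;` assignment that remembers its source line.
#[derive(Debug, PartialEq)]
pub struct SymbolDefinition<'a> {
    pub name: &'a str,
    pub value: u64,
    pub line: usize,
}

/// Parses a (hypothetical, simplified) script with one `name = <number>;` per line.
/// Every error message carries the 1-based line number it came from.
pub fn parse_symbol_definitions(script: &str) -> Result<Vec<SymbolDefinition<'_>>, String> {
    let mut defs = Vec::new();
    for (index, raw) in script.lines().enumerate() {
        let line = index + 1;
        let text = raw.trim();
        if text.is_empty() {
            continue;
        }
        let body = text
            .strip_suffix(';')
            .ok_or_else(|| format!("line {line}: missing `;`"))?;
        let (name, value_text) = body
            .split_once('=')
            .ok_or_else(|| format!("line {line}: expected `name = value`"))?;
        let value_text = value_text.trim();
        // Accept hex with a 0x prefix, otherwise decimal.
        let parsed = match value_text.strip_prefix("0x") {
            Some(hex) => u64::from_str_radix(hex, 16),
            None => value_text.parse::<u64>(),
        };
        let value = parsed.map_err(|e| format!("line {line}: invalid value `{value_text}`: {e}"))?;
        defs.push(SymbolDefinition { name: name.trim(), value, line });
    }
    Ok(defs)
}
```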

davidlattimore · 2026-04-20T06:39:09Z

@@ -0,0 +1,8 @@
+__executable_start = SEGMENT_START("text", 0);


It could be good to define a symbol for each of the supported segment types, then have some code (in _start) that checks that those values appear correct relative to variables in the respective segments.

Blazearth · 2026-04-20T18:20:01Z

//#DiffIgnore:segment.LOAD.RWX.alignment
//#DiffIgnore:segment.LOAD.RX.alignment

I added these to the test because ld produces a single RWE LOAD segment when the linker script has no PHDRS directive, whereas Wild always uses separate RO/RX/RW segments. It took me a while to resolve the test.
I have addressed all the issues.

davidlattimore · 2026-04-20T22:06:09Z

+        // rodata lives in a dedicated RO segment when one exists, but in a typical Linux
+        // executable it shares the RX segment with .text. Match any non-writable loadable
+        // segment so we find whichever one actually contains read-only data.


This doesn't seem right to me. Maybe executables on Linux used to put RO data into the RX segment, but modern executables don't do this since it would make the binary slightly less secure. But in any case, what's important is what Wild does, which is emit a separate read-only non-executable segment.

Sure, I will revert it.

davidlattimore · 2026-04-20T22:07:41Z

+fn segment_name_matches(name: &[u8], def: impl crate::platform::ProgramSegmentDef) -> bool {
+    match name {
+        b"text" => def.is_executable() && !def.is_writable(),
+        b"data" | b"bss" => def.is_writable() && !def.is_executable(),


What does GNU ld return when you request the start of BSS? Does it return the start of the combined data/bss segment or does it return where bss starts within that segment?

When I tested with GNU ld, SEGMENT_START("bss", ...) returned the start of the loadable writable segment, not the start of the .bss section within that segment, since .data and .bss are typically placed in the same LOAD segment.

davidlattimore · 2026-04-20T22:16:45Z

+  runtime_init();
+
+  /* text_start must be <= _start (both in the text segment) */
+  if (ptr_to_int(&text_start) > ptr_to_int(&_start)) {


Given that the linker script explicitly says to put .text at 0x600000, do you think we should confirm that it's at that address?

davidlattimore · 2026-04-20T22:54:31Z

+}
+
+fn is_known_segment_name(name: &[u8]) -> bool {
+    matches!(name, b"text" | b"data" | b"bss" | b"rodata")


It seems a shame to duplicate the list here and in the function above. What about adding an enum SegmentMatcher, then parsing that from the string? Bonus points if we can parse it when we parse the linker script, so that an unknown value gets reported with the correct line number.

I guess that might result in an unknown segment name being an error rather than a warning. That's possibly not a bad thing though.
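Here is a sketch of the enum approach. The list of names lives in one place, and parsing happens once, ideally while reading the script so an unknown name can be reported with its line. Both "is this a known name" (parse(name).is_some()) and "does this segment match" then come from the same type. The PR's commit list later mentions a SegmentName enum, but its definition isn't shown here, so this shape and the matches_flags helper are guesses. The flag combinations come from the diff above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SegmentName {
    Text,
    Data,
    Bss,
    Rodata,
}

impl SegmentName {
    /// The single list of supported names; `None` means "unknown".
    pub fn parse(name: &[u8]) -> Option<SegmentName> {
        match name {
            b"text" => Some(SegmentName::Text),
            b"data" => Some(SegmentName::Data),
            b"bss" => Some(SegmentName::Bss),
            b"rodata" => Some(SegmentName::Rodata),
            _ => None,
        }
    }

    /// Flag-based matching for a loadable segment.
    pub fn matches_flags(self, executable: bool, writable: bool) -> bool {
        match self {
            SegmentName::Text => executable && !writable,
            SegmentName::Data | SegmentName::Bss => writable && !executable,
            SegmentName::Rodata => !writable && !executable,
        }
    }
}
```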

davidlattimore · 2026-04-21T00:08:32Z

@@ -0,0 +1,16 @@
+ENTRY(_start)
+
+text_start = SEGMENT_START("text", 0x600000);


The default value here is identical to the . = 0x600000 used for the .text section below. That makes it hard to determine which was actually used. I tried replacing the default values for all four symbols and it appears that GNU ld ended up using those default values for all of them - e.g. it didn't actually use the start of the text segment at all. It could be good to do some experimentation to determine when the default value gets used and when it doesn't then try to match that.

Possibly you forgot this comment? It looks like it's still the case that the linker script starts .text at the same address as the default value. I'd suggest changing the default values for all four SEGMENT_START calls to distinct values, e.g. 0x10, 0x11, 0x12, 0x13.

Sorry, I forgot about that. I hope it's all fine now.

davidlattimore · 2026-04-21T00:15:17Z

+#include <stddef.h>
+
+#include "../common/runtime.h"
+#include "ptr_black_box.h"


Could you change this to "../common/ptr_black_box.h"? I'll be doing a PR soon to update the includes in the other files and also to change the test runner to not have "../common" in the include path.

Blazearth · 2026-04-21T16:12:22Z

The default value here is identical to the . = 0x600000 used for the .text section below. That makes it hard to determine which was actually used. I tried replacing the default values for all four symbols and it appears that GNU ld ended up using those default values for all of them - e.g. it didn't actually use the start of the text segment at all. It could be good to do some experimentation to determine when the default value gets used and when it doesn't then try to match that.

Hey @davidlattimore, I ran a series of experiments to understand the real behavior before deciding how Wild should match it.

What the docs say

The official GNU ld docs state:

SEGMENT_START(segment, default): Return the base address of the named segment. If an explicit value has already been given for this segment (with a command-line -T option) then that value will be returned, otherwise the value will be default.

Case 1 — no -T, default differs from actual layout:

text_start = SEGMENT_START("text", 0x1);
. = 0x600000;  /* .text actually lands here */
Result: text_start = 0x1 (the default, not the actual segment address)

Case 2 — -Ttext=0x700000 override:

text_start = SEGMENT_START("text", 0x1);
. = 0x600000;
ld -Ttext=0x700000 ...
Result: text_start = 0x700000 (the CLI override wins, layout moves too)

Case 3 — no -T, default happens to match layout (our current test):

text_start = SEGMENT_START("text", 0x600000);
. = 0x600000;
Result: text_start = 0x600000 — ambiguous, can't tell if it used the default or the segment

Case 4 — no -T, default is different from layout:

text_start = SEGMENT_START("text", 0x500000);
. = 0x600000;
Result: text_start = 0x500000 — default again, not the actual segment base

In these cases, GNU ld appears to return the -T* override if provided, otherwise the default value — even when the actual segment lands elsewhere in the layout. This suggests SEGMENT_START does not derive its value from scanning PT_LOAD segments, at least in these scenarios.
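The observed behaviour reduces to a plain lookup that never consults the final layout. If a -T override was given for that segment, use it; otherwise use the default. Here is a minimal model, where SegmentOverrides and segment_start are hypothetical names. It reflects only the four cases above, not every GNU ld configuration.

```rust
use std::collections::HashMap;

/// Hypothetical store of command-line segment addresses, e.g. from -Ttext=0x700000.
#[derive(Default)]
pub struct SegmentOverrides {
    by_name: HashMap<Vec<u8>, u64>,
}

impl SegmentOverrides {
    pub fn set(&mut self, name: &[u8], addr: u64) {
        self.by_name.insert(name.to_vec(), addr);
    }
}

/// The override if present, else `default`. Where the segment actually ends
/// up in the layout is deliberately not an input.
pub fn segment_start(name: &[u8], default: u64, overrides: &SegmentOverrides) -> u64 {
    overrides.by_name.get(name).copied().unwrap_or(default)
}
```

Replaying the experiments: with no overrides, SEGMENT_START("text", 0x1) yields 0x1 (Case 1) and SEGMENT_START("text", 0x500000) yields 0x500000 (Case 4). After set(b"text", 0x700000), SEGMENT_START("text", 0x1) yields 0x700000 (Case 2).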

What Wild currently does

Wild scans PT_LOAD segments by flag combinations (RX for text, RO for rodata, RW for data/bss) and returns the actual segment base address. This is more "layout-aware" but fundamentally different from GNU ld's behavior.

Now, there are two options:

Option A — Match GNU ld strictly: SEGMENT_START only returns a value when the corresponding -T flag was passed on the command line (e.g. -Ttext, -Tdata, -Tbss). Otherwise return the default. Wild would need to store -T overrides and look them up here.

Option B — Keep Wild's current behavior: scan actual PT_LOAD segments by flags and return their base address. More useful in practice, but diverges from GNU ld when no -T is passed.

davidlattimore · 2026-04-21T21:45:40Z

I think for things like this, we want to match GNU ld or lld's behaviour, especially if both GNU ld and lld have the same behaviour.

Blazearth · 2026-04-25T17:51:59Z

I have changed it to match GNU ld's behaviour: for text, data and bss, SEGMENT_START returns the address given with the corresponding -Ttext, -Tdata or -Tbss option if present, and the default otherwise. For rodata it always returns the default.
The test now has two configs to verify both paths: one without any -T flags (all four symbols return their linker script defaults), and one with -Ttext/-Tdata/-Tbss overrides (verifying the override values are returned, while rodata still returns its default).

davidlattimore

I'm a bit worried that it appears this implements parsing of -Ttext= etc and hooks it up to SEGMENT_START, but doesn't actually affect the actual start of the segments. That would mean that someone linking with -Ttext= would now not get an error, but would get a binary where the text segment didn't start where they said it needed to.

Ideally, we'd fully support -Ttext= etc before we'd support SEGMENT_START. In that regard, the best thing to do would be to separate -Ttext= support out into a separate PR, hook it up to layout so that it affects the address of the text segment and make sure that part of it is tested, submit that PR, then proceed with the rest of this PR.

But I also don't want to make you drag this PR out longer than you want to, so I'm open to other options. e.g. we could split it the other way, i.e. move the parsing of -Ttext= etc to a later PR, including the part of the test you've got that ensures that works with SEGMENT_START. That would mean that this PR would support SEGMENT_START, but only where it returns the default value. The later PR could add support for -Ttext= etc, making it both affect the actual segment starts and the return value from SEGMENT_START.
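This toy example shows the property being asked for. If the layout pass and SEGMENT_START read the same override table, the returned value cannot disagree with where the segment actually lands. Segment, layout_segments, the 0x1000 page size and the sequential placement rule are assumptions of this sketch, not how Wild's layout works. Overlapping overrides are not checked.

```rust
use std::collections::HashMap;

/// Hypothetical output segment before addresses are assigned.
pub struct Segment {
    pub name: &'static str,
    pub size: u64,
}

const PAGE_SIZE: u64 = 0x1000; // assumption for this sketch

fn align_up(value: u64, align: u64) -> u64 {
    (value + align - 1) / align * align
}

/// Assigns base addresses in order. A segment named in `overrides` starts exactly
/// at the requested address; the others follow the previous segment, page-aligned.
pub fn layout_segments(
    segments: &[Segment],
    image_base: u64,
    overrides: &HashMap<&str, u64>,
) -> Vec<(&'static str, u64)> {
    let mut next = image_base;
    let mut placed = Vec::new();
    for seg in segments {
        let base = overrides
            .get(seg.name)
            .copied()
            .unwrap_or_else(|| align_up(next, PAGE_SIZE));
        placed.push((seg.name, base));
        next = base + seg.size;
    }
    placed
}

/// SEGMENT_START reads the same table, so it cannot disagree with the layout.
pub fn segment_start(name: &str, default: u64, overrides: &HashMap<&str, u64>) -> u64 {
    overrides.get(name).copied().unwrap_or(default)
}
```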

davidlattimore · 2026-04-26T10:31:18Z

+            // -Ttext=ADDR, -Tdata=ADDR, -Tbss=ADDR are segment start overrides,
+            // not linker script paths. Handle them here since they share the -T prefix.
+            // The prefix handler gives us the part after "-T", which may be:


Thanks for figuring this out. I hadn't realised this particular oddity in GNU linker parsing semantics.
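Here is a sketch of the disambiguation. The text after -T is either a linker script path or, for text=, data= and bss=, a segment address. TArg and classify_t_arg are hypothetical. Only the = form is handled. The address is parsed as hexadecimal with an optional 0x prefix, which is our reading of the GNU ld documentation for these options.

```rust
#[derive(Debug, PartialEq)]
pub enum TArg<'a> {
    /// e.g. `-Tscript.ld`: a linker script path.
    Script(&'a str),
    /// `-Ttext=ADDR`, `-Tdata=ADDR`, `-Tbss=ADDR`: a segment start address.
    SegmentStart { segment: &'a str, addr: u64 },
}

/// Classifies the part of the argument that follows `-T`.
pub fn classify_t_arg(rest: &str) -> Result<TArg<'_>, String> {
    for segment in ["text", "data", "bss"] {
        if let Some(value) = rest.strip_prefix(segment).and_then(|r| r.strip_prefix('=')) {
            // Hex, with the 0x prefix optional.
            let digits = value
                .strip_prefix("0x")
                .or_else(|| value.strip_prefix("0X"))
                .unwrap_or(value);
            let addr = u64::from_str_radix(digits, 16)
                .map_err(|e| format!("-T{segment}: invalid address `{value}`: {e}"))?;
            return Ok(TArg::SegmentStart { segment, addr });
        }
    }
    Ok(TArg::Script(rest))
}
```

A fuller implementation would also need to recognise other -T-prefixed options such as -Ttext-segment=. This sketch would treat that as a script path.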

Blazearth · 2026-04-26T11:55:55Z

I'm a bit worried that it appears this implements parsing of -Ttext= etc and hooks it up to SEGMENT_START, but doesn't actually affect the actual start of the segments. That would mean that someone linking with -Ttext= would now not get an error, but would get a binary where the text segment didn't start where they said it needed to.

First of all, sorry for overlooking that — you're right that -Ttext= currently only affects the SEGMENT_START return value and doesn't actually move the segment in the output binary.

Ideally, we'd fully support -Ttext= etc before we'd support SEGMENT_START. In that regard, the best thing to do would be to separate -Ttext= support out into a separate PR, hook it up to layout so that it affects the address of the text segment and make sure that part of it is tested, submit that PR, then proceed with the rest of this PR.

I'd like to try and go with this option first — implement proper -Ttext/-Tdata/-Tbss support that actually affects the segment layout, then this PR's with-T-overrides test will be fully correct. I'll open a separate PR for that.

If I find it's too complex to get right, I'll fall back to Option B — strip the -Ttext parts from this PR so it only supports SEGMENT_START returning the default value, and leave the override support for a later PR.


Blazearth · 2026-05-01T23:33:02Z

Now, with the -Ttext override support from #1877, the section is moved to the correct place in the output binary, so it agrees with the value SEGMENT_START returns.


davidlattimore

Thanks for implementing this :)

Blazearth force-pushed the feat-segment-start-support branch 2 times, most recently from 0b22961 to edc82d7 April 18, 2026 00:37

lapla-cogito reviewed Apr 19, 2026

Blazearth force-pushed the feat-segment-start-support branch from f1f9c31 to 9ac2117 April 19, 2026 11:01

lapla-cogito previously approved these changes Apr 20, 2026

davidlattimore reviewed Apr 20, 2026

Blazearth marked this pull request as draft April 20, 2026 10:55

Blazearth force-pushed the feat-segment-start-support branch 2 times, most recently from bd7efe7 to b143984 April 20, 2026 17:58

Blazearth marked this pull request as ready for review April 20, 2026 18:20

davidlattimore reviewed Apr 21, 2026

Blazearth marked this pull request as draft April 21, 2026 13:57

Blazearth force-pushed the feat-segment-start-support branch from ce9d62d to f263564 April 25, 2026 17:40

Blazearth marked this pull request as ready for review April 25, 2026 17:52

davidlattimore reviewed Apr 26, 2026

Blazearth marked this pull request as draft April 26, 2026 11:56

Blazearth mentioned this pull request Apr 30, 2026

feat: add -Ttext/-Tdata/-Tbss segment layout for SEGMENT_START support #1877

Merged

Blazearth added 4 commits May 1, 2026 20:50

linker-scripts: support SEGMENT_START builtin function

ef226a4

Changed to single parser and added warning for unknown names

8853172

Refactored the code and changed the test to regular executable with all symbol tested.

58edaa7

refactor: introduce SegmentName enum for SEGMENT_START

934dfc5

Blazearth force-pushed the feat-segment-start-support branch 2 times, most recently from cbd41b4 to acbfb3c May 1, 2026 23:00

Blazearth marked this pull request as ready for review May 1, 2026 23:20

fix: accept unknown SEGMENT_START names matching GNU ld; restore DiffIgnore for aarch64

ffe5a62

Blazearth force-pushed the feat-segment-start-support branch from acbfb3c to ffe5a62 May 2, 2026 00:21

davidlattimore approved these changes May 2, 2026

davidlattimore merged commit 70600b4 into wild-linker:main May 2, 2026
38 of 48 checks passed
