Phase 4: §2 multi-client socket daemon + §6 ARM64e xref coverage extension + CI fix #21
Merged
… item 5)

Move MovSrcKind + classify_mov_source from lldb_backend.cpp's anonymous namespace to xref_arm64_parsers so unit tests can pin the alias-name-first match order without a live LLDB target. The prior implementation worked by accident: `lr` / `xzr` / `wzr` happened to land in the right switch arm via fall-through, but a future refactor that touched the prefix check could silently regress.

Phase 4 item 5 from docs/35-field-report-followups.md §3: token-compare against the alias spellings BEFORE any prefix heuristic.

New unit tests pin classify_mov_source's behaviour for the zero (xzr/wzr/#0), stack pointer (sp/wsp), link register (lr), xN/wN width-distinguishing, and malformed-input arms. No behaviour change against existing fixtures: the lifted function is byte-identical to the previous in-place implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
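The alias-first ordering described above can be sketched in a few lines. This is an illustration only, written in Python rather than the project's C++; the enum member names and the exact handling of malformed tokens are assumptions, not the real `MovSrcKind` definition.

```python
from enum import Enum


class MovSrcKind(Enum):
    # Hypothetical member names; the real C++ enum may differ.
    ZERO = "zero"
    STACK_POINTER = "stack_pointer"
    LINK_REGISTER = "link_register"
    X_REG = "x_reg"
    W_REG = "w_reg"
    OTHER = "other"


# Alias spellings are matched as whole tokens BEFORE any prefix heuristic.
_ALIASES = {
    "xzr": MovSrcKind.ZERO,
    "wzr": MovSrcKind.ZERO,
    "#0": MovSrcKind.ZERO,
    "sp": MovSrcKind.STACK_POINTER,
    "wsp": MovSrcKind.STACK_POINTER,
    "lr": MovSrcKind.LINK_REGISTER,
}


def classify_mov_source(token: str) -> MovSrcKind:
    t = token.strip().lower()
    # Step 1: exact alias match, so "xzr" never reaches the "x..." prefix check.
    if t in _ALIASES:
        return _ALIASES[t]
    # Step 2: prefix heuristic for xN / wN with N in 0..30.
    num = t[1:]
    if t[:1] in ("x", "w") and num.isascii() and num.isdigit() and int(num) <= 30:
        return MovSrcKind.X_REG if t[0] == "x" else MovSrcKind.W_REG
    # Step 3: anything else is treated as malformed / untracked.
    return MovSrcKind.OTHER
```

Because the alias table is consulted first, reordering or rewriting the prefix check cannot change how `xzr`, `wsp`, or `lr` classify.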
Phase-1 socket mode points a single NotificationSink at the dispatcher on accept() and clears it on disconnect. That is race-free only because phase 1 is strictly one connection at a time: no other sink is alive to receive a notification belonging to a different connection.

Phase 2 needs to accept multiple concurrent connections, which breaks the single-sink design: connection A's stop event would either route to connection B's OutputChannel (after B's accept re-pointed the sink), or vanish (after A's disconnect cleared it but before B's accept). Either outcome corrupts the JSON-RPC stream that every client sees.

NonStopRuntime now owns a subscriber SET, guarded by `sinks_mu_`. Each connection that wants notifications calls `add_notification_sink` on accept and `remove_notification_sink` on disconnect. emit_stopped_ snapshots the subscriber list under a shared lock, drops the lock, then fans the notification out, so a slow sink (one whose OutputChannel's mutex is contended) doesn't stall the other subscribers' deliveries.

`set_notification_sink(sink)` is kept as a back-compat shim with new "replace the entire subscriber set with this one" semantics. Stdio mode (main.cpp) still calls it once at startup and gets the same behaviour as before. Phase-2 socket_loop.cpp migrates to add/remove so multiple connections coexist without disturbing one another.

The runtime's single emit funnel point (set_stopped → emit_stopped_) is the only call site for thread.event notifications in the daemon today; the NonStopListener forwards parsed RSP stop replies through runtime.set_stopped, and probe / breakpoint events use no separate emission path. The subscriber set therefore covers every async notification the dispatcher fires.

Tests:
- New unit cases in `tests/unit/test_nonstop_runtime.cpp` pin the fan-out, the remove behaviour, and the set/clear back-compat semantics. All four failed-as-expected before the implementation and pass after.
- The existing `set_notification_sink` callers in test_nonstop_listener and test_dispatcher_nonstop still work: the new "replace all" semantics match what those tests assume (one sink, no others).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
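A minimal Python sketch of the snapshot-then-fan-out pattern follows. The class and method names are hypothetical stand-ins for NonStopRuntime's API, sinks are modelled as plain callables, and treating `set_sink(None)` as "clear" is an assumption about the back-compat shim.

```python
import threading


class NotificationHub:
    """Hypothetical stand-in for the runtime's subscriber set."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sinks = []

    def add_sink(self, sink):
        with self._lock:
            if sink not in self._sinks:
                self._sinks.append(sink)

    def remove_sink(self, sink):
        with self._lock:
            if sink in self._sinks:
                self._sinks.remove(sink)

    def set_sink(self, sink):
        # Back-compat shim: replace the whole set (None clears it; assumed).
        with self._lock:
            self._sinks = [] if sink is None else [sink]

    def emit(self, event):
        # Copy the subscriber list under the lock, then deliver WITHOUT it,
        # so add/remove callers never wait on a slow sink.
        with self._lock:
            snapshot = list(self._sinks)
        for sink in snapshot:
            sink(event)
```

Usage: each connection registers its own callable on accept and removes it on disconnect; a sink removed after the snapshot was taken may still receive that one in-flight event.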
Phase 1 served one connection at a time: accept() → serve_one_connection in the calling thread → close → next accept. An agent script wanting to fire two `ldb` invocations in parallel against the same daemon had to serialise them externally, or each call paid the spawn cost. This commit accepts a connection, spawns a std::thread per connection that owns its fd for its entire lifetime, and the main thread goes straight back to accept(). The Dispatcher is shared; concurrent RPC service is serialised through its new `dispatch_mu_` outer lock. Concurrency audit (recorded for the next reviewer): - `LldbBackend::Impl::mu` already guards every public method's SBAPI access. Every public LldbBackend method acquires it; nothing changed in this commit. The phase-3 chained-fixups branch's drop-mu-during-file-IO pattern still holds. - `ProbeOrchestrator` has its own `mu_`. Every public method takes it; callback paths re-acquire when re-entering the orchestrator. - `SessionStore` and `ArtifactStore` each have their own internal mutex around sqlite access (single-writer assumption preserved by WAL). - `NonStopRuntime` has its own per-instance shared_mutex (state map) and the subscriber set lock added in the prereq commit. - `Dispatcher`'s OWN mutable state — target_main_module_, diff_cache_ + diff_cache_index_, cost_samples_, python_unwinders_, rsp_channels_, active_session_writer_, active_session_id_ — was NOT thread-safe. `dispatch_mu_` covers all of it under one outer lock for the duration of every dispatch() call. Strategy: serialise via dispatch_mu_ around the entire dispatch lifetime. Correct, dumb, and low-throughput in the multi-client case (one RPC at a time across all connections). Per-target sharding is the natural phase-3 refinement; the dispatcher's mutable state would have to migrate to a per-target map first. Documented in `dispatcher.h`. 
Shutdown sequence: signal handler sets g_shutdown; accept() returns EINTR; the main loop notices the flag and exits the accept loop. On the way out we join every outstanding worker thread. In-flight RPCs run to completion (LldbBackend's SBAPI calls aren't interruptible from outside); a separate item in §2 phase-2 plans a self-pipe + poll() refinement for finer-grained cancellation. Tests: - New `tests/smoke/test_socket_multiclient.py`: two Python threads each open a socket, run `target.open` (with its module list as a side effect — see handle_target_open), sync on a barrier, then run `module.list`. The barrier times out at 10s; phase-1 serial service would deadlock there because the second connection's accept() blocks until the first disconnects. - Failed against the pre-fix daemon (barrier timeout, observed in the RED ctest run). Passes after the thread-per-connection refactor. - All existing socket tests (lifecycle, collision, perms) still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…se 4 item 1) Phase 3 resets adrp_regs on RET / unconditional B / BR only. Conditional branches (b.cond / cbz / cbnz / tbz / tbnz) whose target sits in a different function are tail-call-like handoffs; on the symbolized side, gate 1's function_name_at check catches the leak when the scanner steps into the target function, but on the stripped side gate 1 silently misses it (both adjacent functions return "" from function_name_at). Implement option (b) from docs/35-field-report-followups.md §3 phase 4: parse the conditional's target operand inline (LLDB renders it as `0xNNNNNNN`), resolve to a function name, and reset adrp_regs when that name differs from the current function. Skip the parse when adrp_regs is empty (the function_name_at call dominates cost; mirrors gate 1's same optimisation). Bump a new provenance.adrp_pair_cond_branch_reset counter so callers can see when the heuristic conservatively dropped tracking — in stripped binaries this is the only signal. Provenance schema additions (forward-compatible): - adrp_pair_cond_branch_reset (item 1) - adrp_pair_function_start_reset (item 3 — wired in a subsequent commit) - adrp_pair_unresolvable_load (item 4 — wired in a subsequent commit) The two not-yet-populated counters are exposed on the wire now so the dispatcher's serialisation path doesn't need a second pass when later commits populate them. TDD: tests/fixtures/asm/xref_condbranch.s + test_xref_condbranch.py. The fixture is symbolized so gate 1 also covers the leak, but the test pins provenance.adrp_pair_cond_branch_reset > 0 to prove the new path fired — a future refactor that silently deletes the path would flip the assertion red. ctest: 10/10 xref smoke tests pass. No regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e 4 item 2)
Phase 3's FAT picker preferred arm64e > arm64 unconditionally. When
LLDB loaded the arm64 slice of a FAT binary that ALSO had an arm64e
slice, the picker still returned the arm64e map — different
image_base, zero matches in xref_address.
Phase 4 item 2 closes the loop: extract_chained_fixups_from_macho()
gains an optional std::string_view triple parameter. The dispatcher
calls SBTarget::GetTriple() and passes it through; the FAT picker
classifies the triple ("arm64e-" / "arm64-" / "x86_64-") into the
preferred (cpu_type, cpu_subtype) pair and tries the matching slice
first. Falls back to the phase-3 preference order when:
- triple is empty (existing callers haven't been migrated yet)
- triple names an unknown arch
- the matching slice exists but has no chained fixups
This keeps the existing behaviour for any caller that doesn't yet
plumb the triple through; new callers see exact-match selection.
ARM64_ALL (subtype 0) match also accepts ARM64_V8 (subtype 1) — the
LLDB triple "arm64-" can map to either subtype depending on the slice
the linker tagged. Skip when the triple demanded arm64e (V8 is not
arm64e).
TDD: 4 new unit tests under [chained_fixups][macho][fat][triple] in
tests/unit/test_chained_fixups.cpp pin: arm64 triple picks arm64
slice (image_base proves it), arm64e triple picks arm64e slice,
empty triple falls back to phase-3 default, missing-matching-slice
falls back too. 15/15 [chained_fixups] tests pass; 10/10 xref smoke
tests still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
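The triple-driven slice selection with its fallback can be sketched as below. The CPU type and subtype constants are the standard Mach-O values; the slice dictionaries, the function names, and the `has_fixups` flag are hypothetical simplifications of what extract_chained_fixups_from_macho() works with.

```python
CPU_TYPE_ARM64 = 0x0100000C
CPU_TYPE_X86_64 = 0x01000007
CPU_SUBTYPE_ARM64_ALL = 0
CPU_SUBTYPE_ARM64_V8 = 1
CPU_SUBTYPE_ARM64E = 2
CPU_SUBTYPE_X86_64_ALL = 3
CPU_SUBTYPE_MASK = 0xFF000000  # capability bits live in the top byte


def preferred_slice(triple):
    """Map an LLDB triple prefix to (cpu_type, cpu_subtype), or None."""
    if triple.startswith("arm64e-"):
        return (CPU_TYPE_ARM64, CPU_SUBTYPE_ARM64E)
    if triple.startswith("arm64-"):
        return (CPU_TYPE_ARM64, CPU_SUBTYPE_ARM64_ALL)
    if triple.startswith("x86_64-"):
        return (CPU_TYPE_X86_64, CPU_SUBTYPE_X86_64_ALL)
    return None  # empty or unknown arch: caller falls back


def subtype_matches(wanted, actual):
    actual &= ~CPU_SUBTYPE_MASK
    if wanted == actual:
        return True
    # An "arm64-" triple also accepts a V8-tagged slice; arm64e never does.
    return wanted == CPU_SUBTYPE_ARM64_ALL and actual == CPU_SUBTYPE_ARM64_V8


def pick_slice(slices, triple=""):
    want = preferred_slice(triple)
    if want is not None:
        for s in slices:
            if (s["cpu_type"] == want[0]
                    and subtype_matches(want[1], s["cpu_subtype"])
                    and s["has_fixups"]):
                return s
    # Phase-3 preference order: arm64e first, then arm64.
    for sub in (CPU_SUBTYPE_ARM64E, CPU_SUBTYPE_ARM64_ALL):
        for s in slices:
            if (s["cpu_type"] == CPU_TYPE_ARM64
                    and subtype_matches(sub, s["cpu_subtype"])
                    and s["has_fixups"]):
                return s
    return None
```

All three fallback conditions listed above (empty triple, unknown arch, matching slice without chained fixups) end up in the same second loop, which is what keeps un-migrated callers on the old behaviour.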
Phase-1 expected the operator to run `ldbd --listen unix:PATH` once
manually before issuing any `ldb --socket PATH` invocations; a stale
or missing daemon surfaced as a bare "could not connect" error. For
shell scripts that want the persistent-state property without the
ceremony of managing the daemon lifecycle by hand, the obvious
ergonomic ask is "just start one if it isn't running."
`_SocketProc` now detects the ECONNREFUSED / ENOENT / ENXIO subset
of connect() failures, fork+execs `ldbd --listen unix:PATH` with
`start_new_session=True` (setsid), waits up to ~3s for the socket
to start accepting, and retries the connect. The auto-spawned
daemon outlives the client process so the next CLI invocation
reuses it without re-spawning.
The ldbd binary is resolved through a three-step search:
1. $LDB_LDBD_SPAWN — explicit override; tests use this to pin the
build's ldbd binary without depending on $PATH discovery.
2. shutil.which("ldbd") — global install.
3. _find_ldbd_sibling() — the in-tree heuristic that the §1
sibling-lookup commit established for `--ldbd`.
stdin/stdout/stderr are ALL redirected to /dev/null in the
daemon. The earlier sketch (which inherited the client's stderr
to preserve diagnostics) caused a subtle test-runner hang: when
a caller wrapped `ldb --socket ...` with subprocess.run
capture_output=True, the daemon inherited the captured stderr
pipe and held it open across the client's exit — the wrapper
never saw EOF and blocked indefinitely. Operators who want the
diagnostics now set $LDB_LDBD_LOG_FILE; the spawn redirects
stderr to that path instead.
Help text updated to document the auto-spawn flow.
Tests:
- New `tests/smoke/test_socket_autospawn.py`:
* Picks a fresh tempdir socket path; no daemon running.
* Invokes `ldb --socket $path target.open ...`. Asserts rc=0
and a valid target_id.
* Invokes a second `ldb --socket $path module.list
target_id=$N`. Asserts rc=0 — proves the daemon persisted.
* Kills the daemon by pid recovered from $sock.lock; asserts
socket inode unlinked.
- Failed RED before the implementation (the daemon never
spawned; the test's `expect(rc == 0)` tripped immediately).
Passes after.
- The four existing socket tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
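The connect, spawn, and retry flow might look roughly like the sketch below. It is not the actual `_SocketProc` code: the function names, the 50 ms retry interval, and the injectable `which` / `find_sibling` parameters are assumptions made so the three-step binary lookup can be exercised in isolation.

```python
import errno
import os
import shutil
import socket
import subprocess
import time

# connect() failures that mean "no daemon is listening on this path".
SPAWNABLE_ERRNOS = {errno.ECONNREFUSED, errno.ENOENT, errno.ENXIO}


def resolve_ldbd(environ, which=shutil.which, find_sibling=lambda: None):
    """Three-step search: env override, then $PATH, then in-tree sibling."""
    override = environ.get("LDB_LDBD_SPAWN")
    if override:
        return override
    return which("ldbd") or find_sibling()


def spawn_daemon(ldbd, sock_path, environ):
    # Never inherit the client's stderr pipe: a captured pipe held open by
    # the daemon would keep the wrapper from ever seeing EOF.
    log_path = environ.get("LDB_LDBD_LOG_FILE")
    err = open(log_path, "ab") if log_path else subprocess.DEVNULL
    try:
        subprocess.Popen(
            [ldbd, "--listen", "unix:" + sock_path],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=err,
            start_new_session=True,  # setsid: daemon outlives the client
        )
    finally:
        if log_path:
            err.close()


def _try_connect(sock_path):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(sock_path)
        return s
    except OSError:
        s.close()
        raise


def connect_or_spawn(sock_path, environ=os.environ, timeout=3.0):
    try:
        return _try_connect(sock_path)
    except OSError as e:
        if e.errno not in SPAWNABLE_ERRNOS:
            raise
    ldbd = resolve_ldbd(environ)
    if ldbd is None:
        raise FileNotFoundError("ldbd binary not found")
    spawn_daemon(ldbd, sock_path, environ)
    deadline = time.monotonic() + timeout
    while True:
        try:
            return _try_connect(sock_path)
        except OSError as e:
            if e.errno not in SPAWNABLE_ERRNOS or time.monotonic() >= deadline:
                raise
            time.sleep(0.05)
```

Keeping `resolve_ldbd` separate from the spawn is what lets a test pin the build's binary through `$LDB_LDBD_SPAWN` without touching `$PATH`.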
… 4 item 3) Phase 3's gate 1 uses function_name_at() to detect function boundaries. On a stripped Mach-O without LC_SYMTAB local symbols, function_name_at would return "" for adjacent functions and gate 1 silently treats them as one — adrp_regs leaks across. (On macOS / Apple-silicon, LLDB synthesises ___lldb_unnamed_symbol_<addr> per-address names so gate 1 still works; the leak fires on platforms where LLDB doesn't synthesise OR when the bytes between two functions look like raw code with no function-context lookup hit. Real WeChat-class iOS binaries have hit this pattern in the field.) Phase 4 item 3 records every B / BL / conditional-branch target inside the current code section as a function-start hint. The check fires BEFORE gate 1: when the scanner reaches an instruction whose address is in the function_starts set, adrp_regs is reset and the new provenance.adrp_pair_function_start_reset counter bumps. The two paths are complementary — either is sufficient, the union is the discriminating signal. Lift the hex-token parser used by the cbz-target check (item 1) into a shared lambda parse_last_hex_in_operands so both paths use the same logic. Single-pass / forward-only: a branch at file_addr X to target Y only takes effect for Y > X (the common case in compiler-emitted code; backward-only-reached functions still miss). TDD fixture: tests/fixtures/asm/xref_stripped_fnleak.s — two adjacent non-globl functions linked through `bl`, with `strip -x` applied post-link to remove the local function symbols. x19 (callee-saved per AAPCS64) holds an ADRP page across the BL so phase 3's caller-saved clear can't mask the leak. The smoke test asserts zero false-positive matches; documents that on macOS gate 1's synthesised names also cover the boundary, so the test doesn't strictly require the function_start_reset path to fire (correctness is what matters). Bumps the worktree's smoke-test count from 82 to 83. ctest 100% green. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
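The single-pass function-start hinting from this commit can be sketched as follows. The instruction tuples, the mnemonic set, and the `scan` driver are hypothetical; only the name `parse_last_hex_in_operands` comes from the commit message, and the real scanner resets adrp_regs where this sketch merely records the address.

```python
import re

_HEX = re.compile(r"0x[0-9a-fA-F]+")
_BRANCH_MNEMONICS = {"b", "bl", "cbz", "cbnz", "tbz", "tbnz"}


def parse_last_hex_in_operands(operands):
    """Return the last 0x... token as an int, or None if there is none.

    Taking the LAST token matters for tbz/tbnz, whose bit-number operand
    may itself be rendered in hex before the branch target.
    """
    hits = _HEX.findall(operands)
    return int(hits[-1], 16) if hits else None


def is_branch(mnemonic):
    m = mnemonic.lower()
    return m in _BRANCH_MNEMONICS or m.startswith("b.")


def scan(instructions, section_lo, section_hi):
    """instructions: iterable of (addr, mnemonic, operands), ascending addr.

    Returns the addresses at which a function-start reset would fire.
    """
    function_starts = set()
    resets = []
    for addr, mnemonic, operands in instructions:
        # The hint check runs before any other gate.
        if addr in function_starts:
            resets.append(addr)  # real code: adrp_regs.clear() + counter bump
        if is_branch(mnemonic):
            target = parse_last_hex_in_operands(operands)
            if target is not None and section_lo <= target < section_hi:
                function_starts.add(target)
    return resets
```

Because the pass is forward-only, a hint recorded for a target at or below the current address is never consumed, which is the backward-only-reached limitation the message mentions.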
Two phase-2 items that share their plumbing:
4. SIGTERM mid-accept must wake the listener within milliseconds.
Phase-1 polled g_shutdown only between connections; a daemon
idle in accept() saw EINTR on signal and exited, but only by
accident — bare accept() returned EINTR and the loop's flag
check fired on the next iteration. Adding poll() with
non-blocking accept() makes that explicit and gives us a
second wakeable fd for #6 below.
6. `daemon.shutdown` RPC: a connected client can ask the daemon
to exit cleanly. The handler returns `{ok:true}` and triggers
the same wake mechanism that SIGTERM uses, so an orchestrator
can drain the daemon without spawning a "kill by pid" step.
The shared mechanism is a self-pipe. Both ends are CLOEXEC and
non-blocking. The signal handler writes a byte (write(2) is
async-signal-safe per POSIX); the daemon.shutdown callback writes
the same byte from the worker thread. The main accept loop's
poll() monitors srv + pipe[0]; on POLLIN of pipe[0] it drains the
pipe (non-blocking read, so the drain terminates with EAGAIN once
empty — the prior blocking-read attempt deadlocked here, only
discovered by tracing the daemon.shutdown test failure) and
checks g_shutdown.
Bug found while writing this: the read-end of the self-pipe must
also be O_NONBLOCK, not just the write end. The drain loop reads
in a loop until read() returns ≤ 0; with a blocking read end, the
SECOND iteration (pipe empty after consuming the wake byte)
blocks forever. The non-blocking flag makes it return EAGAIN
instead.
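The both-ends-non-blocking requirement is easy to demonstrate with a small Python sketch of the self-pipe. The helper names are hypothetical; Python's `os.pipe()` already creates non-inheritable (close-on-exec) descriptors, so only the non-blocking flag needs setting here.

```python
import os
import select


def make_wake_pipe():
    r, w = os.pipe()
    # BOTH ends must be non-blocking: the write end so a signal handler
    # can never stall, the read end so the drain loop can terminate.
    os.set_blocking(r, False)
    os.set_blocking(w, False)
    return r, w


def wake(w):
    try:
        os.write(w, b"\x01")
    except BlockingIOError:
        pass  # pipe already full: a wake-up is pending anyway


def drain(r):
    """Consume every pending wake byte; return how many were read."""
    total = 0
    while True:
        try:
            chunk = os.read(r, 64)
        except BlockingIOError:
            return total  # EAGAIN: pipe is empty, stop draining
        if not chunk:
            return total  # write end closed
        total += len(chunk)


def wait_for_wake(r, timeout_ms):
    poller = select.poll()
    poller.register(r, select.POLLIN)
    return bool(poller.poll(timeout_ms))
```

With a blocking read end, the second `os.read` in `drain` would hang on the now-empty pipe, which is the deadlock described above.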
Scope clarification (per docs §2 "in-flight RPC interruption"):
this commit only stops accepting new RPCs immediately and lets
the currently-executing dispatch run to completion. Cancelling an
in-flight LldbBackend SBAPI call from outside is genuinely
impossible against the LLDB ABI; the test
`test_socket_interruption.py` documents that scope by closing the
client socket so the worker sees EOF cleanly. The shutdown
callback is wired only in listen mode; stdio mode's
`daemon.shutdown` returns -32002 with a "use stdin EOF or SIGTERM"
message.
describe.endpoints catalog grew one entry for `daemon.shutdown`.
Schema is trivial (no params; returns `{ok: bool}`).
Tests:
- New `tests/smoke/test_daemon_shutdown_rpc.py`: connects, sends
daemon.shutdown, verifies ok=true reply, closes client, asserts
daemon exits within 10s with rc=0 and the socket/lockfile gone.
- New `tests/smoke/test_socket_interruption.py`: connects,
completes one describe.endpoints call, sends SIGTERM to the
daemon, closes the client, asserts daemon exits within 5s with
rc=0. Pre-fix daemon hung in the accept loop until the signal
arrived AND a new connection event happened (or the bare
accept's EINTR fired) — the poll-based path makes it
deterministic.
- All five prior socket tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3's gate 7 bumps adrp_pair_skipped for register-offset LDRs with a tracked base (`[xN, xM]` / `[xN, xM, lsl #imm]`). Phase 4 item 4 extends the family: PC-relative literal loads (`ldr xN, #imm` / `ldr xN, 0xNNNN`) bypass the ADRP+pair pattern entirely — they load the slot's value via PC-relative addressing, not through a register the scanner tracked. The literal-pool slot might hold a pointer to a string or constant in __TEXT/__cstring or __DATA_CONST. The scanner can't statically dereference it (would need to re-read the segment data at file_addr + pcrel_imm). Phase 4 bumps the new adrp_pair_unresolvable_load counter so callers see this happened, instead of the load silently disappearing. Detection shape: in the "memop didn't match resolve_adrp_consumer" fallback, after the existing `[xN, ...]` register-offset branch, check for an immediate-shaped operand (`#imm` / `0xNNN` / `-imm`). Only `ldr` / `ldrsw` produce literal-pool loads on arm64 — stores and short loads use different addressing modes. The new counter (and the matching adrp_pair_function_start_reset for item 3) is exposed on the wire by the dispatcher path that already serialises the other adrp_pair_* fields. TDD: tests/fixtures/asm/xref_pcrel_literal.s — `ldr x0, _pcrel_const` where _pcrel_const is a quad inside __TEXT/__text. The smoke test asserts provenance.adrp_pair_unresolvable_load >= 1. xref.addr against `_pcrel_data` returns 0 matches today (the heuristic gives up on the literal); the counter is the contract that surfaces this to the caller. 12/12 xref smoke tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
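A rough Python sketch of the detection shape described above: only `ldr` / `ldrsw` whose second operand is immediate-shaped count as PC-relative literal loads. The regex and the function name are assumptions, not the scanner's actual code.

```python
import re

# Immediate-shaped operand: "#imm", "0xNNN", "-imm" (decimal or hex).
_IMMEDIATE = re.compile(r"^#?-?(0x[0-9a-fA-F]+|\d+)$")
_LITERAL_LOAD_MNEMONICS = {"ldr", "ldrsw"}


def is_pcrel_literal_load(mnemonic, operands):
    if mnemonic.lower() not in _LITERAL_LOAD_MNEMONICS:
        return False
    parts = [p.strip() for p in operands.split(",")]
    # A literal load has exactly "dst, target"; "[xN, ...]" forms either
    # split into more parts or keep their bracket and fail the regex.
    if len(parts) != 2:
        return False
    return bool(_IMMEDIATE.match(parts[1]))
```

In the real scanner a hit would bump `adrp_pair_unresolvable_load` rather than return a boolean.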
… item 6)
The phase-4 spec for bind resolution (docs/35-field-report-followups.md
§3 item 6) allows shipping only the schema if the imports-table walk
becomes too complex for one branch. The parse / walk itself spans:
- dyld_chained_fixups_header::imports_offset / imports_count /
imports_format (three formats: DYLD_CHAINED_IMPORT,
_IMPORT_ADDEND, _IMPORT_ADDEND64)
- Indexing into the imports table by the bind's ordinal field
(24-bit or wider depending on format)
- String-table lookup via name_offset into the symbols region
- Optional SBTarget::FindSymbols(name) for resolved_addr when a
process is loaded
That's ~150 LOC of byte-level parsing across three import formats. To
keep this branch tight, ship only the schema additions:
- new BindInfo struct: name, addend, ordinal, resolved_addr (opt).
- new ChainedFixupMap::binds map: rva → BindInfo, populated by the
phase-5 walk; today's parser leaves it empty for every fixture.
Three new unit tests pin the schema:
- BindInfo default-constructible with empty fields
- ChainedFixupMap.binds empty by default
- parse_chained_fixups leaves binds empty on a rebase-only payload
The phase-5 commit that wires the walk in will populate binds for
test vectors that carry imports_count > 0 and flip the third
assertion. Today's 18/18 [chained_fixups] tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
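For orientation, the schema-only addition has roughly this shape, shown here as Python dataclasses. The `BindInfo` field names come from the commit message; their types and the other `ChainedFixupMap` fields (`image_base`, `rebases`) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BindInfo:
    name: str = ""
    addend: int = 0
    ordinal: int = 0
    resolved_addr: Optional[int] = None  # only known when a process is loaded


@dataclass
class ChainedFixupMap:
    image_base: int = 0                                      # assumed field
    rebases: Dict[int, int] = field(default_factory=dict)    # assumed field
    # rva -> BindInfo; left empty until the phase-5 imports-table walk lands.
    binds: Dict[int, BindInfo] = field(default_factory=dict)
```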
An orchestrator that auto-spawns the daemon (the §2 phase-2 client-side auto-spawn launches an `ldbd --listen unix:PATH` if no daemon is running) probably wants that daemon to die quietly after the burst of activity finishes. Otherwise every interactive session leaves a lingering ldbd, and the operator has to clean it up by hand.

`--listen-idle-timeout N` gates the daemon's shutdown on the accept-loop's poll() returning 0 (timeout elapsed) AND a "no live workers" check. Both conditions are necessary: a long-lived agent session might idle on a connected socket for >N seconds while the user thinks; pulling the daemon down would surface as a mysterious disconnect.

Implementation:
- New static atomic `g_live_workers` tracks the count of running per-connection worker threads. The accept loop increments BEFORE std::thread construction (so a poll wake-up that races with this spawn can't observe zero workers); the worker decrements on exit.
- `poll()` takes `idle_timeout_sec * 1000ms` as its timeout argument when `idle_timeout > 0 && live_workers == 0`, otherwise -1 (block forever). On `poll()` returning 0 the loop rechecks live_workers (catching the case where a worker emerged during the gap) and, if still zero, sets g_shutdown and breaks. The existing teardown path (close listener, unlink socket + lockfile, join workers) runs unchanged.
- Workers write a wake byte to the self-pipe when they exit so the accept loop re-evaluates the timeout. Linux's poll resets the timeout per-call but macOS's preserves it across spurious returns; the explicit wake makes the behaviour uniform without depending on the platform's poll semantics.

Tests:
- New `tests/smoke/test_socket_idle_timeout.py`: starts `ldbd --listen-idle-timeout 2 --listen ...`, waits 8s with no clients, asserts the daemon exited rc=0 and the socket/lockfile are gone. Pre-fix daemon (no idle timeout) hangs in poll() forever; the test would time out at 30s.
- All six prior socket tests still pass.

`ldbd --help` text grew a paragraph documenting the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
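The two decisions in the accept loop reduce to a pair of small pure functions, sketched here in Python. The function names are hypothetical; the conditions mirror the implementation notes above.

```python
def poll_timeout_ms(idle_timeout_sec, live_workers):
    """Timeout argument for the accept loop's poll(); -1 blocks forever."""
    if idle_timeout_sec > 0 and live_workers == 0:
        return idle_timeout_sec * 1000
    return -1


def should_exit_on_idle(poll_result, idle_timeout_sec, live_workers):
    """Re-check after poll() returns: exit only on a true idle timeout."""
    timed_out = poll_result == 0
    # live_workers is re-read AFTER poll() returns, so a worker that
    # appeared during the wait keeps the daemon alive.
    return timed_out and idle_timeout_sec > 0 and live_workers == 0
```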
…ase 4 item 7)
Phase 4 item 7 (docs/35-field-report-followups.md §3) asks for a
moderate-size C-compiled fixture that exercises the resolver in
shapes closer to real iOS app binaries than the hand-assembled
phase-3 fixtures. Add tests/fixtures/c/real_world_xref.c:
1. static const char *const k_string_table[3]: selref-style
ADRP+LDR through a __DATA_CONST chained-fixup slot.
2. Multiple functions in one TU exercising function-boundary
reset (RET-clear + name-based + function_starts).
3. Conditional-branch tail-call (`if (which == 0) return
real_xref_pick(0); return k_string_table[which];`) — proves
phase 4 item 1's cross-function reset doesn't eat the
legitimate same-function fall-through xref.
4. extern malloc / free imports — exercises the chained-fixup
binds path (BindInfo schema; resolution is phase 5).
Build: -arch arm64 -O1 -Wl,-fixup_chains so the linker emits
LC_DYLD_CHAINED_FIXUPS with __DATA_CONST rebases for the string
table. Apple-silicon-arm64 only.
Smoke test asserts:
- Every entry in k_string_table[] surfaces at least one xref
instruction via string.xref (slot-indirection path live).
- A non-pointer literal (0x1122334455667788) surfaces zero
matches (false-positive density on a 4-function TU is the
noise-floor metric).
Spot-check against /usr/bin/uname (host-dependent, not automated):
triple = arm64e-apple-macosx26.3.0; FAT slice picker (item 2)
selected arm64e correctly. 8 sampled strings each returned 1
xref with empty provenance — no skips, no warnings, no false
positives. Documented as a manual probe; not a CI assertion
because the binary changes across macOS versions.
ctest: 84/84 (was 83) all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move docs/35-field-report-followups.md §3 "Phase 4 — carried forward" subsection into "Phase 4 — what shipped" with commit SHAs and acceptance evidence for each item. New "Phase 5 — carried forward" subsection captures the items still deferred (full bind walk, auth-rebase key-class filtering, on-disk cache, correlate.* wire-up, multi-module xref, full dataflow, CI assertions on real iOS binaries).

Worklog entry pins the seven phase-4 commits, the decisions behind option (b) for conditional-branch handling, the schema-only ship for bind resolution, and the manual /usr/bin/uname spot-check that replaced the spec's /usr/bin/grep suggestion (grep's __cstring is empty: its strings come from the shared cache, not the binary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
session.replay's per-row loop calls dispatch() re-entrantly so the replayed request goes through the full outer wrapper: provenance decoration + per-RPC cost recording still fire, while the session-log append no-ops because replay suspends the writer. The multi-client commit's std::mutex deadlocked there; the ctest smoke_session_replay run pinned this within seconds.

std::recursive_mutex restores correctness without losing the cross-thread serialisation property. Same-thread re-entry is now free; cross-thread overlap still queues at the lock. The overhead per-acquisition vs std::mutex is negligible compared to the work inside any real RPC.

Also folds in:
- `docs/35-field-report-followups.md §2`: "Phase 2 — what shipped" subsection records the six items that landed (multi-subscriber sinks, multi-client listener, auto-spawn, signal-driven wakeup, daemon.shutdown, idle timeout) with the concurrency audit notes. "Phase 3 — carried forward" enumerates the deferred items (token auth, per-target dispatcher sharding, true in-flight cancellation, worker reaping mid-flight, TLS, single-client RPC multiplexing).
- `docs/WORKLOG.md`: new dated entry summarising the goals, per-commit deliverables, key decisions, surprises (the capture_output / stderr-inheritance hang; both-ends-non-blocking for the self-pipe; phase4-xref-improvements worktree contamination), and verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
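The re-entrancy problem and its fix have a direct analogue in Python's `threading.Lock` versus `threading.RLock`. The sketch below is an illustration with a hypothetical `Dispatcher` class and request shape, not the daemon's code.

```python
import threading


class Dispatcher:
    """Toy dispatcher whose outer lock must tolerate same-thread re-entry."""

    def __init__(self):
        # RLock plays the role of std::recursive_mutex: the owning thread
        # may re-acquire it, while other threads still queue.
        self._dispatch_mu = threading.RLock()
        self.handled = []

    def dispatch(self, request):
        with self._dispatch_mu:
            self.handled.append(request["method"])
            if request["method"] == "session.replay":
                # Replay goes back through the full dispatch() wrapper.
                for row in request["rows"]:
                    self.dispatch(row)
            return {"ok": True}
```

Swapping the `RLock` for a plain `Lock` makes the nested `dispatch` call block forever on the lock its own thread already holds, which is the deadlock the smoke test caught.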
…AF fix) The phase-2 NonStopRuntime stored raw NotificationSink* in its subscriber vector. emit_stopped_ snapshotted the raw pointers under a shared lock, dropped the lock, then dereferenced — a concurrent remove_notification_sink (on a connection-worker thread) racing with sink destruction (the worker's stack-local StreamNotificationSink going out of scope on disconnect) could free the sink while the listener thread still held the raw pointer in its snapshot. Reviewer reproduced it with TSan (vptr race) and ASan (heap-use-after-free) on a focused multi-threaded unit test. Fix: migrate subscriber storage from `NotificationSink*` to `std::shared_ptr<NotificationSink>`. emit_stopped_'s snapshot now copies shared_ptrs, bumping refcounts; every sink in the snapshot stays alive across the iteration regardless of concurrent remove. On the connection-worker side, the per-connection StreamNotificationSink is allocated via std::make_shared so the runtime's strong ref and any in-flight emit's snapshot ref both keep it alive past the worker's return. remove_notification_sink and set_notification_sink move the doomed sinks out of the vector under the lock and drop them AFTER releasing, so a sink destructor that might re-enter the runtime can't deadlock on sinks_mu_. Test: tests/unit/test_nonstop_runtime.cpp adds a 200ms-budgeted concurrent stress test (emitter thread vs add/remove churn thread) and a synchronous "runtime keeps sink alive across emit even if caller drops its ref" test using weak_ptr observation. Both pass TSan (`-fsanitize=thread`, sibling `build-tsan/` dir). Updated existing call sites: main.cpp (stdio sink → make_shared), socket_loop.cpp (per-connection sink → make_shared), test_nonstop_*.cpp (local sinks → shared_ptr). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix: after `daemon.shutdown` or SIGTERM set `g_shutdown`, the
accept loop stopped accepting NEW connections — but already-connected
workers kept reading + dispatching RPCs as long as the peer kept
sending. The phase-2 doc claims "shutdown stops accepting new RPCs
immediately"; reality was broader, and the daemon process would
linger long after the accept loop had exited because workers were
still dispatching.
Fix: `serve_one_connection` takes an optional `is_shutdown` predicate.
Between read and dispatch, if the predicate returns true, the worker
synthesises a kBadState ("daemon shutting down") response — echoing
the request id for correlation — and breaks out of the loop. The
worker returns, the accept-loop join unblocks, the daemon exits.
Stdio mode keeps the default (empty predicate evaluates as false) so
its single-client semantics are unchanged. The socket loop passes a
closure over the file-scope `g_shutdown` atomic.
Test: `tests/smoke/test_socket_shutdown_active_clients.py` exercises
the cross-cutting promise — two clients A and B; B sends
daemon.shutdown; A's next RPC must surface a shutdown error (or clean
EOF), NOT a normal success response; daemon exits within a generous
window. Without the fix the test fails because A's hello is dispatched
successfully past the shutdown latch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
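The read, check, dispatch ordering can be sketched as below. This Python version replaces the socket with a list of request lines; the numeric value chosen for the bad-state error code and the response layout are assumptions for illustration.

```python
import json

K_BAD_STATE = -32002  # assumed numeric value for kBadState


def serve_one_connection(lines, dispatch, is_shutdown=None):
    """Serve requests until EOF or until the shutdown latch is observed."""
    responses = []
    for line in lines:
        request = json.loads(line)
        # Checked between read and dispatch; an absent predicate means
        # "never shutting down" (the stdio-mode default).
        if is_shutdown is not None and is_shutdown():
            responses.append({
                "jsonrpc": "2.0",
                "id": request.get("id"),  # echoed for correlation
                "error": {"code": K_BAD_STATE,
                          "message": "daemon shutting down"},
            })
            break
        responses.append(dispatch(request))
    return responses
```

Because the check runs after the read, the request that observes the latch still gets a correlated error reply instead of a silent close.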
A connected-but-not-reading peer let the kernel send buffer fill; the daemon's `::write(2)` in `FdStreambuf::sync` then blocked indefinitely. The listener thread serving notifications calls the same write path through `OutputChannel`, so an indefinitely-blocked write held the inner backend's `map_mu_` shared. A second client's `target.close` then wants `map_mu_` UNIQUE while holding `dispatch_mu_` — the whole daemon wedges accepting new connections but unable to service any RPC behind the dead-peer write. Fix: mirror the existing `SO_RCVTIMEO` setsockopt block. 60 seconds is far past any benign reply round-trip but tight enough that a wedge doesn't keep the daemon unresponsive for minutes. On EAGAIN the streambuf latches `write_failed_`, `write_response` throws `protocol::Error`, and the worker exits cleanly via the existing error-handling path. Test: `tests/smoke/test_socket_slow_reader.py` — client A connects with a small SO_RCVBUF, fires a stream of `describe.endpoints` RPCs (~50KB reply each), never reads. Client B concurrently does a tiny hello and must get a response in well under 30s. After tearing down A, the daemon must exit on SIGTERM within 15s — pre-fix it could sit on the blocked write to A indefinitely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
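A send-side timeout of this kind is typically set with the `SO_SNDTIMEO` socket option, the counterpart of `SO_RCVTIMEO`; the commit message does not name the option, so that is an assumption. The Python sketch below also assumes a 64-bit Linux or macOS host where `struct timeval` packs as two native longs.

```python
import socket
import struct

_TIMEVAL = "ll"  # struct timeval as (tv_sec, tv_usec); assumed 64-bit layout


def set_send_timeout(sock, seconds):
    """Bound how long a blocking send may wait on a full send buffer."""
    tv = struct.pack(_TIMEVAL, int(seconds), 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, tv)


def get_send_timeout_sec(sock):
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO,
                          struct.calcsize(_TIMEVAL))
    sec, _usec = struct.unpack(_TIMEVAL, raw)
    return sec
```

Once the timeout is armed, a send to a peer that never reads fails with EAGAIN after the deadline instead of blocking indefinitely, and the caller can tear the connection down.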
Phase-4 item 1 (commit 311c439) introduced two silent-wrong-result regressions against phase 3. C1 (fall-through clobber): the implementation unconditionally cleared adrp_regs on a cross-function cond branch, including the source-side fall-through path. The spec literally reads "Fall-through path: preserve state". An `add x0, x8, _t@PAGEOFF` after `cbz x9, _other_fn` is in the source function by definition; clearing x8 silently lost the xref. C2 (same-fn target poisons function_starts): the cond-branch block also unconditionally inserted the target into function_starts. A same-function cbz to a local label (Lhere, loop backedges, basic-block merges) then triggered gate 3 to reset adrp_regs at the label, killing the post-label consumer's xref. Fix: rework the cond-branch block. - No more source-side adrp_regs.clear(). The fall-through stays tracked. - function_starts.insert() and the provenance bump fire only when the target's function differs from the current function. Same-fn targets no longer poison function_starts. - Counter renamed: adrp_pair_cond_branch_reset → _recorded (we record a target hint now, we don't reset state). - Move the cond-branch bookkeeping outside the `!adrp_regs.empty()` guard (I4): the function_start hint is valuable for LATER iterations once an ADRP becomes tracked, even if no ADRP is tracked at the cbz site. Updated dispatcher schema (I2 partial): the existing schema only declared two counters; bring it up to date with the five the code emits, with docstrings explaining each one's semantics. TDD evidence: two new fixtures + smokes (xref_cond_fallthrough.s, xref_cond_same_fn.s) failed RED against 2b170ce with the diagnostic "the legitimate xref against … vanished" and the matches list empty. Post-fix both pass; the existing xref_condbranch smoke (the cross-fn case) continues to pass with the renamed counter. ctest 87/87 (85 prior + 2 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
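The reworked conditional-branch bookkeeping can be summarised in a short Python sketch. The function signature is hypothetical; `function_name_at` is passed in as a callable so the behaviour can be shown without LLDB.

```python
def on_cond_branch(current_addr, target, function_name_at,
                   function_starts, provenance):
    """Record a function-start hint for a cross-function branch target.

    Deliberately does NOT touch adrp_regs: the fall-through path stays
    tracked. Returns True when a hint was recorded.
    """
    if function_name_at(target) == function_name_at(current_addr):
        # Same-function label (loop back-edge, block merge): no hint,
        # so it cannot later trigger a reset at that label.
        return False
    function_starts.add(target)
    key = "adrp_pair_cond_branch_recorded"
    provenance[key] = provenance.get(key, 0) + 1
    return True
```

On a stripped binary where both lookups return an empty name, this check sees "same function" and records nothing; that gap is what the separate function_starts hints from B / BL targets are meant to cover.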
I4: when N clients race-spawn N daemons against the same socket path,
the (N-1) losers all write diagnostic lines to the SAME stderr (often
via LDB_LDBD_LOG_FILE redirection). Pre-fix each line was emitted as a
chain of `std::cerr << "ldbd: ..." << pid << ... << "\n"` shifts;
libstdc++ flushes each shift as its own write(2) syscall, and
concurrent processes interleave the bytes mid-line. Operators saw
"ldbd: another daemon is already lis ldbd: another daemon is alr".
Fix: introduce `log_err_line(std::string)` which emits the line with a
single `std::fwrite(..., stderr)`. POSIX guarantees that a single write
of ≤PIPE_BUF bytes (at least 512; 4096 on Linux) to a pipe or FIFO is
atomic w.r.t. concurrent writers; for a regular file the comparable
guarantee comes from O_APPEND, where each write(2) is appended as a
unit. Convert every multi-shift stderr line in this file to use it.
Test (`tests/smoke/test_socket_autospawn_logs.py`): launch 10 daemons
against the same socket path with stderr aimed at a single log file.
Exactly one wins the bind race; the rest exit with a diagnostic. Verify
every non-empty line in the log starts with `ldbd: `, i.e. no
diagnostic got torn across a write boundary.
N3: `g_shutdown_pipe[1]` is read in the signal handler. While the
non-atomic int read is harmless on aarch64 in practice, strict
conformance requires an `std::atomic<int>` for the cross-thread
publish/load. Introduce `g_shutdown_pipe_write` atomic, published under
release-store AFTER FD_CLOEXEC + O_NONBLOCK are set, cleared to -1
BEFORE the close in teardown. A late signal arriving during shutdown
now observes the sentinel and skips the write; pre-fix it could
(rarely) write to a closed fd or, worse, a recycled fd of an unrelated
open.
N4: workers list grows for daemon lifetime. Reviewer flagged this as
legitimately phase-3-deferable; add an explicit `TODO(phase 3 / N4)`
comment next to the list declaration so a future maintainer doesn't
rediscover it cold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
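The one-write-per-line idea translates directly to other languages. The sketch below is a Python analogue of `log_err_line`, not the C++ helper itself; the `fd` parameter is a hypothetical addition so the function can be pointed at a pipe for demonstration.

```python
import os


def log_err_line(line: str, fd: int = 2) -> None:
    """Emit one diagnostic line with a single write(2) call."""
    # Build the whole line first, then hand it to the kernel in one call,
    # so a concurrent writer cannot interleave bytes mid-line for short
    # lines.
    payload = (line.rstrip("\n") + "\n").encode("utf-8")
    os.write(fd, payload)


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    log_err_line("ldbd: another daemon is already listening", fd=write_end)
    print(os.read(read_end, 4096))
    os.close(read_end)
    os.close(write_end)
```

The contrast with the pre-fix code is that each `<<` on an unbuffered stream becomes its own syscall, whereas here the line is assembled in memory before any I/O happens.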
The phase-3/4 ADRP-pair resolver maintained a WHITELIST of mnemonics in
the post-emit state-mutation block: ADD/SUB/ADDS/SUBS clobbered the
dst, MOV variants ran apply_mov_state, calls clobbered the AAPCS64
caller-saved set, returns cleared the map. Every OTHER register-writing
instruction silently left dst tracking intact.
That whitelist was the wrong invariant. CSEL / CSET / CSINC / CSINV /
CSNEG, LDP / LDPSW / LDXP / LDAR / LDAXR, MADD / MSUB, EXTR / BFI /
BFM / UBFX / SBFX / UBFM / SBFM, ORR / AND / EOR / EON with
shifted-reg, FMOV (to GPR), SDIV / UDIV, REV / CLZ, ASR / LSL / LSR /
ROR shifts: every one of them writes a destination register but none
of them appeared in the whitelist. After any of them ran, the
destination register kept whatever ADRP page it previously held and
the next LDR or ADD through that register produced a silent false
positive.
C3 (CSEL): the "pick between two strings" compiler idiom emits
    adrp x8, _str_a@PAGE
    adrp x9, _str_b@PAGE
    ...
    csel x8, x9, x8, gt
    ldr x0, [x8, #0x10]
xref.addr(_str_a + 0x10) falsely matched the LDR. This is the most
common false-positive vector in real iOS / macOS binaries.
C4 (LDP): a function entry's `ldp x8, x9, [sp]` (callee-saved reload)
rewrites x8 from memory; any prior ADRP into x8 is gone. The phase-3
resolver didn't model paired loads at all, so the post-LDP ADD
false-matched.
Architectural shift: clobber-by-default. Introduce a new helper
parse_destination_registers(mnemonic, operands) in xref_arm64_parsers
that returns the canonical x-register names an instruction writes. The
post-emit pass runs explicit propagation paths first (ADRP records,
MOV propagates, calls clobber caller-saved, returns/B clears all),
then the new pass erases every destination register that wasn't
already handled by an explicit arm. dst_already_handled gates the
second pass so legitimate ADRP/MOV tracking isn't undone.
The helper handles 14 mnemonic categories, including:
- Stores (STR/STP/STUR/STRH/STRB/STLR/STNP/...): no dst.
- Compares (CMP/CMN/TST/CCMP/CCMN/FCMP/...): no dst.
- Branches & returns (B/BL/BR/BLR/CBZ/TBZ/B.cond/...): no dst.
- System (NOP/YIELD/WFE/DMB/DSB/ISB/MSR/...): no dst.
- Paired loads (LDP/LDPSW/LDXP/LDAXP/LDNP): two dsts.
- Default: first operand register is the destination.
The default catches CSEL/CSET/CSINC/CSINV/CSNEG/MADD/MSUB/ORR/AND/EOR/
EXTR/BFI/UBFX/etc. without enumeration.
clobber_arith_destination is removed: ADD/SUB/ADDS/SUBS now fall
through to the generic pass, which produces identical behaviour.
TDD evidence: two new fixtures + smokes (xref_csel.s,
xref_ldp_clobber.s) failed RED against ced9f17 with the diagnostic
"the LDR/ADD through stale x8 matched against …". Post-fix both pass;
16 new unit test cases pin parse_destination_registers behaviour
across CSEL, LDP, LDPSW, LDXP/LDAXP, LDR family, ADD/SUB family,
STR/STP family, CMP/TST family, branches, MADD/MSUB family,
ORR/AND/EOR shifted-reg, EXTR/BFI/UBFX bitfield family,
NOP/YIELD/barrier, w→x canonicalisation, unrecognised-mnemonic
default.
ctest 89/89 (87 + 2 new smokes).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
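To make the clobber-by-default rule concrete, here is a rough Python model of the helper. It is a sketch under assumptions: the real `parse_destination_registers` is C++ and covers more categories than the handful listed in the commit message, and `clobber_by_default` plus the `already_handled` parameter are hypothetical stand-ins for the post-emit pass and `dst_already_handled`.

```python
import re

# Only the mnemonics named in the commit message; the real tables are larger.
NO_DST = {
    "str", "stp", "stur", "strh", "strb", "stlr", "stnp",   # stores
    "cmp", "cmn", "tst", "ccmp", "ccmn", "fcmp",            # compares
    "b", "bl", "br", "blr", "cbz", "cbnz", "tbz", "tbnz", "ret",
    "nop", "yield", "wfe", "dmb", "dsb", "isb", "msr",      # system
}
PAIRED_LOADS = {"ldp", "ldpsw", "ldxp", "ldaxp", "ldnp"}
_GPR = re.compile(r"^[xw]([0-9]|[12][0-9]|30)$")


def canonical_x(token: str):
    """Map x8/w8 to 'x8'; return None for xzr, sp, immediates, and so on."""
    t = token.strip().lower()
    return "x" + t[1:] if _GPR.match(t) else None


def parse_destination_registers(mnemonic: str, operands: str) -> list:
    m = mnemonic.strip().lower()
    if m in NO_DST or m.startswith("b."):
        return []
    count = 2 if m in PAIRED_LOADS else 1   # default: first operand is dst
    regs = [canonical_x(op) for op in operands.split(",")[:count]]
    return [r for r in regs if r is not None]


def clobber_by_default(adrp_regs: dict, mnemonic: str, operands: str,
                       already_handled=frozenset()) -> None:
    """Erase tracked pages for every written register not handled explicitly."""
    for reg in parse_destination_registers(mnemonic, operands):
        if reg not in already_handled:
            adrp_regs.pop(reg, None)


if __name__ == "__main__":
    state = {"x8": 0x100004000, "x9": 0x100008000}
    clobber_by_default(state, "csel", "x8, x9, x8, gt")
    print(state)   # x8 is no longer tracked, so the following LDR cannot match
```

The design point is that an unknown mnemonic falls into the default arm and clobbers its first operand, so a missing table entry costs a possible missed xref rather than a false positive.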
…hained fixups (phase 4 C5)
The phase-4 FAT-aware slice picker had a silent-wrong-result bug: when
the caller-supplied triple matched a slice in the FAT, the picker
returned that slice's parse only if `resolved` was non-empty. If the
matched slice was a classic LC_DYLD_INFO_ONLY binary with no chained
fixups (resolved.empty()), control fell through to the phase-3
preference order, which could land on a DIFFERENT slice (e.g. arm64e)
with a totally different image_base. The caller's xref scan then
resolved every ADRP page through the wrong slice's image_base and
silently produced garbage.
LLDB's choice of slice is the source of truth. If the triple matched
ANY slice in the FAT, honour it, including the empty-chained-fixup
case. The caller gets an empty ChainedFixupMap (no chained-fixup xref
resolution) and the literal-operand / ADRP-pair scan runs against the
CORRECT image_base. Only fall through to preference when NO slice in
the FAT matches the triple at all (the legitimate "triple says x86_64
but FAT is arm64-only" path).
The pre-existing unit test "triple-matching slice missing falls back
to preference order" is correct under both pre- and post-fix behaviour
because it exercises the legitimate "no triple match" fallback path.
Its comments are updated to clarify the distinction.
TDD evidence: new unit test "triple-matched slice WITHOUT chained
fixups wins (C5 silent-wrong-result fix)" constructs a FAT with an
arm64 slice (no LC_DYLD_CHAINED_FIXUPS, image_base 0x100000000) and an
arm64e slice (with chained fixups, image_base 0x200000000). With
triple=arm64 the test asserts:
- resolved.empty() (arm64 has no fixups)
- image_base != 0x200000000 (must NOT fall through to arm64e)
Against pre-fix code both assertions FAIL (resolved size 2 from arm64e
fall-through, image_base=0x200000000). Against post-fix code both
PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
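The post-fix selection rule can be modelled as below. This is a Python sketch rather than the C++ picker: `SliceParse`, `pick_slice`, and the `PREFERENCE` tuple are hypothetical, and the actual phase-3 preference order is not stated in the commit message, so the order used here is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SliceParse:
    arch: str
    image_base: int
    resolved: Dict[int, int] = field(default_factory=dict)  # chained fixups


# Assumption: stand-in for the phase-3 preference order.
PREFERENCE = ("arm64e", "arm64", "x86_64")


def pick_slice(slices: List[SliceParse], triple_arch: str) -> Optional[SliceParse]:
    # A triple match always wins, even when the slice has no chained
    # fixups; the caller then scans against the correct image_base.
    for s in slices:
        if s.arch == triple_arch:
            return s
    # Preference order applies only when NO slice matches the triple.
    for arch in PREFERENCE:
        for s in slices:
            if s.arch == arch:
                return s
    return None


if __name__ == "__main__":
    fat = [
        SliceParse("arm64", 0x100000000),
        SliceParse("arm64e", 0x200000000, {0x4000: 0x1, 0x4008: 0x2}),
    ]
    chosen = pick_slice(fat, "arm64")
    print(chosen.arch, hex(chosen.image_base), len(chosen.resolved))
```

The pre-fix bug corresponds to adding `and s.resolved` to the first loop's condition, which is exactly what let an empty-fixup match fall through to the wrong slice.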
…+N5)
Pre-fix `_resolve_autospawn_ldbd()` accepted any X_OK path in
`$LDB_LDBD_SPAWN`. A mistyped env var landing on a real but unrelated
executable (e.g. `/usr/bin/yes`, `/bin/echo`) would spawn that
binary; the spawned child never bound the socket; the client burned
~3s of connect retries before surfacing "auto-spawned ldbd never
began accepting" with zero hint that the env var was the problem.
I5: `_looks_like_ldbd()` runs `<path> --version` with a 2s timeout
and checks the output contains the literal "ldbd". Rejected paths
get a clear "LDB_LDBD_SPAWN=... does not look like ldbd" line on
stderr at resolve-time; resolution then falls through to
`shutil.which("ldbd")` and the sibling-of-ldb heuristic. The
operator sees the actual failure mode 100ms in, not 3s in.
Coupled daemon change: `ldbd --version` now prints "ldbd <version>"
instead of just "<version>". The I5 probe greps for "ldbd" in the
output; without this the probe rejects the real daemon. Matches
`ldb-dap --version`'s convention and the `ldbd --help` first-line
format. No tests pinned the old bare-semver output.
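A probe of this shape might look like the following. It is a sketch, not the client's actual `_looks_like_ldbd()`: the `version_args` parameter is a hypothetical knob added so the probe can be exercised without a real daemon, and checking stderr as well as stdout is an assumption.

```python
import subprocess
import sys


def looks_like_ldbd(path: str,
                    timeout: float = 2.0,
                    version_args=("--version",)) -> bool:
    """Return True if running `path --version` prints something naming ldbd."""
    try:
        proc = subprocess.run(
            [path, *version_args],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing file, not executable, or a binary that never exits
        # (for example /usr/bin/yes) all count as "not ldbd".
        return False
    return "ldbd" in (proc.stdout + proc.stderr)


if __name__ == "__main__":
    candidate = sys.executable  # an unrelated but real executable
    if not looks_like_ldbd(candidate):
        print(f"LDB_LDBD_SPAWN={candidate} does not look like ldbd",
              file=sys.stderr)
```

The timeout matters as much as the string check: a mistyped path pointing at a long-running binary would otherwise hang the resolver instead of failing fast.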
Bundled cleanups:
- N1: `_autospawn_daemon`'s docstring claimed stderr was inherited
from the parent process. Wrong since phase-2; the daemon's stderr
goes to /dev/null by default and to `$LDB_LDBD_LOG_FILE` when set.
Doc text now matches the code.
- N2: retry-loop comment said "200ms * 10 retries (~2s)" but the
loop was `range(15)`. One-line factual fix to "200ms * 15
retries (~3s)."
- N5: socket re-created inside the retry loop on each iteration.
POSIX leaves a socket whose `connect()` failed in an unspecified
state for further `connect()` calls; reusing it works on Linux
and macOS today but is pedantically undefined. Fresh socket per
iteration is one extra syscall per retry and removes the corner
case.
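The N5 change amounts to moving socket creation inside the loop. A minimal sketch, assuming a Unix-domain stream socket and using the 15 x 200ms schedule from N2; `connect_with_retry` is a hypothetical name and the exception choice is illustrative:

```python
import socket
import time


def connect_with_retry(path: str, attempts: int = 15, delay: float = 0.2) -> socket.socket:
    """Connect to a Unix socket, creating a fresh socket for every attempt."""
    last_err = None
    for _ in range(attempts):
        # New socket each iteration: a socket whose connect() failed is
        # never reused, which sidesteps the unspecified-state corner case.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except (ConnectionRefusedError, FileNotFoundError) as err:
            sock.close()
            last_err = err
            time.sleep(delay)
    raise TimeoutError("auto-spawned ldbd never began accepting") from last_err
```

Closing the failed socket before sleeping also keeps the retry loop from leaking one descriptor per attempt.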
Test: `tests/smoke/test_socket_autospawn_validates_binary.py` pins
`$LDB_LDBD_SPAWN=/bin/echo`, strips $PATH down to python+coreutils
(no ldbd discoverable that way), runs `ldb --socket ... target.open`
from a temp CWD outside the repo. Asserts the CLI succeeds via
sibling fallback under 2.5s with the expected stderr diagnostic
mentioning `LDB_LDBD_SPAWN` and "does not look like ldbd."
TDD-verified red: pre-fix the test fails at 3.08s with the
"never began accepting" message — confirms it pins the regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test docstring claimed it validates "concurrent dispatch" — the
review correctly flagged this as overstated. `dispatch_mu_` serialises
overlapping RPCs in phase-2, so two clients hitting the daemon at
the same time queue at the dispatcher. What the test actually pins:
- Accept-level concurrency. Two unix-socket connections held open
simultaneously. The pre-phase-2 single-client accept loop would
block worker B's connect() until worker A disconnected; the
barrier between target.open and module.list would deadlock.
- Per-connection target_id state persistence. Each worker opens
its own target, both succeed, both find their target_id still
alive on the second RPC.
Docstring, in-test comment on the barrier, and success message all
rewritten to match. CMake test name kept as `smoke_socket_multiclient`:
it is accurate at the file level, and rewriting history for a
naming-only change isn't worth it. True per-connection dispatch
parallelism is a phase-3 item (per-target dispatcher sharding); listed
in `docs/35-field-report-followups.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`docs/35-field-report-followups.md`:
- §2 phase-2 item 1 (Multi-subscriber notification sinks) rewritten to
  say "broadcast-to-all; per-target filtering happens at the client."
  Pre-fix the doc claimed "without cross-talk," which implied
  server-side target_id routing that doesn't exist in phase-2. The
  post-review C1 shared_ptr migration is also recorded here so anyone
  reading the design doc sees the UAF fix in context.
- "Phase 3 — carried forward" gains a new bullet for target_id-aware
  notification routing (the server-side filtering that phase-2
  ducked). The existing "per-target dispatcher sharding" bullet is
  reworded to call out the dispatch-parallelism dimension
  specifically: today two clients on independent target_ids still
  queue at `dispatch_mu_`. SBAPI cancellation and worker-list reaping
  items were already in the list and are unchanged.
`docs/WORKLOG.md`: new top entry summarising the phase-2 cleanup: the
four pre-existing commits (`2e6f4ed` C1, `bad8f90` I2, `2978590` I3,
`8c03765` I4+N3+N4) plus the new ones (`9397c03` I5+N1+N2+N5 with
`ldbd --version` companion change and the new TDD-verified smoke test,
`716689b` N6 test naming honesty, this commit). Decisions, surprises,
and the verification stanza record the rationale for future-me /
future agents.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
I1: find_string_xrefs's prior signature took no provenance. Every
ADRP-pair resolver diagnostic produced by the underlying xref_address
scans (adrp_pair_skipped, adrp_pair_writeback_cleared,
adrp_pair_cond_branch_recorded, adrp_pair_function_start_reset,
adrp_pair_unresolvable_load, warnings) was silently dropped when an
agent reached the resolver via string.xref instead of xref.addr. The
agent then couldn't see "the heuristic skipped N loads on this binary"
and had no signal to fall back to symbol-index correlate.
Thread an optional XrefProvenance* through find_string_xrefs. Counters
and warnings accumulate across every per-StringMatch xref_address
invocation; the dispatcher attaches the aggregate to the string.xref
response on the same emission policy as xref.addr (only when something
fired).
Phase-3 gate-7 warning emission moved to a baseline-delta scheme so
sharing one provenance across N xref_address calls doesn't produce
"skipped 0" duplicates: only the actual increment from each call
generates a warning string.
I2 (string.xref half): the dispatcher schema for string.xref now
documents the same five counters + warnings array as xref.addr, each
described as "aggregate across every underlying xref scan." xref.addr's
schema was updated in commit ced9f17 (C1+C2) with the renamed
adrp_pair_cond_branch_recorded counter and the three phase-4-added
counters (cond_branch_recorded, function_start_reset,
unresolvable_load).
Backend interface: virtual signature change ripples through the GDB/MI
stub (returns empty, no behaviour change) and every test mock
backend's override (8 test files updated).
New unit test pins the threaded signature works against the real
fixture binary; identical-result invariant holds whether provenance is
nullptr or supplied.
ctest 89/89.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
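The baseline-delta scheme is easiest to see with a shared accumulator. The sketch below is a hypothetical Python model: `XrefProvenance` here carries only one counter, `xref_address_scan` takes a list of booleans standing in for "this load was skipped", and the warning wording is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class XrefProvenance:
    adrp_pair_skipped: int = 0
    warnings: List[str] = field(default_factory=list)


def xref_address_scan(load_was_skipped: List[bool],
                      prov: Optional[XrefProvenance]) -> None:
    """One underlying scan; `prov` may be shared across many scans."""
    if prov is None:
        return
    baseline = prov.adrp_pair_skipped          # value before this scan
    for skipped in load_was_skipped:
        if skipped:
            prov.adrp_pair_skipped += 1
    delta = prov.adrp_pair_skipped - baseline  # this scan's own increment
    if delta > 0:
        # Warn about the increment only, never the running total, so a
        # scan that skipped nothing adds no "skipped 0" line.
        prov.warnings.append(f"ADRP-pair heuristic skipped {delta} load(s)")


if __name__ == "__main__":
    shared = XrefProvenance()
    for scan in ([True, False, True], [False], [True]):
        xref_address_scan(scan, shared)
    print(shared.adrp_pair_skipped, shared.warnings)
```

Passing `None` leaves the scan result unaffected, which mirrors the identical-result invariant the unit test pins.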
…anup tail)
Bundle of the remaining items from the opus phase-4 review that the
two cleanup agents got partway through before hitting rate limits:
- I3: parse_last_hex_in_operands → lifted to xref_arm64_parsers as
  parse_branch_target. Picks the last comma-separated operand and
  parses hex from there, instead of "rightmost hex token in the whole
  operand string." Closes the tbz w0,#0x10,_far_label case where 0x10
  (bit position) was being picked as a branch target.
- I4: function_starts insert lifted above the !adrp_regs.empty() guard
  so the hint is recorded even when no ADRP is currently tracked.
- I5: tests/smoke/test_xref_pcrel_literal.py comment now matches the
  fixture's actual assembly (a magic .quad rather than a pcrel_data
  reference); the test continues to validate the provenance counter
  bump.
- N1: xref_condbranch.s rewritten to actually reproduce the
  cross-function-cbz + fall-through-ADRP-ADD pattern that the ced9f17
  fix closes. The new fixture FAILS against pre-cleanup master and
  passes here.
- N2: xref_stripped_fnleak.s comments updated to acknowledge that it
  exercises gate 1 (function_name_at) rather than gate 3
  (function_starts) on Apple silicon, where LLDB synthesises
  ___lldb_unnamed_symbol_<addr>. Phase-5 follow-up captured.
All 18 xref + chained-fixup tests pass. Build warning-clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
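The difference between the two parsing strategies in I3 shows up on a single operand string. This is a Python sketch of the idea, not the C++ `parse_branch_target`; `rightmost_hex_anywhere` is a hypothetical reconstruction of the old behaviour as the commit message describes it, included only for contrast.

```python
import re

_HEX = re.compile(r"0x([0-9a-fA-F]+)")


def parse_branch_target(operands: str):
    """Parse a hex target from the LAST comma-separated operand only."""
    last = operands.rsplit(",", 1)[-1]
    match = _HEX.search(last)
    return int(match.group(1), 16) if match else None


def rightmost_hex_anywhere(operands: str):
    """Old approach (for contrast): rightmost hex token in the whole string."""
    tokens = _HEX.findall(operands)
    return int(tokens[-1], 16) if tokens else None


if __name__ == "__main__":
    ops = "w0,#0x10,_far_label"
    print(rightmost_hex_anywhere(ops))  # picks up the bit position
    print(parse_branch_target(ops))     # symbolic label, so no target
```

Restricting the search to the final operand means an immediate earlier in the operand list (a bit position for tbz/tbnz) can never be mistaken for an address.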
§2 phase 2: multi-client socket + per-connection notification routing, auto-spawn, in-flight RPC interruption via self-pipe, idle timeout, daemon.shutdown RPC, recursive_mutex dispatch serialisation. Plus post-review cleanup: shared_ptr-owned NotificationSinks (UAF fix), worker shutdown gate, SO_SNDTIMEO, atomic stderr lines, LDB_LDBD_SPAWN binary validation.
§6 phase 4: chained-fixup + ADRP coverage extension: conditional-branch boundary, fat_arch_64 triple-aware slice selection, stripped-binary function_starts backstop, PC-relative literal-load provenance, MOV from XZR/WZR explicit, BindInfo schema (deferred imports walk), real ARM64 C fixture. Plus post-review cleanup: cond-branch fall-through correctness, same-fn cbz no-poison, clobber-by-default destination register tracking (closes CSEL/LDP false-positive class), FAT picker triple match honored, string.xref provenance plumbing, parser hardening + adversarial fixture rewrites.
# Conflicts:
# docs/WORKLOG.md
CI on Ubuntu / Linux x86-64 + Linux arm64 had been failing since PR #20
merged. Two issues:
1. getpeereid() is BSD-only (also on macOS). glibc and musl don't ship
   it. Wrap the peer-cred retrieval in a #if __linux__ / else branch:
   on Linux, getsockopt(SO_PEERCRED) returns a struct ucred; on the
   BSDs, keep the existing getpeereid call. peer_gid is preserved on
   both branches for API parity with a single (void) cast to silence
   -Wunused-variable.
2. The two ::ftruncate(fd, 0) and ::pwrite(...) calls in acquire_lock
   are documented as best-effort (a failed pid stamp degrades the
   collision diagnostic but doesn't break exclusion). gcc's
   -Wunused-result, treated as an error in the warning-clean build,
   isn't silenced by a plain (void) cast; the standard workaround is
   `if (call() != 0) {}`. Use that.
98/98 ctest green on Darwin-arm64 post-fix; Linux build path now
compiles cleanly via the new ifdef branch (verified by tracing through
the SO_PEERCRED path, which is standard on every Linux since 2.6.17).
Linux CI on merge will confirm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
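The Linux half of the peer-credential branch can be illustrated from Python, which exposes `SO_PEERCRED` on Linux. This is a sketch and not the daemon's C++ code: `peer_credentials` is a hypothetical name, the `"3i"` layout assumes `struct ucred` is three 32-bit integers (pid, uid, gid), and the non-Linux path only raises, because the Python standard library has no `getpeereid` wrapper.

```python
import os
import socket
import struct


def peer_credentials(sock: socket.socket):
    """Return (uid, gid) of the peer on a Unix-domain socket."""
    if hasattr(socket, "SO_PEERCRED"):
        # Linux: struct ucred { pid_t pid; uid_t uid; gid_t gid; }
        fmt = "3i"
        raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                              struct.calcsize(fmt))
        _pid, uid, gid = struct.unpack(fmt, raw)
        return uid, gid
    # BSD/macOS: a native build would call getpeereid() here instead.
    raise NotImplementedError("SO_PEERCRED is not available on this platform")


if __name__ == "__main__":
    left, right = socket.socketpair()
    try:
        print(peer_credentials(left), (os.geteuid(), os.getegid()))
    except NotImplementedError as err:
        print(err)
    finally:
        left.close()
        right.close()
```

Returning both uid and gid on every platform matches the commit's point about keeping `peer_gid` for API parity even where only the uid is checked.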
Bundled merge of two reviewed phase-4 branches plus a Linux portability fix that closes the CI break from PR #20.
What landed
§2 phase 2 — socket daemon multi-client + supporting work
Builds on §2 phase 1's single-client persistent socket. Adds:
- `NotificationSink` routing via `shared_ptr` subscriber set (closes a TSan-confirmed use-after-free in `NonStopRuntime::emit_stopped_` that the opus reviewer demonstrated under listener-vs-disconnect race scheduling).
- `Dispatcher` instance, serialised via a new `Dispatcher::dispatch_mu_` `recursive_mutex` (recursive because `session.replay` re-enters dispatch on the same thread).
- `ldb --socket PATH` on `ECONNREFUSED` / `ENOENT` forks+execs `ldbd --listen unix:PATH` and retries. `$LDB_LDBD_SPAWN` accepts an explicit binary path; validated via `--version` probe to fail fast on bad paths.
- `daemon.shutdown` RPC + `--listen-idle-timeout N` + signal-driven accept-loop wake-up via self-pipe pattern. Workers gate on `g_shutdown` so a misbehaving peer can't keep the daemon alive after `daemon.shutdown`.
- `SO_SNDTIMEO` on accepted fds (60s) so a slow-reader peer doesn't head-of-line block the listener thread.
- `O_NOFOLLOW` lockfile, smoke-test docstring honesty.
§6 phase 4: ARM64e xref coverage extension
Builds on §6 phases 1-3's chained-fixup parser + ADRP-pair resolver. Adds:
- `b.cond` / `cbz` / `cbnz` / `tbz` / `tbnz`. Cross-function targets recorded in `function_starts`; fall-through path preserves register state per spec.
- `function_starts` backstop: records B/BR targets so gate 1 catches function boundaries when LLDB returns empty `function_name_at` on both sides.
- `BindInfo` schema in `ChainedFixupMap`: phase 4 ships the type; phase 5 will populate via imports-table walk.
- `provenance.warnings` field plumbed through `xref.address` AND `string.xref` so an agent can see when the heuristic conservatively skipped a load.
CI portability fix (final commit)
Two issues found by master's post-PR-#20 CI run:
- `getpeereid()` is BSD/macOS-only; glibc and musl don't ship it. Wrapped in a `#if defined(__linux__)` / `#else` branch: Linux uses `getsockopt(SO_PEERCRED)` returning `struct ucred`, BSD keeps `getpeereid`.
- `-Wunused-result` (treated as error in the warning-clean build) wasn't silenced by `(void)` casts on `::ftruncate` / `::pwrite`. Replaced with `if (call() != 0) {}` idioms.
Constituent commits
`release/phase-4` itself is 4 commits ahead of `master`:
- `e6c8e3c` Merge `fix/socket-daemon-phase2` (7 commits underneath)
- `2c1ad49` Merge `fix/chained-fixups-phase4` (12 commits underneath)
- `81d2b97` ci(daemon): Linux portability fixes
Test plan
- `ctest --test-dir build --output-on-failure` on the merged release tip → 98/98 PASS on Darwin-arm64 (189s)
- `-Wall -Wextra -Wpedantic -Wconversion -Wsign-conversion -Wshadow -Wnon-virtual-dtor -Wold-style-cast -Wcast-align -Wunused -Woverloaded-virtual -Wnull-dereference -Wdouble-promotion -Wformat=2 -Wmisleading-indentation`
- `tests/baselines/agent_workflow_tokens.json`'s Linux-x86_64 entry. If so, a one-line follow-up with `LDB_UPDATE_BASELINE=1`.
Deferred to phase 5 (documented in docs/35-field-report-followups.md)
§2 phase 3:
- `target_id`-aware notification routing (today's behaviour is broadcast-to-all subscribers).
- `dispatch_mu_` is the bottleneck for non-target-scoped work).
§6 phase 5:
- `ChainedFixupMap::binds` (schema landed).
- `function_starts` backward boundary detection.
- `dyld_info --fixups` output.
🤖 Generated with Claude Code