Phase 4: §2 multi-client socket daemon + §6 ARM64e xref coverage extension + CI fix by zachgenius · Pull Request #21 · zachgenius/LDB

zachgenius · 2026-05-16T11:56:43Z

Bundled merge of two reviewed phase-4 branches plus a Linux portability fix that closes the CI break from PR #20.

What landed

§2 phase 2 — socket daemon multi-client + supporting work

Builds on §2 phase 1's single-client persistent socket. Adds:

- Per-connection NotificationSink routing via shared_ptr subscriber set (closes a TSan-confirmed use-after-free in NonStopRuntime::emit_stopped_ that the opus reviewer demonstrated under listener-vs-disconnect race scheduling).
- Multi-client concurrent connections: one thread per accepted connection, shared Dispatcher instance, serialised via a new Dispatcher::dispatch_mu_ recursive_mutex (recursive because session.replay re-enters dispatch on the same thread).
- Auto-spawn from client: ldb --socket PATH on ECONNREFUSED/ENOENT forks+execs ldbd --listen unix:PATH and retries. $LDB_LDBD_SPAWN accepts an explicit binary path; validated via --version probe to fail fast on bad paths.
- daemon.shutdown RPC + --listen-idle-timeout N + signal-driven accept-loop wake-up via self-pipe pattern. Workers gate on g_shutdown so a misbehaving peer can't keep the daemon alive after daemon.shutdown.
- SO_SNDTIMEO on accepted fds (60s) so a slow-reader peer doesn't head-of-line block the listener thread.
- Cosmetic: atomic stderr lines for auto-spawn collisions, O_NOFOLLOW lockfile, smoke-test docstring honesty.

§6 phase 4 — ARM64e xref coverage extension

Builds on §6 phases 1-3's chained-fixup parser + ADRP-pair resolver. Adds:

- Conditional-branch boundary: b.cond / cbz / cbnz / tbz / tbnz. Cross-function targets recorded in function_starts; fall-through path preserves register state per spec.
- Architectural shift: clobber-by-default destination registers. Closes CSEL/CSET/CSINC/CSINV/CSNEG (common compiler "pick between two strings" idiom) AND LDP/LDPSW/LDXR/LDAR/LDXP/LDAXR (prologue/epilogue patterns). Replaces the previous whitelist of clobber-source mnemonics with parse-destination-and-clear-by-default; propagation is the explicit allowlist now.
- FAT Mach-O triple-aware slice selection: matches against SBTarget's triple before falling back to phase-3's arm64e > arm64 preference. Triple match wins even when the slice has no chained fixups (closes the silent wrong-slice fallback the opus reviewer demonstrated).
- Stripped-binary function_starts backstop: records B / BL / conditional-branch targets so function boundaries are still caught when LLDB returns empty function_name_at on both sides and gate 1 cannot tell them apart.
- Pre/post-indexed LDR writeback: clears the base register after the load to prevent false matches through the now-mutated address.
- STR / STUR / STRH / STRB / STP / LDUR as xref consumers: closes the false-negative class for "what writes to this global?".
- PC-relative literal-load provenance: diagnostic counter for the loads phase-4 still can't resolve.
- MOV from XZR / WZR explicit: replaces the prefix-character heuristic with explicit token matching.
- BindInfo schema in ChainedFixupMap: phase 4 ships the type; phase 5 will populate via imports-table walk.
- provenance.warnings field plumbed through xref.address AND string.xref so an agent can see when the heuristic conservatively skipped a load.

CI portability fix (final commit)

Two issues found by master's post-PR-#20 CI run:

- getpeereid() is BSD/macOS-only; glibc and musl don't ship it. Wrapped in a #if defined(__linux__) / #else branch: Linux uses getsockopt(SO_PEERCRED) returning struct ucred, BSD keeps getpeereid.
- gcc's -Wunused-result (treated as error in the warning-clean build) wasn't silenced by (void) casts on ::ftruncate/::pwrite. Replaced with if (call() != 0) {} idioms.
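
The two portability fixes above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper names `peer_credentials` and `truncate_best_effort` are hypothetical, and the sketch assumes a connected AF_UNIX socket.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (name and signature are illustrative, not from the PR):
// fetch the effective uid/gid of the peer on a connected AF_UNIX socket.
bool peer_credentials(int fd, uid_t& uid, gid_t& gid) {
#if defined(__linux__)
    // Linux: SO_PEERCRED fills a struct ucred. g++ defines _GNU_SOURCE by
    // default, which glibc requires before it exposes the struct.
    struct ucred cred {};
    socklen_t len = sizeof(cred);
    if (::getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0) return false;
    uid = cred.uid;
    gid = cred.gid;
    return true;
#else
    // BSD / macOS: getpeereid() reports the same information in one call.
    return ::getpeereid(fd, &uid, &gid) == 0;
#endif
}

// gcc keeps warning about a warn_unused_result call even behind a (void)
// cast, so the result is consumed by an empty if instead.
void truncate_best_effort(int fd) {
    if (::ftruncate(fd, 0) != 0) {
        // Best effort: nothing useful to do on failure.
    }
}
```

On a `socketpair(AF_UNIX, SOCK_STREAM, ...)` both ends belong to the calling process, so the reported uid/gid should equal `geteuid()` / `getegid()`.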

Constituent commits

release/phase-4 itself is 4 commits ahead of master:
- e6c8e3c Merge fix/socket-daemon-phase2 (7 commits underneath)
- 2c1ad49 Merge fix/chained-fixups-phase4 (12 commits underneath)
- 81d2b97 ci(daemon): Linux portability fixes

Each constituent branch went through implementation agent → opus reviewer (xhigh effort) → cleanup agent applying every reviewer-flagged blocker + nit. Both reviewers built adversarial test binaries; the §6 review caught CSEL + LDP destination clobber as a phase-4-introduced regression class that the cleanup branch fixed via the architectural shift to clobber-by-default.

Test plan

- ctest --test-dir build --output-on-failure on the merged release tip → 98/98 PASS on Darwin-arm64 (189s)
- Build warning-clean under macOS Apple Clang + -Wall -Wextra -Wpedantic -Wconversion -Wsign-conversion -Wshadow -Wnon-virtual-dtor -Wold-style-cast -Wcast-align -Wunused -Woverloaded-virtual -Wnull-dereference -Wdouble-promotion -Wformat=2 -Wmisleading-indentation
- CI Linux paths traced through manually for the SO_PEERCRED branch; standard kernel API since 2.6.17
- Linux CI (verify on merge), predicted: green. The token-budget Linux baseline may need regenerating if the per-platform total moves > 10% from tests/baselines/agent_workflow_tokens.json's Linux-x86_64 entry. If so, expect a one-line follow-up run with LDB_UPDATE_BASELINE=1.

Deferred to phase 5 (documented in `docs/35-field-report-followups.md`)

§2 phase 3:

- Server-side target_id-aware notification routing (today's behaviour is broadcast-to-all subscribers).
- True per-connection dispatch parallelism (current dispatch_mu_ is the bottleneck for non-target-scoped work).
- Workers list reaping.
- SBAPI cancellation (LLDB ABI doesn't currently permit it).

§6 phase 5:

- Full imports-table walk populating ChainedFixupMap::binds (schema landed).
- function_starts backward boundary detection.
- Complete clobber-by-default audit across every ARM64 instruction (CSEL/LDP family covered; MADD/MSUB/UMULL/SMULL/EOR/ORR/AND/ASR-imm/LSL-imm/EXTR/BFI/BFM/UBFX/SBFX/FMOV remain whitelist-only).
- Indirect-dispatch entry points (vtables, jump tables, ObjC dispatch).
- Real iOS .ipa CI smoke against dyld_info --fixups output.

🤖 Generated with Claude Code

… item 5)

Move MovSrcKind + classify_mov_source from lldb_backend.cpp's anonymous namespace to xref_arm64_parsers so unit tests can pin the alias-name-first match order without a live LLDB target. The prior implementation worked by accident: `lr` / `xzr` / `wzr` happened to land in the right switch arm via fall-through, but a future refactor that touched the prefix-check could silently regress.

Phase 4 item 5 from docs/35-field-report-followups.md §3: token-compare against the alias spellings BEFORE any prefix heuristic.

New unit tests pin classify_mov_source's behaviour for the zero (xzr/wzr/#0), stack pointer (sp/wsp), link register (lr), xN/wN width-distinguishing, and malformed-input arms. No behaviour change against existing fixtures: the lifted function is byte-identical to the previous in-place implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
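
A minimal sketch of the alias-first ordering described above. The enum members, the `MovSource` struct and the function name are assumptions for illustration; the real MovSrcKind layout is not shown in this PR.

```cpp
#include <string_view>

// Hypothetical stand-in for MovSrcKind; the real enum's members are not
// visible in the PR text.
enum class MovSrc { Zero, StackPointer, LinkRegister, X, W, Malformed };

struct MovSource {
    MovSrc kind;
    int reg;  // register number for X / W, -1 otherwise
};

MovSource classify_mov_source_sketch(std::string_view tok) {
    // 1. Exact alias spellings are compared first, so they can never be
    //    swallowed by the prefix check below.
    if (tok == "xzr" || tok == "wzr" || tok == "#0") return {MovSrc::Zero, -1};
    if (tok == "sp" || tok == "wsp") return {MovSrc::StackPointer, -1};
    if (tok == "lr") return {MovSrc::LinkRegister, -1};

    // 2. Only then the prefix heuristic: x0..x30 / w0..w30.
    if (tok.size() < 2 || tok.size() > 3 || (tok[0] != 'x' && tok[0] != 'w'))
        return {MovSrc::Malformed, -1};
    if (tok.size() == 3 && tok[1] == '0') return {MovSrc::Malformed, -1};  // "x01"
    int n = 0;
    for (char c : tok.substr(1)) {
        if (c < '0' || c > '9') return {MovSrc::Malformed, -1};
        n = n * 10 + (c - '0');
    }
    if (n > 30) return {MovSrc::Malformed, -1};
    return {tok[0] == 'x' ? MovSrc::X : MovSrc::W, n};
}
```

Because the aliases are matched as whole tokens, reordering or rewriting the prefix branch cannot change how `xzr`, `wsp` or `lr` classify.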

Phase-1 socket mode points a single NotificationSink at the dispatcher on accept() and clears it on disconnect. That is race-free only because phase 1 is strictly one connection at a time: no other sink is alive to receive a notification belonging to a different connection.

Phase 2 needs to accept multiple concurrent connections, which breaks the single-sink design: connection A's stop event would either route to connection B's OutputChannel (after B's accept re-pointed the sink), or vanish (after A's disconnect cleared it but before B's accept). Either outcome corrupts the JSON-RPC stream that every client sees.

NonStopRuntime now owns a subscriber SET, guarded by `sinks_mu_`. Each connection that wants notifications calls `add_notification_sink` on accept and `remove_notification_sink` on disconnect. emit_stopped_ snapshots the subscriber list under a shared lock, drops the lock, then fans the notification out, so a slow sink (one whose OutputChannel's mutex is contended) doesn't stall the other subscribers' deliveries.

`set_notification_sink(sink)` is kept as a back-compat shim with new "replace the entire subscriber set with this one" semantics. Stdio mode (main.cpp) still calls it once at startup and gets the same behaviour as before. Phase-2 socket_loop.cpp migrates to add/remove so multiple connections coexist without disturbing one another.

The runtime's single emit funnel point (set_stopped → emit_stopped_) is the only call site for thread.event notifications in the daemon today; the NonStopListener forwards parsed RSP stop replies through runtime.set_stopped, and probe / breakpoint events use no separate emission path. The subscriber set therefore covers every async notification the dispatcher fires.

Tests:
- New unit cases in `tests/unit/test_nonstop_runtime.cpp` pin the fan-out, the remove behaviour, and the set/clear back-compat semantics. All four failed-as-expected before the implementation and pass after.
- The existing `set_notification_sink` callers in test_nonstop_listener and test_dispatcher_nonstop still work: the new "replace all" semantics match what those tests assume (one sink, no others).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 1 served one connection at a time: accept() → serve_one_connection in the calling thread → close → next accept. An agent script wanting to fire two `ldb` invocations in parallel against the same daemon had to serialise them externally, or each call paid the spawn cost.

This commit accepts a connection, spawns a std::thread per connection that owns its fd for its entire lifetime, and the main thread goes straight back to accept(). The Dispatcher is shared; concurrent RPC service is serialised through its new `dispatch_mu_` outer lock.

Concurrency audit (recorded for the next reviewer):
- `LldbBackend::Impl::mu` already guards every public method's SBAPI access. Every public LldbBackend method acquires it; nothing changed in this commit. The phase-3 chained-fixups branch's drop-mu-during-file-IO pattern still holds.
- `ProbeOrchestrator` has its own `mu_`. Every public method takes it; callback paths re-acquire when re-entering the orchestrator.
- `SessionStore` and `ArtifactStore` each have their own internal mutex around sqlite access (single-writer assumption preserved by WAL).
- `NonStopRuntime` has its own per-instance shared_mutex (state map) and the subscriber set lock added in the prereq commit.
- `Dispatcher`'s OWN mutable state (target_main_module_, diff_cache_ + diff_cache_index_, cost_samples_, python_unwinders_, rsp_channels_, active_session_writer_, active_session_id_) was NOT thread-safe. `dispatch_mu_` covers all of it under one outer lock for the duration of every dispatch() call.

Strategy: serialise via dispatch_mu_ around the entire dispatch lifetime. Correct, dumb, and low-throughput in the multi-client case (one RPC at a time across all connections). Per-target sharding is the natural phase-3 refinement; the dispatcher's mutable state would have to migrate to a per-target map first. Documented in `dispatcher.h`.

Shutdown sequence: signal handler sets g_shutdown; accept() returns EINTR; the main loop notices the flag and exits the accept loop. On the way out we join every outstanding worker thread. In-flight RPCs run to completion (LldbBackend's SBAPI calls aren't interruptible from outside); a separate item in §2 phase-2 plans a self-pipe + poll() refinement for finer-grained cancellation.

Tests:
- New `tests/smoke/test_socket_multiclient.py`: two Python threads each open a socket, run `target.open` (with its module list as a side effect; see handle_target_open), sync on a barrier, then run `module.list`. The barrier times out at 10s; phase-1 serial service would deadlock there because the second connection's accept() blocks until the first disconnects.
- Failed against the pre-fix daemon (barrier timeout, observed in the RED ctest run). Passes after the thread-per-connection refactor.
- All existing socket tests (lifecycle, collision, perms) still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
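
The thread-per-connection shape with a single outer dispatch lock might look like the sketch below. It is a simplified model, not the daemon's code: connections are plain request lists rather than sockets, and `SharedDispatcher` / `serve_connections` are hypothetical names.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Stand-in for the shared Dispatcher. Its state is a plain int, so it is only
// safe because every dispatch() holds the outer lock for its whole duration.
class SharedDispatcher {
public:
    void dispatch(const std::string& /*request*/) {
        std::lock_guard<std::mutex> lk(dispatch_mu_);
        ++handled_;
    }
    int handled() {
        std::lock_guard<std::mutex> lk(dispatch_mu_);
        return handled_;
    }

private:
    std::mutex dispatch_mu_;
    int handled_ = 0;
};

// Stand-in for the accept loop: each "connection" is a list of requests and
// gets its own worker thread; every worker is joined on the way out.
int serve_connections(SharedDispatcher& d,
                      const std::vector<std::vector<std::string>>& connections) {
    std::vector<std::thread> workers;
    workers.reserve(connections.size());
    for (const auto& conn : connections) {
        workers.emplace_back([&d, &conn] {
            for (const auto& req : conn) d.dispatch(req);
        });
    }
    for (auto& w : workers) w.join();
    return d.handled();
}
```

The lock makes the count exact regardless of interleaving, at the cost the commit message names: one RPC at a time across all connections.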

…se 4 item 1)

Phase 3 resets adrp_regs on RET / unconditional B / BR only. Conditional branches (b.cond / cbz / cbnz / tbz / tbnz) whose target sits in a different function are tail-call-like handoffs; on the symbolized side, gate 1's function_name_at check catches the leak when the scanner steps into the target function, but on the stripped side gate 1 silently misses it (both adjacent functions return "" from function_name_at).

Implement option (b) from docs/35-field-report-followups.md §3 phase 4: parse the conditional's target operand inline (LLDB renders it as `0xNNNNNNN`), resolve to a function name, and reset adrp_regs when that name differs from the current function. Skip the parse when adrp_regs is empty (the function_name_at call dominates cost; mirrors gate 1's same optimisation). Bump a new provenance.adrp_pair_cond_branch_reset counter so callers can see when the heuristic conservatively dropped tracking; in stripped binaries this is the only signal.

Provenance schema additions (forward-compatible):
- adrp_pair_cond_branch_reset (item 1)
- adrp_pair_function_start_reset (item 3, wired in a subsequent commit)
- adrp_pair_unresolvable_load (item 4, wired in a subsequent commit)

The two not-yet-populated counters are exposed on the wire now so the dispatcher's serialisation path doesn't need a second pass when later commits populate them.

TDD: tests/fixtures/asm/xref_condbranch.s + test_xref_condbranch.py. The fixture is symbolized so gate 1 also covers the leak, but the test pins provenance.adrp_pair_cond_branch_reset > 0 to prove the new path fired; a future refactor that silently deletes the path would flip the assertion red.

ctest: 10/10 xref smoke tests pass. No regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
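
As an illustration of the inline target parse, the sketch below recognises the conditional-branch mnemonics and extracts the last `0x...` token from an operand string. Both function names and the exact operand spelling are assumptions; the branch's real parser is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// True for the conditional-branch mnemonics named in the commit message.
bool is_conditional_branch(std::string_view m) {
    return m.substr(0, 2) == "b." || m == "cbz" || m == "cbnz" ||
           m == "tbz" || m == "tbnz";
}

// Hypothetical sketch: parse the last "0x..." token of an operand string,
// e.g. "w0, #0x3, 0x100003f80" yields 0x100003f80.
std::optional<std::uint64_t> parse_last_hex(std::string_view ops) {
    const std::size_t pos = ops.rfind("0x");
    if (pos == std::string_view::npos) return std::nullopt;
    std::uint64_t value = 0;
    std::size_t digits = 0;
    for (std::size_t i = pos + 2; i < ops.size(); ++i) {
        const char c = ops[i];
        int d = 0;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else break;                                 // end of the hex token
        if (++digits > 16) return std::nullopt;     // would overflow 64 bits
        value = (value << 4) | static_cast<std::uint64_t>(d);
    }
    if (digits == 0) return std::nullopt;
    return value;
}
```

A caller would then resolve the parsed address to a function name and reset its tracked registers only when that name differs from the current function.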

…e 4 item 2)

Phase 3's FAT picker preferred arm64e > arm64 unconditionally. When LLDB loaded the arm64 slice of a FAT binary that ALSO had an arm64e slice, the picker still returned the arm64e map: different image_base, zero matches in xref_address.

Phase 4 item 2 closes the loop: extract_chained_fixups_from_macho() gains an optional std::string_view triple parameter. The dispatcher calls SBTarget::GetTriple() and passes it through; the FAT picker classifies the triple ("arm64e-" / "arm64-" / "x86_64-") into the preferred (cpu_type, cpu_subtype) pair and tries the matching slice first.

Falls back to the phase-3 preference order when:
- triple is empty (existing callers haven't been migrated yet)
- triple names an unknown arch
- the matching slice exists but has no chained fixups

This keeps the existing behaviour for any caller that doesn't yet plumb the triple through; new callers see exact-match selection.

ARM64_ALL (subtype 0) match also accepts ARM64_V8 (subtype 1): the LLDB triple "arm64-" can map to either subtype depending on the slice the linker tagged. Skip when the triple demanded arm64e (V8 is not arm64e).

TDD: 4 new unit tests under [chained_fixups][macho][fat][triple] in tests/unit/test_chained_fixups.cpp pin: arm64 triple picks arm64 slice (image_base proves it), arm64e triple picks arm64e slice, empty triple falls back to phase-3 default, missing-matching-slice falls back too.

15/15 [chained_fixups] tests pass; 10/10 xref smoke tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
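
The triple-first selection can be sketched as below. The `Slice` struct and function names are hypothetical, the cpu_subtype values are assumed to have their capability bits already masked off, and the handling of a matching slice without chained fixups is deliberately left out of the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Values as defined in <mach/machine.h>.
constexpr std::int32_t kCpuTypeArm64 = 0x0100000C;
constexpr std::int32_t kCpuTypeX86_64 = 0x01000007;
constexpr std::int32_t kSubArm64All = 0, kSubArm64V8 = 1, kSubArm64e = 2;
constexpr std::int32_t kSubX86_64All = 3;

struct Slice {  // hypothetical: one fat_arch entry, subtype already masked
    std::int32_t cpu_type;
    std::int32_t cpu_subtype;
};

static bool starts_with(std::string_view s, std::string_view p) {
    return s.substr(0, p.size()) == p;
}

std::optional<Slice> preferred_for_triple(std::string_view triple) {
    if (starts_with(triple, "arm64e-")) return Slice{kCpuTypeArm64, kSubArm64e};
    if (starts_with(triple, "arm64-")) return Slice{kCpuTypeArm64, kSubArm64All};
    if (starts_with(triple, "x86_64-")) return Slice{kCpuTypeX86_64, kSubX86_64All};
    return std::nullopt;  // empty or unknown arch
}

static bool matches(const Slice& have, const Slice& want) {
    if (have.cpu_type != want.cpu_type) return false;
    if (have.cpu_subtype == want.cpu_subtype) return true;
    // An "arm64-" triple may be tagged ALL or V8; never relaxed for arm64e.
    return want.cpu_type == kCpuTypeArm64 && want.cpu_subtype == kSubArm64All &&
           have.cpu_subtype == kSubArm64V8;
}

// Returns the index of the chosen slice, or -1 if nothing is usable.
int pick_slice(const std::vector<Slice>& slices, std::string_view triple) {
    if (const auto want = preferred_for_triple(triple)) {
        for (std::size_t i = 0; i < slices.size(); ++i)
            if (matches(slices[i], *want)) return static_cast<int>(i);
    }
    // Phase-3 style fallback: arm64e first, then any other arm64 slice.
    for (std::size_t i = 0; i < slices.size(); ++i)
        if (slices[i].cpu_type == kCpuTypeArm64 && slices[i].cpu_subtype == kSubArm64e)
            return static_cast<int>(i);
    for (std::size_t i = 0; i < slices.size(); ++i)
        if (slices[i].cpu_type == kCpuTypeArm64) return static_cast<int>(i);
    return -1;
}
```

With an `{arm64, arm64e}` FAT file, an `arm64-` triple selects index 0, while an empty or unmatched triple falls back to the arm64e slice.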

Phase-1 expected the operator to run `ldbd --listen unix:PATH` once manually before issuing any `ldb --socket PATH` invocations; a stale or missing daemon surfaced as a bare "could not connect" error. For shell scripts that want the persistent-state property without the ceremony of managing the daemon lifecycle by hand, the obvious ergonomic ask is "just start one if it isn't running."

`_SocketProc` now detects the ECONNREFUSED / ENOENT / ENXIO subset of connect() failures, fork+execs `ldbd --listen unix:PATH` with `start_new_session=True` (setsid), waits up to ~3s for the socket to start accepting, and retries the connect. The auto-spawned daemon outlives the client process so the next CLI invocation reuses it without re-spawning.

The ldbd binary is resolved through a three-step search:
1. $LDB_LDBD_SPAWN: explicit override; tests use this to pin the build's ldbd binary without depending on $PATH discovery.
2. shutil.which("ldbd"): global install.
3. _find_ldbd_sibling(): the in-tree heuristic that the §1 sibling-lookup commit established for `--ldbd`.

stdin/stdout/stderr are ALL redirected to /dev/null in the daemon. The earlier sketch (which inherited the client's stderr to preserve diagnostics) caused a subtle test-runner hang: when a caller wrapped `ldb --socket ...` with subprocess.run capture_output=True, the daemon inherited the captured stderr pipe and held it open across the client's exit, so the wrapper never saw EOF and blocked indefinitely. Operators who want the diagnostics now set $LDB_LDBD_LOG_FILE; the spawn redirects stderr to that path instead.

Help text updated to document the auto-spawn flow.

Tests:
- New `tests/smoke/test_socket_autospawn.py`:
  * Picks a fresh tempdir socket path; no daemon running.
  * Invokes `ldb --socket $path target.open ...`. Asserts rc=0 and a valid target_id.
  * Invokes a second `ldb --socket $path module.list target_id=$N`. Asserts rc=0, which proves the daemon persisted.
  * Kills the daemon by pid recovered from $sock.lock; asserts socket inode unlinked.
- Failed RED before the implementation (the daemon never spawned; the test's `expect(rc == 0)` tripped immediately). Passes after.
- The four existing socket tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… 4 item 3)

Phase 3's gate 1 uses function_name_at() to detect function boundaries. On a stripped Mach-O without LC_SYMTAB local symbols, function_name_at would return "" for adjacent functions and gate 1 silently treats them as one, so adrp_regs leaks across. (On macOS / Apple-silicon, LLDB synthesises ___lldb_unnamed_symbol_<addr> per-address names so gate 1 still works; the leak fires on platforms where LLDB doesn't synthesise OR when the bytes between two functions look like raw code with no function-context lookup hit. Real WeChat-class iOS binaries have hit this pattern in the field.)

Phase 4 item 3 records every B / BL / conditional-branch target inside the current code section as a function-start hint. The check fires BEFORE gate 1: when the scanner reaches an instruction whose address is in the function_starts set, adrp_regs is reset and the new provenance.adrp_pair_function_start_reset counter bumps. The two paths are complementary: either is sufficient, the union is the discriminating signal.

Lift the hex-token parser used by the cbz-target check (item 1) into a shared lambda parse_last_hex_in_operands so both paths use the same logic.

Single-pass / forward-only: a branch at file_addr X to target Y only takes effect for Y > X (the common case in compiler-emitted code; backward-only-reached functions still miss).

TDD fixture: tests/fixtures/asm/xref_stripped_fnleak.s, two adjacent non-globl functions linked through `bl`, with `strip -x` applied post-link to remove the local function symbols. x19 (callee-saved per AAPCS64) holds an ADRP page across the BL so phase 3's caller-saved clear can't mask the leak. The smoke test asserts zero false-positive matches; documents that on macOS gate 1's synthesised names also cover the boundary, so the test doesn't strictly require the function_start_reset path to fire (correctness is what matters).

Bumps the worktree's smoke-test count from 82 to 83. ctest 100% green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two phase-2 items that share their plumbing:

4. SIGTERM mid-accept must wake the listener within milliseconds. Phase-1 polled g_shutdown only between connections; a daemon idle in accept() saw EINTR on signal and exited, but only by accident: bare accept() returned EINTR and the loop's flag check fired on the next iteration. Adding poll() with non-blocking accept() makes that explicit and gives us a second wakeable fd for #6 below.
6. `daemon.shutdown` RPC: a connected client can ask the daemon to exit cleanly. The handler returns `{ok:true}` and triggers the same wake mechanism that SIGTERM uses, so an orchestrator can drain the daemon without spawning a "kill by pid" step.

The shared mechanism is a self-pipe. Both ends are CLOEXEC and non-blocking. The signal handler writes a byte (write(2) is async-signal-safe per POSIX); the daemon.shutdown callback writes the same byte from the worker thread. The main accept loop's poll() monitors srv + pipe[0]; on POLLIN of pipe[0] it drains the pipe (non-blocking read, so the drain terminates with EAGAIN once empty; the prior blocking-read attempt deadlocked here, only discovered by tracing the daemon.shutdown test failure) and checks g_shutdown.

Bug found while writing this: the read-end of the self-pipe must also be O_NONBLOCK, not just the write end. The drain loop reads in a loop until read() returns ≤ 0; with a blocking read end, the SECOND iteration (pipe empty after consuming the wake byte) blocks forever. The non-blocking flag makes it return EAGAIN instead.

Scope clarification (per docs §2 "in-flight RPC interruption"): this commit only stops accepting new RPCs immediately and lets the currently-executing dispatch run to completion. Cancelling an in-flight LldbBackend SBAPI call from outside is genuinely impossible against the LLDB ABI; the test `test_socket_interruption.py` documents that scope by closing the client socket so the worker sees EOF cleanly.

The shutdown callback is wired only in listen mode; stdio mode's `daemon.shutdown` returns -32002 with a "use stdin EOF or SIGTERM" message.

describe.endpoints catalog grew one entry for `daemon.shutdown`. Schema is trivial (no params; returns `{ok: bool}`).

Tests:
- New `tests/smoke/test_daemon_shutdown_rpc.py`: connects, sends daemon.shutdown, verifies ok=true reply, closes client, asserts daemon exits within 10s with rc=0 and the socket/lockfile gone.
- New `tests/smoke/test_socket_interruption.py`: connects, completes one describe.endpoints call, sends SIGTERM to the daemon, closes the client, asserts daemon exits within 5s with rc=0. Pre-fix daemon hung in the accept loop until the signal arrived AND a new connection event happened (or the bare accept's EINTR fired); the poll-based path makes it deterministic.
- All five prior socket tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
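
The self-pipe mechanics described above, including the non-blocking read end, can be sketched as follows. `WakePipe` is a hypothetical wrapper; the daemon's real accept loop also polls the listening socket, which this sketch omits.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Hypothetical self-pipe wrapper: both ends non-blocking and close-on-exec.
class WakePipe {
public:
    ~WakePipe() {
        for (int fd : fds_)
            if (fd >= 0) ::close(fd);
    }
    bool init() {
        if (::pipe(fds_) != 0) return false;
        for (int fd : fds_) {
            const int fl = ::fcntl(fd, F_GETFL);
            if (fl < 0 || ::fcntl(fd, F_SETFL, fl | O_NONBLOCK) != 0) return false;
            if (::fcntl(fd, F_SETFD, FD_CLOEXEC) != 0) return false;
        }
        return true;
    }
    // Only calls write(2), so it is usable from a signal handler as well as
    // from a worker thread.
    void wake() const {
        const char byte = 1;
        if (::write(fds_[1], &byte, 1) < 0) {
            // Pipe full: a wake-up is already pending, nothing to do.
        }
    }
    // Reads until the pipe is empty. With a blocking read end the iteration
    // after the last byte would hang; O_NONBLOCK makes it return -1/EAGAIN.
    int drain() const {
        char buf[64];
        int total = 0;
        for (;;) {
            const ssize_t n = ::read(fds_[0], buf, sizeof(buf));
            if (n <= 0) break;
            total += static_cast<int>(n);
        }
        return total;
    }
    bool readable(int timeout_ms) const {
        pollfd p{};
        p.fd = fds_[0];
        p.events = POLLIN;
        return ::poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLIN) != 0;
    }

private:
    int fds_[2] = {-1, -1};
};
```

In an accept loop, the read end would sit in the same `poll()` set as the listening socket; on POLLIN the loop drains the pipe and rechecks its shutdown flag.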

Phase 3's gate 7 bumps adrp_pair_skipped for register-offset LDRs with a tracked base (`[xN, xM]` / `[xN, xM, lsl #imm]`). Phase 4 item 4 extends the family: PC-relative literal loads (`ldr xN, #imm` / `ldr xN, 0xNNNN`) bypass the ADRP+pair pattern entirely; they load the slot's value via PC-relative addressing, not through a register the scanner tracked.

The literal-pool slot might hold a pointer to a string or constant in __TEXT/__cstring or __DATA_CONST. The scanner can't statically dereference it (would need to re-read the segment data at file_addr + pcrel_imm). Phase 4 bumps the new adrp_pair_unresolvable_load counter so callers see this happened, instead of the load silently disappearing.

Detection shape: in the "memop didn't match resolve_adrp_consumer" fallback, after the existing `[xN, ...]` register-offset branch, check for an immediate-shaped operand (`#imm` / `0xNNN` / `-imm`). Only `ldr` / `ldrsw` produce literal-pool loads on arm64; stores and short loads use different addressing modes.

The new counter (and the matching adrp_pair_function_start_reset for item 3) is exposed on the wire by the dispatcher path that already serialises the other adrp_pair_* fields.

TDD: tests/fixtures/asm/xref_pcrel_literal.s, `ldr x0, _pcrel_const` where _pcrel_const is a quad inside __TEXT/__text. The smoke test asserts provenance.adrp_pair_unresolvable_load >= 1. xref.addr against `_pcrel_data` returns 0 matches today (the heuristic gives up on the literal); the counter is the contract that surfaces this to the caller.

12/12 xref smoke tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
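
The detection shape above could be approximated by a predicate like the one below. The function name and the assumption that mnemonic and operands arrive as separate lowercase strings are illustrative only.

```cpp
#include <string_view>

// Hypothetical predicate: does this disassembled instruction look like a
// PC-relative literal load (immediate-shaped last operand, no [..] memop)?
bool is_pcrel_literal_load(std::string_view mnemonic, std::string_view operands) {
    if (mnemonic != "ldr" && mnemonic != "ldrsw") return false;
    // Any bracket means register-based addressing, handled by other gates.
    if (operands.find('[') != std::string_view::npos) return false;
    const auto comma = operands.rfind(',');
    if (comma == std::string_view::npos) return false;
    std::string_view last = operands.substr(comma + 1);
    while (!last.empty() && last.front() == ' ') last.remove_prefix(1);
    if (last.empty()) return false;
    if (last.front() == '#') return true;         // "#imm"
    if (last.substr(0, 2) == "0x") return true;   // "0xNNN"
    // "-imm": a minus sign followed by a digit.
    return last.size() >= 2 && last.front() == '-' && last[1] >= '0' && last[1] <= '9';
}
```

A scanner would bump its "unresolvable load" counter whenever this returns true, rather than letting the instruction pass unnoticed.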

… item 6)

The phase-4 spec for bind resolution (docs/35-field-report-followups.md §3 item 6) allows shipping only the schema if the imports-table walk becomes too complex for one branch. The parse / walk itself spans:
- dyld_chained_fixups_header::imports_offset / imports_count / imports_format (three formats: DYLD_CHAINED_IMPORT, _IMPORT_ADDEND, _IMPORT_ADDEND64)
- Indexing into the imports table by the bind's ordinal field (24-bit or wider depending on format)
- String-table lookup via name_offset into the symbols region
- Optional SBTarget::FindSymbols(name) for resolved_addr when a process is loaded

That's ~150 LOC of byte-level parsing across three import formats. To keep this branch tight, ship only the schema additions:
- new BindInfo struct: name, addend, ordinal, resolved_addr (opt).
- new ChainedFixupMap::binds map: rva → BindInfo, populated by the phase-5 walk; today's parser leaves it empty for every fixture.

Three new unit tests pin the schema:
- BindInfo default-constructible with empty fields
- ChainedFixupMap.binds empty by default
- parse_chained_fixups leaves binds empty on a rebase-only payload

The phase-5 commit that wires the walk in will populate binds for test vectors that carry imports_count > 0 and flip the third assertion. Today's 18/18 [chained_fixups] tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
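
For orientation, the schema described above might be declared roughly like this. The field names come from the commit message; the concrete types and the choice of `std::map` are assumptions, and the existing rebase members of ChainedFixupMap are omitted.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Field names follow the commit message; the types are guesses.
struct BindInfo {
    std::string name;                            // imported symbol name
    std::int64_t addend = 0;
    std::uint32_t ordinal = 0;                   // index into the imports table
    std::optional<std::uint64_t> resolved_addr;  // set only with a live process
};

struct ChainedFixupMap {
    // Existing rebase data omitted from this sketch.
    std::map<std::uint64_t, BindInfo> binds;     // rva -> BindInfo, empty today
};
```

Shipping the type early lets callers compile against `binds` now and start seeing entries once the imports-table walk lands.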

An orchestrator that auto-spawns the daemon (the §2 phase-2 client-side auto-spawn lands an `ldbd --listen unix:PATH` if no daemon is running) probably wants that daemon to die quietly after the burst of activity finishes. Otherwise every interactive session leaves a lingering ldbd, and the operator has to clean it up by hand.

`--listen-idle-timeout N` gates the daemon's shutdown on the accept-loop's poll() returning 0 (timeout elapsed) AND a "no live workers" check. Both conditions are necessary: a long-lived agent session might idle on a connected socket for >N seconds while the user thinks; pulling the daemon down would surface as a mysterious disconnect.

Implementation:
- New static atomic `g_live_workers` tracks the count of running per-connection worker threads. The accept loop increments BEFORE std::thread construction (so a poll wake-up that races with this spawn can't observe zero workers); the worker decrements on exit.
- `poll()` takes `idle_timeout_sec * 1000ms` as its timeout argument when `idle_timeout > 0 && live_workers == 0`, otherwise -1 (block forever). On `poll()` returning 0 the loop rechecks live_workers (catching the case where a worker emerged during the gap) and, if still zero, sets g_shutdown and breaks. The existing teardown path (close listener, unlink socket + lockfile, join workers) runs unchanged.
- Workers write a wake byte to the self-pipe when they exit so the accept loop re-evaluates the timeout. Linux's poll resets the timeout per-call but macOS's preserves it across spurious returns; the explicit wake makes the behaviour uniform without depending on the platform's poll semantics.

Tests:
- New `tests/smoke/test_socket_idle_timeout.py`: starts `ldbd --listen-idle-timeout 2 --listen ...`, waits 8s with no clients, asserts the daemon exited rc=0 and the socket/lockfile are gone. Pre-fix daemon (no idle timeout) hangs in poll() forever; the test would time out at 30s.
- All six prior socket tests still pass.

`ldbd --help` text grew a paragraph documenting the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
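
The timeout decision described above reduces to two small predicates. The sketch below reuses the `g_live_workers` name from the commit message; the two function names are hypothetical.

```cpp
#include <atomic>

// Count of running per-connection workers, as named in the commit message.
std::atomic<int> g_live_workers{0};

// Timeout to hand to poll(): finite only when an idle timeout is configured
// and no worker is alive; otherwise -1 (block until an fd becomes ready).
int poll_timeout_ms(int idle_timeout_sec) {
    if (idle_timeout_sec > 0 && g_live_workers.load() == 0)
        return idle_timeout_sec * 1000;
    return -1;
}

// After poll() returns: exit only if it timed out (rc == 0) AND there is
// still no live worker, which covers a worker that appeared during the gap.
bool should_exit_after_poll(int poll_rc, int idle_timeout_sec) {
    return poll_rc == 0 && idle_timeout_sec > 0 && g_live_workers.load() == 0;
}
```

An accept loop would increment `g_live_workers` before constructing each worker thread and let the worker decrement it on exit, so neither predicate can observe a spurious zero.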

…ase 4 item 7)

Phase 4 item 7 (docs/35-field-report-followups.md §3) asks for a moderate-size C-compiled fixture that exercises the resolver in shapes closer to real iOS app binaries than the hand-assembled phase-3 fixtures.

Add tests/fixtures/c/real_world_xref.c:
1. static const char *const k_string_table[3]: selref-style ADRP+LDR through a __DATA_CONST chained-fixup slot.
2. Multiple functions in one TU exercising function-boundary reset (RET-clear + name-based + function_starts).
3. Conditional-branch tail-call (`if (which == 0) return real_xref_pick(0); return k_string_table[which];`): proves phase 4 item 1's cross-function reset doesn't eat the legitimate same-function fall-through xref.
4. extern malloc / free imports: exercises the chained-fixup binds path (BindInfo schema; resolution is phase 5).

Build: -arch arm64 -O1 -Wl,-fixup_chains so the linker emits LC_DYLD_CHAINED_FIXUPS with __DATA_CONST rebases for the string table. Apple-silicon-arm64 only.

Smoke test asserts:
- Every entry in k_string_table[] surfaces at least one xref instruction via string.xref (slot-indirection path live).
- A non-pointer literal (0x1122334455667788) surfaces zero matches (false-positive density on a 4-function TU is the noise-floor metric).

Spot-check against /usr/bin/uname (host-dependent, not automated): triple = arm64e-apple-macosx26.3.0; FAT slice picker (item 2) selected arm64e correctly. 8 sampled strings each returned 1 xref with empty provenance: no skips, no warnings, no false positives. Documented as a manual probe; not a CI assertion because the binary changes across macOS versions.

ctest: 84/84 (was 83) all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Move docs/35-field-report-followups.md §3 "Phase 4 — carried forward" subsection into "Phase 4 — what shipped" with commit SHAs and acceptance evidence for each item. New "Phase 5 — carried forward" subsection captures the items still deferred (full bind walk, auth-rebase key-class filtering, on-disk cache, correlate.* wire-up, multi-module xref, full dataflow, CI assertions on real iOS binaries).

Worklog entry pins the seven phase-4 commits, the decisions behind option (b) for conditional-branch handling, the schema-only ship for bind resolution, and the manual /usr/bin/uname spot-check that replaced the spec's /usr/bin/grep suggestion (grep's __cstring is empty; strings come from the shared cache, not the binary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

session.replay's per-row loop calls dispatch() re-entrantly so the replayed request goes through the full outer wrapper: provenance decoration + per-RPC cost recording still fire, while the session-log append no-ops because replay suspends the writer. The multi-client commit's std::mutex deadlocked there; the ctest smoke_session_replay run pinned this within seconds.

std::recursive_mutex restores correctness without losing the cross-thread serialisation property. Same-thread re-entry is now free; cross-thread overlap still queues at the lock. The overhead per-acquisition vs std::mutex is negligible compared to the work inside any real RPC.

Also folds in:
- `docs/35-field-report-followups.md §2`: "Phase 2 — what shipped" subsection records the six items that landed (multi-subscriber sinks, multi-client listener, auto-spawn, signal-driven wakeup, daemon.shutdown, idle timeout) with the concurrency audit notes. "Phase 3 — carried forward" enumerates the deferred items (token auth, per-target dispatcher sharding, true in-flight cancellation, worker reaping mid-flight, TLS, single-client RPC multiplexing).
- `docs/WORKLOG.md`: new dated entry summarising the goals, per-commit deliverables, key decisions, surprises (the capture_output / stderr-inheritance hang; both-ends-non-blocking for the self-pipe; phase4-xref-improvements worktree contamination), and verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
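
The re-entrancy problem and its fix can be reproduced in miniature. In this sketch (hypothetical class and method names) a replay request re-enters `dispatch()` on the same thread; with `std::mutex` the nested lock would be undefined behaviour and typically a deadlock, while `std::recursive_mutex` lets it through.

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class ReplayingDispatcher {
public:
    // Not locked: the sketch only records before any dispatch runs.
    void record(std::string method) { log_.push_back(std::move(method)); }

    // Returns how many requests this call handled, replayed rows included.
    int dispatch(const std::string& method) {
        std::lock_guard<std::recursive_mutex> lk(dispatch_mu_);
        int handled = 1;
        if (method == "session.replay") {
            // Same-thread re-entry: each row takes dispatch_mu_ again.
            for (const auto& row : log_) handled += dispatch(row);
        }
        return handled;
    }

private:
    std::recursive_mutex dispatch_mu_;
    std::vector<std::string> log_;
};
```

Calls from other threads still queue on `dispatch_mu_`, so the cross-connection serialisation is unchanged.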

…AF fix)

The phase-2 NonStopRuntime stored raw NotificationSink* in its subscriber vector. emit_stopped_ snapshotted the raw pointers under a shared lock, dropped the lock, then dereferenced. A concurrent remove_notification_sink (on a connection-worker thread) racing with sink destruction (the worker's stack-local StreamNotificationSink going out of scope on disconnect) could free the sink while the listener thread still held the raw pointer in its snapshot. Reviewer reproduced it with TSan (vptr race) and ASan (heap-use-after-free) on a focused multi-threaded unit test.

Fix: migrate subscriber storage from `NotificationSink*` to `std::shared_ptr<NotificationSink>`. emit_stopped_'s snapshot now copies shared_ptrs, bumping refcounts; every sink in the snapshot stays alive across the iteration regardless of concurrent remove. On the connection-worker side, the per-connection StreamNotificationSink is allocated via std::make_shared so the runtime's strong ref and any in-flight emit's snapshot ref both keep it alive past the worker's return.

remove_notification_sink and set_notification_sink move the doomed sinks out of the vector under the lock and drop them AFTER releasing, so a sink destructor that might re-enter the runtime can't deadlock on sinks_mu_.

Test: tests/unit/test_nonstop_runtime.cpp adds a 200ms-budgeted concurrent stress test (emitter thread vs add/remove churn thread) and a synchronous "runtime keeps sink alive across emit even if caller drops its ref" test using weak_ptr observation. Both pass TSan (`-fsanitize=thread`, sibling `build-tsan/` dir).

Updated existing call sites: main.cpp (stdio sink → make_shared), socket_loop.cpp (per-connection sink → make_shared), test_nonstop_*.cpp (local sinks → shared_ptr).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
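
A reduced model of the shared_ptr subscriber set is sketched below. `Sink`, `CountingSink` and `Runtime` are stand-ins for NotificationSink and NonStopRuntime; only the snapshot-then-fan-out and drop-after-unlock patterns are meant to mirror the description.

```cpp
#include <algorithm>
#include <iterator>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

struct Sink {  // stand-in for NotificationSink
    virtual ~Sink() = default;
    virtual void deliver(const std::string& msg) = 0;
};

struct CountingSink : Sink {
    int count = 0;
    void deliver(const std::string&) override { ++count; }
};

class Runtime {  // stand-in for NonStopRuntime
public:
    void add_sink(std::shared_ptr<Sink> s) {
        std::unique_lock<std::shared_mutex> lk(sinks_mu_);
        sinks_.push_back(std::move(s));
    }
    void remove_sink(const Sink* raw) {
        std::vector<std::shared_ptr<Sink>> doomed;
        {
            std::unique_lock<std::shared_mutex> lk(sinks_mu_);
            auto it = std::stable_partition(
                sinks_.begin(), sinks_.end(),
                [raw](const std::shared_ptr<Sink>& p) { return p.get() != raw; });
            doomed.assign(std::make_move_iterator(it),
                          std::make_move_iterator(sinks_.end()));
            sinks_.erase(it, sinks_.end());
        }
        // `doomed` is destroyed here, after the lock is released, so a sink
        // destructor that re-enters the runtime cannot deadlock.
    }
    void emit(const std::string& msg) {
        std::vector<std::shared_ptr<Sink>> snapshot;
        {
            std::shared_lock<std::shared_mutex> lk(sinks_mu_);
            snapshot = sinks_;  // copies bump refcounts: sinks stay alive
        }
        for (const auto& s : snapshot) s->deliver(msg);
    }

private:
    std::shared_mutex sinks_mu_;
    std::vector<std::shared_ptr<Sink>> sinks_;
};
```

Observing a sink through a `std::weak_ptr` shows the ownership rule: the sink survives the caller dropping its reference and is destroyed only once the runtime removes it.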

Pre-fix: after `daemon.shutdown` or SIGTERM set `g_shutdown`, the accept loop stopped accepting NEW connections — but already-connected workers kept reading + dispatching RPCs as long as the peer kept sending. The phase-2 doc claims "shutdown stops accepting new RPCs immediately"; reality was broader, and the daemon process would linger long after the accept loop had exited because workers were still dispatching. Fix: `serve_one_connection` takes an optional `is_shutdown` predicate. Between read and dispatch, if the predicate returns true, the worker synthesises a kBadState ("daemon shutting down") response — echoing the request id for correlation — and breaks out of the loop. The worker returns, the accept-loop join unblocks, the daemon exits. Stdio mode keeps the default (empty predicate evaluates as false) so its single-client semantics are unchanged. The socket loop passes a closure over the file-scope `g_shutdown` atomic. Test: `tests/smoke/test_socket_shutdown_active_clients.py` exercises the cross-cutting promise — two clients A and B; B sends daemon.shutdown; A's next RPC must surface a shutdown error (or clean EOF), NOT a normal success response; daemon exits within a generous window. Without the fix the test fails because A's hello is dispatched successfully past the shutdown latch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
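The gate between read and dispatch might look roughly like the sketch below. `Request`, `Response`, and `serve_requests` are hypothetical simplifications (the real `serve_one_connection` reads from a stream and uses the project's own error codes); the sketch only shows the optional predicate, the synthesised error that echoes the request id, and the break out of the loop.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct Request {   // hypothetical, simplified request
    int id;
    std::string method;
};

struct Response {  // hypothetical, simplified response
    int id;
    bool ok;
    std::string message;
};

// Serve requests in order. An empty predicate (the stdio-mode default in this
// sketch) never reports shutdown, so every request is dispatched.
std::vector<Response> serve_requests(
    const std::vector<Request>& incoming,
    const std::function<Response(const Request&)>& dispatch,
    const std::function<bool()>& is_shutdown = {}) {
    assert(dispatch);
    std::vector<Response> out;
    for (const Request& req : incoming) {
        // Gate sits between "read" and "dispatch".
        if (is_shutdown && is_shutdown()) {
            out.push_back({req.id, false, "daemon shutting down"});
            break;  // worker returns; the accept-loop join can unblock
        }
        out.push_back(dispatch(req));
    }
    return out;
}
```

In the socket loop the predicate would be a closure over the shutdown flag; here any callable returning `bool` works.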

A connected-but-not-reading peer let the kernel send buffer fill; the daemon's `::write(2)` in `FdStreambuf::sync` then blocked indefinitely. The listener thread serving notifications calls the same write path through `OutputChannel`, so an indefinitely-blocked write held the inner backend's `map_mu_` shared. A second client's `target.close` then wants `map_mu_` UNIQUE while holding `dispatch_mu_` — the whole daemon wedges accepting new connections but unable to service any RPC behind the dead-peer write. Fix: mirror the existing `SO_RCVTIMEO` setsockopt block. 60 seconds is far past any benign reply round-trip but tight enough that a wedge doesn't keep the daemon unresponsive for minutes. On EAGAIN the streambuf latches `write_failed_`, `write_response` throws `protocol::Error`, and the worker exits cleanly via the existing error-handling path. Test: `tests/smoke/test_socket_slow_reader.py` — client A connects with a small SO_RCVBUF, fires a stream of `describe.endpoints` RPCs (~50KB reply each), never reads. Client B concurrently does a tiny hello and must get a response in well under 30s. After tearing down A, the daemon must exit on SIGTERM within 15s — pre-fix it could sit on the blocked write to A indefinitely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
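For reference, a send timeout on an accepted fd is set with a standard `setsockopt` call. The helper name `set_send_timeout` below is hypothetical; `SO_SNDTIMEO` and `struct timeval` are the POSIX pieces the commit refers to.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/time.h>

// Apply a send timeout to a connected socket. Returns true on success.
// After this, a blocking write that cannot make progress for `seconds`
// fails with EAGAIN/EWOULDBLOCK instead of blocking indefinitely.
bool set_send_timeout(int fd, int seconds) {
    assert(seconds >= 0);
    timeval tv{};
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return ::setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == 0;
}
```

The daemon-side handling (latching a failure flag and exiting the worker) is project-specific and is not reproduced here.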

Phase-4 item 1 (commit 311c439) introduced two silent-wrong-result regressions against phase 3. C1 (fall-through clobber): the implementation unconditionally cleared adrp_regs on a cross-function cond branch, including the source-side fall-through path. The spec literally reads "Fall-through path: preserve state". An `add x0, x8, _t@PAGEOFF` after `cbz x9, _other_fn` is in the source function by definition; clearing x8 silently lost the xref. C2 (same-fn target poisons function_starts): the cond-branch block also unconditionally inserted the target into function_starts. A same-function cbz to a local label (Lhere, loop backedges, basic-block merges) then triggered gate 3 to reset adrp_regs at the label, killing the post-label consumer's xref. Fix: rework the cond-branch block. - No more source-side adrp_regs.clear(). The fall-through stays tracked. - function_starts.insert() and the provenance bump fire only when the target's function differs from the current function. Same-fn targets no longer poison function_starts. - Counter renamed: adrp_pair_cond_branch_reset → _recorded (we record a target hint now, we don't reset state). - Move the cond-branch bookkeeping outside the `!adrp_regs.empty()` guard (I4): the function_start hint is valuable for LATER iterations once an ADRP becomes tracked, even if no ADRP is tracked at the cbz site. Updated dispatcher schema (I2 partial): the existing schema only declared two counters; bring it up to date with the five the code emits, with docstrings explaining each one's semantics. TDD evidence: two new fixtures + smokes (xref_cond_fallthrough.s, xref_cond_same_fn.s) failed RED against 2b170ce with the diagnostic "the legitimate xref against … vanished" and the matches list empty. Post-fix both pass; the existing xref_condbranch smoke (the cross-fn case) continues to pass with the renamed counter. ctest 87/87 (85 prior + 2 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

I4: when N clients race-spawn N daemons against the same socket path, the (N-1) losers all write diagnostic lines to the SAME stderr (often via LDB_LDBD_LOG_FILE redirection). Pre-fix each line was emitted as a chain of `std::cerr << "ldbd: ..." << pid << ... << "\n"` shifts; libstdc++ flushes each shift as its own write(2) syscall, and concurrent processes interleave the bytes mid-line. Operators saw "ldbd: another daemon is already lis ldbd: another daemon is alr".

Fix: introduce `log_err_line(std::string)` which emits the line with a single `std::fwrite(..., stderr)`. POSIX guarantees that a single write of at most PIPE_BUF bytes (at least 512; 4096 on Linux) to a pipe or FIFO is atomic with respect to concurrent writers, and writes to a file opened in append mode are each placed at end-of-file without another writer slipping in between the seek and the write. Convert every multi-shift stderr line in this file to use it.

Test (`tests/smoke/test_socket_autospawn_logs.py`): launch 10 daemons against the same socket path with stderr aimed at a single log file. Exactly one wins the bind race; the rest exit with a diagnostic. Verify every non-empty line in the log starts with `ldbd: `, i.e. no diagnostic got torn across a write boundary.

N3: `g_shutdown_pipe[1]` is read in the signal handler. While the non-atomic int read is harmless on aarch64 in practice, strict conformance requires an `std::atomic<int>` for the cross-thread publish/load. Introduce `g_shutdown_pipe_write` atomic, published under release-store AFTER FD_CLOEXEC + O_NONBLOCK are set, cleared to -1 BEFORE the close in teardown. A late signal arriving during shutdown now observes the sentinel and skips the write; pre-fix it could (rarely) write to a closed fd or, worse, a recycled fd of an unrelated open.

N4: workers list grows for daemon lifetime. Reviewer flagged this as legitimately phase-3-deferable; add an explicit `TODO(phase 3 / N4)` comment next to the list declaration so a future maintainer doesn't rediscover it cold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
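A sketch of the single-write idea, assuming a hypothetical `format_err_line` helper and an extra `out` parameter that the commit's `log_err_line(std::string)` does not have (it is added here only so the function can be pointed at another stream). The line is assembled in memory first and then handed to the C library in one `std::fwrite` call; because `stderr` is unbuffered by default, that normally reaches the kernel as one write.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build the complete diagnostic, newline included, before any I/O happens.
// The wording is illustrative, not the daemon's actual message.
std::string format_err_line(const std::string& what, long pid) {
    return "ldbd: " + what + " (pid " + std::to_string(pid) + ")\n";
}

// Emit the whole line with a single fwrite. Returns true if every byte was
// accepted by the stream.
bool log_err_line(const std::string& line, std::FILE* out = stderr) {
    assert(out != nullptr);
    return std::fwrite(line.data(), 1, line.size(), out) == line.size();
}
```

Usage would be `log_err_line(format_err_line("another daemon is already listening", pid));` in place of a chain of `<<` insertions.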

The phase-3/4 ADRP-pair resolver maintained a WHITELIST of mnemonics in the post-emit state-mutation block: ADD/SUB/ADDS/SUBS clobbered the dst, MOV variants ran apply_mov_state, calls clobbered the AAPCS64 caller-saved set, returns cleared the map. Every OTHER register-writing instruction silently left dst tracking intact.

That whitelist was the wrong invariant. CSEL / CSET / CSINC / CSINV / CSNEG, LDP / LDPSW / LDXP / LDAR / LDAXR, MADD / MSUB, EXTR / BFI / BFM / UBFX / SBFX / UBFM / SBFM, ORR / AND / EOR / EON with shifted-reg, FMOV (to GPR), SDIV / UDIV, REV / CLZ, ASR / LSL / LSR / ROR shifts: every one of them writes a destination register but none of them appeared in the whitelist. After any of them ran, the destination register kept whatever ADRP page it previously held and the next LDR or ADD through that register produced a silent false positive.

C3 (CSEL): the "pick between two strings" compiler idiom emits

    adrp x8, _str_a@PAGE
    adrp x9, _str_b@PAGE
    ...
    csel x8, x9, x8, gt
    ldr x0, [x8, #0x10]

xref.addr(_str_a + 0x10) falsely matched the LDR. This is the most common false-positive vector in real iOS / macOS binaries.

C4 (LDP): a function entry's `ldp x8, x9, [sp]` (callee-saved reload) rewrites x8 from memory; any prior ADRP into x8 is gone. The phase-3 resolver didn't model paired loads at all, so the post-LDP ADD false-matched.

Architectural shift: clobber-by-default. Introduce a new helper parse_destination_registers(mnemonic, operands) in xref_arm64_parsers that returns the canonical x-register names an instruction writes. The post-emit pass runs explicit propagation paths first (ADRP records, MOV propagates, calls clobber caller-saved, returns/B clears all), then the new pass erases every destination register that wasn't already handled by an explicit arm. dst_already_handled gates the second pass so legitimate ADRP/MOV tracking isn't undone.

The helper handles 14 mnemonic categories:

- Stores (STR/STP/STUR/STRH/STRB/STLR/STNP/...): no dst.
- Compares (CMP/CMN/TST/CCMP/CCMN/FCMP/...): no dst.
- Branches & returns (B/BL/BR/BLR/CBZ/TBZ/B.cond/...): no dst.
- System (NOP/YIELD/WFE/DMB/DSB/ISB/MSR/...): no dst.
- Paired loads (LDP/LDPSW/LDXP/LDAXP/LDNP): two dsts.
- Default: first operand register is the destination.

The default catches CSEL/CSET/CSINC/CSINV/CSNEG/MADD/MSUB/ORR/AND/EOR/EXTR/BFI/UBFX/etc. without enumeration.

clobber_arith_destination is removed. ADD/SUB/ADDS/SUBS now fall through to the generic pass, which produces identical behaviour.

TDD evidence: two new fixtures + smokes (xref_csel.s, xref_ldp_clobber.s) failed RED against ced9f17 with the diagnostic "the LDR/ADD through stale x8 matched against …". Post-fix both pass; 16 new unit test cases pin parse_destination_registers behaviour across CSEL, LDP, LDPSW, LDXP/LDAXP, LDR family, ADD/SUB family, STR/STP family, CMP/TST family, branches, MADD/MSUB family, ORR/AND/EOR shifted-reg, EXTR/BFI/UBFX bitfield family, NOP/YIELD/barrier, w→x canonicalisation, unrecognised-mnemonic default.

ctest 89/89 (87 + 2 new smokes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
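To make the clobber-by-default rule concrete, here is a rough sketch of a destination-register parser. It is not the project's `parse_destination_registers`: the function name `parse_destination_registers_sketch`, the helper names, and the exact mnemonic sets are assumptions, and only the categories listed above are modelled. Anything not known to be destination-free is treated as writing its first operand, and `w` registers are canonicalised to their `x` names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// "w8" / " X8 " -> "x8"; anything that is not a numbered GPR -> "".
std::string canonical_xreg(const std::string& raw) {
    std::string tok;
    for (char c : raw) {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isspace(uc)) tok += static_cast<char>(std::tolower(uc));
    }
    if (tok.size() < 2 || (tok[0] != 'x' && tok[0] != 'w')) return "";
    for (std::size_t i = 1; i < tok.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(tok[i]))) return "";
    }
    tok[0] = 'x';
    return tok;
}

std::vector<std::string> parse_destination_registers_sketch(
    const std::string& mnemonic, const std::string& operands) {
    assert(!mnemonic.empty());
    // Illustrative, incomplete sets: stores, compares, branches, system.
    static const std::set<std::string> no_dst = {
        "str", "stp", "stur", "strh", "strb", "stlr", "stnp",
        "cmp", "cmn", "tst", "ccmp", "ccmn", "fcmp",
        "b", "bl", "br", "blr", "ret", "cbz", "cbnz", "tbz", "tbnz",
        "nop", "yield", "wfe", "dmb", "dsb", "isb", "msr"};
    static const std::set<std::string> paired_loads = {
        "ldp", "ldpsw", "ldxp", "ldaxp", "ldnp"};

    std::string m;
    for (char c : mnemonic) {
        m += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (no_dst.count(m) != 0 || m.rfind("b.", 0) == 0) return {};

    // Default: the first operand is the destination (two for paired loads).
    const std::size_t want = paired_loads.count(m) != 0 ? 2 : 1;
    std::vector<std::string> dsts;
    std::stringstream ss(operands);
    std::string piece;
    while (dsts.size() < want && std::getline(ss, piece, ',')) {
        const std::string reg = canonical_xreg(piece);
        if (reg.empty()) break;  // sp, xzr, immediates: nothing to clobber
        dsts.push_back(reg);
    }
    return dsts;
}
```

A caller would erase each returned register from its ADRP-tracking map unless an explicit propagation arm (ADRP, MOV) already handled that instruction.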

…hained fixups (phase 4 C5) The phase-4 FAT-aware slice picker had a silent-wrong-result bug: when the caller-supplied triple matched a slice in the FAT, the picker returned that slice's parse only if `resolved` was non-empty. If the matched slice was a classic LC_DYLD_INFO_ONLY binary with no chained fixups (resolved.empty()), control fell through to the phase-3 preference order — which could land on a DIFFERENT slice (e.g. arm64e) with a totally different image_base. The caller's xref scan then resolved every ADRP page through the wrong slice's image_base and silently produced garbage. LLDB's choice of slice is the source of truth. If the triple matched ANY slice in the FAT, honour it — including the empty-chained-fixup case. The caller gets an empty ChainedFixupMap (no chained-fixup xref resolution) and the literal-operand / ADRP-pair scan runs against the CORRECT image_base. Only fall through to preference when NO slice in the FAT matches the triple at all (the legitimate "triple says x86_64 but FAT is arm64-only" path). The pre-existing unit test "triple-matching slice missing falls back to preference order" is correct under both pre- and post-fix behaviour because it exercises the legitimate "no triple match" fallback path. Its comments are updated to clarify the distinction. TDD evidence: new unit test "triple-matched slice WITHOUT chained fixups wins (C5 silent-wrong-result fix)" constructs a FAT with an arm64 slice (no LC_DYLD_CHAINED_FIXUPS, image_base 0x100000000) and an arm64e slice (with chained fixups, image_base 0x200000000). With triple=arm64 the test asserts: - resolved.empty() (arm64 has no fixups) - image_base != 0x200000000 (must NOT fall through to arm64e) Against pre-fix code both assertions FAIL (resolved size 2 from arm64e fall-through, image_base=0x200000000). Against post-fix code both PASS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
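The corrected selection rule can be sketched as below. `Slice`, `pick_slice`, and the `preference` parameter are hypothetical (the real picker's types and its phase-3 preference order are not shown in this PR); the point is that a triple match wins unconditionally, even when that slice has no resolved chained fixups, and the preference order is consulted only when nothing matches.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct Slice {  // hypothetical, simplified view of one FAT slice
    std::string arch;
    std::uint64_t image_base;
    std::vector<std::uint64_t> resolved;  // empty when there are no chained fixups
};

// Returns the index of the slice to scan, or nullopt if none is usable.
std::optional<std::size_t> pick_slice(const std::vector<Slice>& fat,
                                      const std::string& triple_arch,
                                      const std::vector<std::string>& preference) {
    // 1. The debugger's choice is the source of truth: honour a triple match
    //    even if `resolved` is empty.
    for (std::size_t i = 0; i < fat.size(); ++i) {
        if (fat[i].arch == triple_arch) return i;
    }
    // 2. Only when no slice matches the triple, fall back to preference order.
    for (const std::string& arch : preference) {
        for (std::size_t i = 0; i < fat.size(); ++i) {
            if (fat[i].arch == arch) return i;
        }
    }
    return std::nullopt;
}
```

With an arm64 slice that has no fixups and an arm64e slice that does, a caller asking for arm64 gets the arm64 slice and its image_base, never the arm64e one.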

…+N5) Pre-fix `_resolve_autospawn_ldbd()` accepted any X_OK path in `$LDB_LDBD_SPAWN`. A mistyped env var landing on a real but unrelated executable (e.g. `/usr/bin/yes`, `/bin/echo`) would spawn that binary; the spawned child never bound the socket; the client burned ~3s of connect retries before surfacing "auto-spawned ldbd never began accepting" with zero hint that the env var was the problem.

I5: `_looks_like_ldbd()` runs `<path> --version` with a 2s timeout and checks the output contains the literal "ldbd". Rejected paths get a clear "LDB_LDBD_SPAWN=... does not look like ldbd" line on stderr at resolve-time; resolution then falls through to `shutil.which("ldbd")` and the sibling-of-ldb heuristic. The operator sees the actual failure mode 100ms in, not 3s in.

Coupled daemon change: `ldbd --version` now prints "ldbd <version>" instead of just "<version>". The I5 probe greps for "ldbd" in the output; without this the probe rejects the real daemon. Matches `ldb-dap --version`'s convention and the `ldbd --help` first-line format. No tests pinned the old bare-semver output.

Bundled cleanups:

- N1: `_autospawn_daemon`'s docstring claimed stderr was inherited from the parent process. Wrong since phase-2; the daemon's stderr goes to /dev/null by default and to `$LDB_LDBD_LOG_FILE` when set. Doc text now matches the code.
- N2: retry-loop comment said "200ms * 10 retries (~2s)" but the loop was `range(15)`. One-line factual fix to "200ms * 15 retries (~3s)."
- N5: socket re-created inside the retry loop on each iteration. POSIX leaves a socket whose `connect()` failed in an unspecified state for further `connect()` calls; reusing it works on Linux and macOS today but is pedantically undefined. Fresh socket per iteration is one extra syscall per retry and removes the corner case.

Test: `tests/smoke/test_socket_autospawn_validates_binary.py` pins `$LDB_LDBD_SPAWN=/bin/echo`, strips $PATH down to python+coreutils (no ldbd discoverable that way), runs `ldb --socket ... target.open` from a temp CWD outside the repo. Asserts the CLI succeeds via sibling fallback under 2.5s with the expected stderr diagnostic mentioning `LDB_LDBD_SPAWN` and "does not look like ldbd." TDD-verified red: pre-fix the test fails at 3.08s with the "never began accepting" message, which confirms it pins the regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The test docstring claimed it validates "concurrent dispatch" — the review correctly flagged this as overstated. `dispatch_mu_` serialises overlapping RPCs in phase-2, so two clients hitting the daemon at the same time queue at the dispatcher. What the test actually pins: - Accept-level concurrency. Two unix-socket connections held open simultaneously. The pre-phase-2 single-client accept loop would block worker B's connect() until worker A disconnected; the barrier between target.open and module.list would deadlock. - Per-connection target_id state persistence. Each worker opens its own target, both succeed, both find their target_id still alive on the second RPC. Docstring, in-test comment on the barrier, and success message all rewritten to match. CMake test name kept as `smoke_socket_multiclient` — accurate at the file level, churning history for naming-only churn isn't worth it. True per-connection dispatch parallelism is a phase-3 item (per-target dispatcher sharding); listed in `docs/35-field-report-followups.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`docs/35-field-report-followups.md`: - §2 phase-2 item 1 (Multi-subscriber notification sinks) rewritten to say "broadcast-to-all; per-target filtering happens at the client." Pre-fix the doc claimed "without cross-talk," which implied server-side target_id routing that doesn't exist in phase-2. The post-review C1 shared_ptr migration also recorded here so anyone reading the design doc sees the UAF fix in context. - "Phase 3 — carried forward" gains a new bullet for target_id- aware notification routing (the server-side filtering that phase-2 ducked). The existing "per-target dispatcher sharding" bullet reworded to call out the dispatch-parallelism dimension specifically: today two clients on independent target_ids still queue at `dispatch_mu_`. SBAPI cancellation and worker-list reaping items were already in the list and unchanged. `docs/WORKLOG.md`: new top entry summarising the phase-2 cleanup — the four pre-existing commits (`2e6f4ed` C1, `bad8f90` I2, `2978590` I3, `8c03765` I4+N3+N4) plus the new ones (`9397c03` I5+N1+N2+N5 with `ldbd --version` companion change and the new TDD-verified smoke test, `716689b` N6 test naming honesty, this commit). Decisions, surprises, and the verification stanza record the rationale for future-me / future agents. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

I1: find_string_xrefs's prior signature took no provenance — every ADRP-pair resolver diagnostic produced by the underlying xref_address scans (adrp_pair_skipped, adrp_pair_writeback_cleared, adrp_pair_cond_branch_recorded, adrp_pair_function_start_reset, adrp_pair_unresolvable_load, warnings) was silently dropped when an agent reached the resolver via string.xref instead of xref.addr. The agent then couldn't see "the heuristic skipped N loads on this binary" and had no signal to fall back to symbol-index correlate. Thread an optional XrefProvenance* through find_string_xrefs. Counters and warnings accumulate across every per-StringMatch xref_address invocation; the dispatcher attaches the aggregate to the string.xref response on the same emission policy as xref.addr (only when something fired). Phase-3 gate-7 warning emission moved to a baseline-delta scheme so sharing one provenance across N xref_address calls doesn't produce "skipped 0" duplicates — only the actual increment from each call generates a warning string. I2 (string.xref half): the dispatcher schema for string.xref now documents the same five counters + warnings array as xref.addr, each described as "aggregate across every underlying xref scan." xref.addr's schema was updated in commit ced9f17 (C1+C2) with the renamed adrp_pair_cond_branch_recorded counter and the three phase-4-added counters (cond_branch_recorded, function_start_reset, unresolvable_load). Backend interface: virtual signature change ripples through the GDB/MI stub (returns empty, no behaviour change) and every test mock backend's override (8 test files updated). New unit test pins the threaded signature works against the real fixture binary; identical-result invariant holds whether provenance is nullptr or supplied. ctest 89/89. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
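The baseline-delta scheme for warnings can be illustrated with a small sketch. The struct below is a hypothetical cut-down mirror of `XrefProvenance` (one counter only), `scan_with_provenance` stands in for one `xref_address` call, and the warning wording is invented for the example. Each call records the counter value on entry and emits a warning only for its own increment, so sharing one provenance object across many scans neither repeats totals nor produces "skipped 0" lines.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct XrefProvenanceSketch {  // hypothetical, one counter only
    int adrp_pair_skipped = 0;
    std::vector<std::string> warnings;
};

// One underlying scan. `skipped_here` stands in for whatever the scan found;
// `prov` is optional, matching the nullptr-or-supplied contract.
void scan_with_provenance(int skipped_here, XrefProvenanceSketch* prov) {
    assert(skipped_here >= 0);
    if (prov == nullptr) return;  // caller did not ask for diagnostics

    const int baseline = prov->adrp_pair_skipped;  // value before this scan
    prov->adrp_pair_skipped += skipped_here;       // counters accumulate
    const int delta = prov->adrp_pair_skipped - baseline;
    if (delta > 0) {
        // Illustrative wording; warn about this call's increment only.
        prov->warnings.push_back("adrp-pair heuristic skipped " +
                                 std::to_string(delta) + " load(s)");
    }
}
```

Three scans that skip 2, 0, and 3 loads leave the counter at 5 with two warnings, one per scan that actually skipped something.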

…anup tail) Bundle of the remaining items from the opus phase-4 review that the two cleanup agents got partway through before hitting rate limits: - I3: parse_last_hex_in_operands → lifted to xref_arm64_parsers as parse_branch_target. Picks the last comma-separated operand and parses hex from there, instead of "rightmost hex token in the whole operand string." Closes the tbz w0,#0x10,_far_label case where 0x10 (bit position) was being picked as a branch target. - I4: function_starts insert lifted above the !adrp_regs.empty() guard so the hint is recorded even when no ADRP is currently tracked. - I5: tests/smoke/test_xref_pcrel_literal.py comment now matches the fixture's actual assembly (a magic .quad rather than a pcrel_data reference); the test continues to validate the provenance counter bump. - N1: xref_condbranch.s rewritten to actually reproduce the cross-function-cbz + fall-through-ADRP-ADD pattern that the ced9f17 fix closes. The new fixture FAILS against pre-cleanup master and passes here. - N2: xref_stripped_fnleak.s comments updated to acknowledge that it exercises gate 1 (function_name_at) rather than gate 3 (function_starts) on Apple silicon, where LLDB synthesises ___lldb_unnamed_symbol_<addr>. Phase-5 follow-up captured. All 18 xref + chained-fixup tests pass. Build warning-clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
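The I3 change, restricting the hex search to the last comma-separated operand, can be sketched like this. `parse_branch_target_sketch` is a hypothetical name and the operand strings are assumed to look like disassembler text (for example `w0, #0x10, 0x100004f20`); the project's `parse_branch_target` may differ in detail.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Parse the branch target from the LAST comma-separated operand only, so an
// earlier immediate such as the tbz bit position is never mistaken for it.
std::optional<std::uint64_t> parse_branch_target_sketch(const std::string& operands) {
    const std::size_t comma = operands.rfind(',');
    const std::string last =
        comma == std::string::npos ? operands : operands.substr(comma + 1);

    const std::size_t pos = last.find("0x");
    if (pos == std::string::npos) return std::nullopt;  // symbolic label, no hex
    assert(pos + 2 <= last.size());

    std::uint64_t value = 0;
    bool any_digit = false;
    for (std::size_t i = pos + 2; i < last.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(last[i]);
        if (!std::isxdigit(c)) break;
        const int digit = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        value = value * 16 + static_cast<std::uint64_t>(digit);
        any_digit = true;
    }
    if (!any_digit) return std::nullopt;
    return value;
}
```

For `w0, #0x10, _far_label` the sketch returns no target at all, where a "rightmost hex token in the whole string" rule would have returned 0x10.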

§2 phase 2: multi-client socket + per-connection notification routing, auto-spawn, in-flight RPC interruption via self-pipe, idle timeout, daemon.shutdown RPC, recursive_mutex dispatch serialisation. Plus post-review cleanup: shared_ptr-owned NotificationSinks (UAF fix), worker shutdown gate, SO_SNDTIMEO, atomic stderr lines, LDB_LDBD_SPAWN binary validation.

§6 phase 4: chained-fixup + ADRP coverage extension — conditional-branch boundary, fat_arch_64 triple-aware slice selection, stripped-binary function_starts backstop, PC-relative literal-load provenance, MOV from XZR/WZR explicit, BindInfo schema (deferred imports walk), real ARM64 C fixture. Plus post-review cleanup: cond-branch fall-through correctness, same-fn cbz no-poison, clobber-by-default destination register tracking (closes CSEL/LDP false-positive class), FAT picker triple match honored, string.xref provenance plumbing, parser hardening + adversarial fixture rewrites. # Conflicts: # docs/WORKLOG.md

CI on Ubuntu / Linux x86-64 + Linux arm64 had been failing since PR #20 merged. Two issues:

1. getpeereid() is BSD-only (also on macOS). glibc and musl don't ship it. Wrap the peer-cred retrieval in a #if __linux__ / else branch: on Linux, getsockopt(SO_PEERCRED) returns a struct ucred; on the BSDs, keep the existing getpeereid call. peer_gid is preserved on both branches for API parity with a single (void) cast to silence -Wunused-variable.
2. The two ::ftruncate(fd, 0) and ::pwrite(...) calls in acquire_lock are documented as best-effort (a failed pid stamp degrades the collision diagnostic but doesn't break exclusion). gcc's -Wunused-result, treated as an error in the warning-clean build, isn't silenced by a plain (void) cast; the standard workaround is `if (call() != 0) {}`. Use that.

98/98 ctest green on Darwin-arm64 post-fix; Linux build path now compiles cleanly via the new ifdef branch (verified by tracing through the SO_PEERCRED path, which is standard on every Linux since 2.6.17). Linux CI on merge will confirm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
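Both portability fixes can be sketched in a few lines. The function names `peer_ids` and `stamp_pid_best_effort` are hypothetical; `SO_PEERCRED`, `struct ucred`, `getpeereid`, `ftruncate`, and `pwrite` are the real system interfaces the commit mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Fetch the peer's uid/gid for a connected AF_UNIX socket.
bool peer_ids(int fd, uid_t* uid, gid_t* gid) {
    assert(uid != nullptr && gid != nullptr);
#if defined(__linux__)
    // Linux: no getpeereid(); ask the kernel for the peer credentials.
    struct ucred cred {};
    socklen_t len = sizeof(cred);
    if (::getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0) return false;
    *uid = cred.uid;
    *gid = cred.gid;
    return true;
#else
    // BSDs and macOS: getpeereid() is available directly.
    return ::getpeereid(fd, uid, gid) == 0;
#endif
}

// Best-effort pid stamp. The empty if-bodies consume the return values, which
// satisfies gcc's -Wunused-result where a plain (void) cast does not.
void stamp_pid_best_effort(int fd, const char* text, std::size_t n) {
    if (::ftruncate(fd, 0) != 0) {}
    if (::pwrite(fd, text, n, 0) < 0) {}
}
```

On a `socketpair` created by the calling process, `peer_ids` reports the process's own effective uid, which is a convenient sanity check on either platform.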

Bundles PRs #20 + #21 — the full RE-engineer field report and its phase-3/phase-4 hardening cycle. Original 6-item report is closed; phase-5 work is enhancement scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

zachgenius and others added 29 commits May 16, 2026 18:56

zachgenius merged commit 285c77d into master May 16, 2026
0 of 4 checks passed

zachgenius deleted the release/phase-4 branch May 16, 2026 11:57

zachgenius merged 29 commits into master from release/phase-4


zachgenius commented May 16, 2026

What landed

§2 phase 2 — socket daemon multi-client + supporting work

§6 phase 4 — ARM64e xref coverage extension

CI portability fix (final commit)

Constituent commits

Test plan

Deferred to phase 5 (documented in docs/35-field-report-followups.md)
