FIX 179: Static binaries can't exit cleanly #207
Open
megastallman wants to merge 44 commits into
Co-authored-by: Antigravity/Gemini3pro
- FreeBSD signal frame: build proper FreeBSD-shaped ucontext_t/mcontext_t/siginfo_t in DeliverSignal/SigRestore instead of Linux layouts, so libthr doesn't clobber the saved RIP on handler exit.
- SignalActor: add the sigtramp intercept that ExecuteInstruction already has. Recursive signal delivery (CheckInterrupt on EINTR) was bypassing it.
- Auto-generate /var/run/ld-elf.so.hints when missing, so pkg-installed binaries find /usr/local/lib without a manual ldconfig.
- /dev/null & co: always satisfy standard char devices from the host. The ENOENT fallback didn't catch EACCES from O_CREAT shell redirects against a read-only chroot /dev, which broke pkg PRE-INSTALL scripts (nginx).
- FixupShellEnv: rewrite SHELL=/bin/bash to a chroot-resident shell at load time, so mc's subshell doesn't fall through to host bash.
- Syscall mappings: pathconf (191), fpathconf (192), getsid (310), setresuid (311), setresgid (312), sigsuspend (341).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
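Most of these mappings are plain number-to-number translations. The sketch below is a minimal table of that kind. The struct, the table and `XlatFreeBSDSyscall()` are hypothetical, and blink's real dispatch is organized differently. The FreeBSD/amd64 and Linux x86-64 numbers are the public ABI values. The table also carries a few other mappings from this PR (exit, kill, sched_yield).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: FreeBSD/amd64 syscall number -> Linux x86-64 number. */
struct SyscallXlat {
  int freebsd;
  int linux_nr;
};

static const struct SyscallXlat kXlat[] = {
    {1, 231},   /* exit -> exit_group: FreeBSD exit(2) ends the whole process */
    {37, 62},   /* kill (the signal argument still needs translating) */
    {310, 124}, /* getsid */
    {311, 117}, /* setresuid */
    {312, 119}, /* setresgid */
    {331, 24},  /* sched_yield */
    {341, 130}, /* sigsuspend -> rt_sigsuspend (mask needs converting) */
};

/* Returns the Linux number, or -1 when the call needs a dedicated handler
 * (e.g. pathconf, which has no Linux syscall equivalent). */
static int XlatFreeBSDSyscall(int freebsd) {
  for (size_t i = 0; i < sizeof(kXlat) / sizeof(kXlat[0]); ++i)
    if (kXlat[i].freebsd == freebsd) return kXlat[i].linux_nr;
  return -1;
}
```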
… audit) + sysctls
su now works. Adds:
- getdtablesize (89), setpriority (96), getpriority (100): handlers in blink that return sensible defaults rather than ENOSYS.
- setegid (182), seteuid (183): reshape rdi into the Linux setresuid/setresgid argument layout (rdi=-1, rsi=euid/egid, rdx=-1).
- audit family (445-453): a shared stub returning 0.
- sysctlbyname: kern.securelevel → -1 (permissive), kern.console → empty.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
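The seteuid/setegid reshaping can be sketched as a register rewrite. `struct SysArgs` and `ReshapeSetEffectiveId()` are hypothetical names used only for illustration. Linux has no seteuid syscall; libc implements it as setresuid(-1, euid, -1), where -1 means "leave this id unchanged".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of guest registers that carry syscall arguments. */
struct SysArgs {
  uint64_t rdi, rsi, rdx;
};

/* FreeBSD seteuid(euid)/setegid(egid) pass one id in rdi; rewrite it into
 * setresuid/setresgid(-1, id, -1) form. */
static void ReshapeSetEffectiveId(struct SysArgs *a) {
  uint64_t id = a->rdi;
  a->rdi = (uint64_t)-1; /* real id: unchanged */
  a->rsi = id;           /* effective id: the requested value */
  a->rdx = (uint64_t)-1; /* saved id: unchanged */
}
```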
…_waiters==0
The previous impl returned 0 immediately when ucond->c_has_waiters was 0. The FreeBSD kernel ALWAYS sets c_has_waiters = 1 and then sleeps for CV_WAIT, regardless of the prior value; libthr counts on that to wake correctly.
Returning early looked like a spurious wake to libthr. The thread stayed on its userspace sleepq, the caller looped back into pthread_cond_wait, and the second sleepq_add panicked with "thread %p was already on queue" at thr_cond.c:285. LibreOffice's Qt thread pool reproducibly hit this.
Now mirror the kernel: atomically store 1 into c_has_waiters, then sleep on the futex with val=1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
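The kernel-mirroring wait can be sketched with a Linux futex. This is a minimal sketch: `CvWait()`, `CvSignal()` and the demo thread are hypothetical, and it omits the mutex that the real UMTX_OP_CV_WAIT releases. The key point is that the flag is always published as 1 before sleeping. A value mismatch (EAGAIN) counts as a normal return, consistent with the _umtx_op WAIT fix later in this PR.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static long Futex(uint32_t *addr, int op, uint32_t val) {
  return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Always publish c_has_waiters = 1, then sleep while it still reads 1.
 * Wakeup, value mismatch (EAGAIN) and EINTR all return 0, as FreeBSD does. */
static int CvWait(uint32_t *c_has_waiters) {
  __atomic_store_n(c_has_waiters, 1, __ATOMIC_SEQ_CST);
  if (Futex(c_has_waiters, FUTEX_WAIT_PRIVATE, 1) == -1 && errno != EAGAIN &&
      errno != EINTR)
    return errno;
  return 0;
}

/* Hypothetical signaller for the demo: clear the flag, then wake one. */
static void CvSignal(uint32_t *c_has_waiters) {
  __atomic_store_n(c_has_waiters, 0, __ATOMIC_SEQ_CST);
  Futex(c_has_waiters, FUTEX_WAKE_PRIVATE, 1);
}

static void *WaiterThread(void *arg) {
  return (void *)(intptr_t)CvWait(arg);
}
```

Clearing the flag before waking makes the demo race-free. A waiter that has not yet reached FUTEX_WAIT sees 0 instead of 1 and returns immediately rather than sleeping forever.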
… accept4)
FreeBSD/amd64 Go binaries crashed during startup/runtime through five
sequential bugs; fix each so cpuburner and an HTTP server/client over the
kqueue netpoller run to completion and exit cleanly.
- sigaltstack: translate FreeBSD flag *values* (SS_DISABLE=0x4) to Linux
(0x2), not just the struct field order. Go disabling its altstack passed
flags=0x4 which hit the unsupported-flags EINVAL path, so Go's
runtime.sigaltstack crash-on-failure stub SIGSEGV'd.
- kqueue: the kqueue->epoll layer is gated on HAVE_EPOLL_PWAIT1, but the
configure probe passed NULL events and tripped -Werror=nonnull on modern
glibc, leaving the define off -> Go netpoll did kqueue()->ENOSYS->fatal.
Fix tool/config/epoll_pwait{1,2}.c to pass a real events pointer.
- sched_yield: map FreeBSD 331 -> Linux 0x18 (was unmapped -> busy spin).
- _umtx_op WAIT (2/11/15): return 0 on value mismatch like FreeBSD (blink
returned EAGAIN, Linux futex semantics), which crashed Go's futexsleep1.
- exit: FreeBSD exit(2) terminates the whole process (thr_exit is per-thread);
map syscall 1 -> Linux exit_group (0xe7) not exit (0x3c), else Go hung
forever after main() returned because only the calling thread exited.
- accept4: map FreeBSD 541 with SOCK_CLOEXEC/NONBLOCK flag translation; the
netpoll TCP server otherwise couldn't accept (ECONNREFUSED).
- sysctls: add kern.smp.maxcpus, kern.conftxt, kern.ipc.soacceptqueue.
blink's multi-threaded SMP emulation intermittently corrupts the Go runtime
once it schedules across >1 P (fatal error: schedule: holding locks), so pin
guests to a single CPU: kern.smp.maxcpus=1, hw.ncpu=1, and cpuset_getaffinity
reports only CPU0.
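The sigaltstack and accept4 fixes both translate flag *values* bit by bit. Copying the raw integer through is not enough, because the two ABIs assign different bits. Below is a minimal sketch; the macro and function names are hypothetical. The FreeBSD values come from sys/signal.h and sys/socket.h. The Linux values are the x86-64 ones, and some other Linux architectures use different SOCK_* bits.

```c
#include <assert.h>
#include <stdint.h>

/* FreeBSD/amd64 values. */
#define FBSD_SS_ONSTACK 0x0001
#define FBSD_SS_DISABLE 0x0004
#define FBSD_SOCK_CLOEXEC 0x10000000
#define FBSD_SOCK_NONBLOCK 0x20000000

/* Linux x86-64 values. */
#define LNX_SS_ONSTACK 1
#define LNX_SS_DISABLE 2
#define LNX_SOCK_NONBLOCK 00004000
#define LNX_SOCK_CLOEXEC 02000000

/* Unknown bits make the caller return EINVAL instead of leaking through. */
static int XlatSigaltstackFlags(uint32_t f, uint32_t *out) {
  if (f & ~(uint32_t)(FBSD_SS_ONSTACK | FBSD_SS_DISABLE)) return -1;
  *out = ((f & FBSD_SS_ONSTACK) ? LNX_SS_ONSTACK : 0) |
         ((f & FBSD_SS_DISABLE) ? LNX_SS_DISABLE : 0);
  return 0;
}

static int XlatAccept4Flags(uint32_t f, uint32_t *out) {
  if (f & ~(uint32_t)(FBSD_SOCK_CLOEXEC | FBSD_SOCK_NONBLOCK)) return -1;
  *out = ((f & FBSD_SOCK_CLOEXEC) ? LNX_SOCK_CLOEXEC : 0) |
         ((f & FBSD_SOCK_NONBLOCK) ? LNX_SOCK_NONBLOCK : 0);
  return 0;
}
```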
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… data, FIOASYNC, sendfile
nginx accepted connections but never responded; curl hung until blink was killed. Four bugs in the FreeBSD network path:
- kevent EV_CLEAR was ignored. nginx registers connection sockets edge-triggered (EV_CLEAR), but the epoll registration stayed level-triggered, so epoll_wait re-reported the same fd readable forever (~500k spins) and nginx made no progress. Map EV_CLEAR -> EPOLLET (track it per watch and OR it into the combined epoll events).
- kevent never filled the data field. Under kqueue, nginx uses kev.data (bytes available) to size reads and track ready state, so it stopped after the first read. Populate it: FIONREAD for EVFILT_READ, SO_SNDBUF for EVFILT_WRITE.
- ioctl(FIOASYNC) (FreeBSD 0x8004667d) returned EINVAL. nginx sets it on the master<->worker channel socket while spawning workers and treats failure as fatal, so no worker ever started. Translate it to O_ASYNC via F_SETFL.
- sendfile (FreeBSD syscall 393) was unmapped, so static files never transferred. FreeBSD's sendfile differs from Linux's (file/socket args swapped, sf_hdtr header/trailer iovecs, byte count via *sbytes, returns 0/-1). Add SysFreeBSDSendfile with proper EAGAIN/partial-write semantics, so nginx resumes via EVFILT_WRITE on a non-blocking socket.
Verified: nginx (default master+worker fork mode) serves sequential, parallel, small, and 2 MB requests with byte-exact md5; the Go net test (kqueue + accept4) and cpuburner still pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
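The EV_CLEAR and kev.data fixes can be sketched as follows. This is a sketch under stated assumptions: `struct FdWatch`, `AddFilter()`, `CombinedEpollEvents()` and `ReadableBytes()` are hypothetical names, and only the FreeBSD constant values come from sys/event.h. kqueue keeps separate read and write filters per fd, but epoll has a single registration per fd. The per-filter state is therefore folded into one event mask, and EPOLLET is set if either filter asked for EV_CLEAR.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* FreeBSD kevent constants (sys/event.h). */
#define FBSD_EVFILT_READ (-1)
#define FBSD_EVFILT_WRITE (-2)
#define FBSD_EV_CLEAR 0x0020

/* Hypothetical per-fd state. */
struct FdWatch {
  int have_read, have_write;
  int read_clear, write_clear; /* EV_CLEAR requested on that filter */
};

static void AddFilter(struct FdWatch *w, int16_t filter, uint16_t flags) {
  int clear = (flags & FBSD_EV_CLEAR) != 0;
  if (filter == FBSD_EVFILT_READ) {
    w->have_read = 1;
    w->read_clear = clear;
  } else if (filter == FBSD_EVFILT_WRITE) {
    w->have_write = 1;
    w->write_clear = clear;
  }
}

/* EPOLLET applies to the whole registration, so any EV_CLEAR filter
 * makes the combined registration edge-triggered. */
static uint32_t CombinedEpollEvents(const struct FdWatch *w) {
  uint32_t ev = 0;
  if (w->have_read) ev |= EPOLLIN;
  if (w->have_write) ev |= EPOLLOUT;
  if ((w->have_read && w->read_clear) || (w->have_write && w->write_clear))
    ev |= EPOLLET;
  return ev;
}

/* kev.data for EVFILT_READ: bytes currently available, via FIONREAD. */
static int64_t ReadableBytes(int fd) {
  int n = 0;
  if (ioctl(fd, FIONREAD, &n) == -1) return -1;
  return n;
}
```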
…E_IN_ROOT)
blink confines guests with overlays (openat relative to the overlay dir) rather than a real chroot(), so the host kernel resolved an absolute symlink TARGET against the host root and escaped the overlay. A guest symlink like /usr/local/www/nginx -> /usr/local/www/nginx-dist therefore couldn't be followed (ENOENT); e.g. nginx's default doc root 404'd.
Use openat2(RESOLVE_IN_ROOT) as a fallback when the plain openat path fails with ENOENT/ENOTDIR. RESOLVE_IN_ROOT makes "/" and absolute symlinks resolve relative to the dirfd, like chroot. The common case keeps its existing behavior and its errno (the overlay search loop is unchanged). OverlaysOpen retries the open in-root. OverlaysGeneric (stat/access/chmod/chown/utime/unlink/readlink/...) resolves the path's parent in-root and retries on the resulting host path, preserving each op's follow/create semantics on the final component.
Guarded by __linux__ && SYS_openat2 with a cached-ENOSYS flag, so older kernels and non-Linux hosts simply fall through to the prior behavior.
Verified: nginx serves its default page through the symlinked doc root (200); cat/ls/readlink through single and chained absolute symlinks work; lstat and readlink still operate on the link itself; nonexistent paths still ENOENT; Go programs, md5, and shell pipes unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
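The fallback pattern is sketched below, assuming Linux and the raw openat2 syscall. `OpenInRootFallback()` is a hypothetical name. `struct OpenHow` is declared locally with the same layout as `struct open_how` from linux/openat2.h, so the sketch builds against older headers. RESOLVE_IN_ROOT is 0x10. The plain openat runs first, and in-root resolution is only tried when that fails with ENOENT or ENOTDIR.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same layout as struct open_how (flags, mode, resolve). */
struct OpenHow {
  uint64_t flags, mode, resolve;
};
#define SKETCH_RESOLVE_IN_ROOT 0x10

#if defined(__linux__) && defined(SYS_openat2)
static int g_openat2_enosys; /* cached: the kernel lacks openat2 */
#endif

static int OpenInRootFallback(int dirfd, const char *path, int flags) {
  int fd = openat(dirfd, path, flags);
  if (fd != -1 || (errno != ENOENT && errno != ENOTDIR)) return fd;
#if defined(__linux__) && defined(SYS_openat2)
  if (!g_openat2_enosys) {
    int saved = errno;
    struct OpenHow how = {.flags = (uint64_t)flags,
                          .resolve = SKETCH_RESOLVE_IN_ROOT};
    fd = (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
    if (fd != -1) return fd;
    if (errno == ENOSYS) {
      g_openat2_enosys = 1; /* don't retry on this kernel */
      errno = saved;        /* report the original openat error */
    }
  }
#endif
  return -1;
}
```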
…ption)
SysFreeBSDThrNew wrote the 11-byte thr_exit thunk at thunk_addr = sp + 8,
where sp = ((stack_base + stack_size) & ~15) - 8 — i.e. at the 16-aligned
stack top (stack.hi). That overran the child g0 stack by up to ~11 bytes on
every thread creation, silently clobbering whatever guest memory happened to
be adjacent (often Go scheduler structures or a sudog).
This was the dominant cause of the long-standing FreeBSD-only SMP corruption
("schedule: holding locks", "sudog with non-nil next"). It is FreeBSD-specific
because the Linux clone() path writes no such thunk, which is exactly why
Linux Go binaries were immune — the key discriminator while tracking it down.
It reproduced even on a single physical core with async preemption disabled,
confirming a stray write rather than a coherence/atomics/ordering issue.
Place the thunk at top-16 and the return-address slot at top-24, both inside
[stack_base, stack_base+stack_size), preserving the RSP%16==8 entry ABI.
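The arithmetic of the fixed layout can be checked in a few lines. `LayoutThreadStack()` and `struct ThreadStackLayout` are hypothetical names; the 11-byte thunk size and the top-16/top-24 offsets come from the text above. Because `top` is 16-aligned, `top - 24` is 8 mod 16, which matches the state right after a `call`.

```c
#include <assert.h>
#include <stdint.h>

#define THUNK_SIZE 11 /* size of the thr_exit thunk */

struct ThreadStackLayout {
  uint64_t sp;         /* initial RSP for the new thread */
  uint64_t ret_slot;   /* 8-byte return-address slot (== sp) */
  uint64_t thunk_addr; /* where the thr_exit thunk is written */
};

static struct ThreadStackLayout LayoutThreadStack(uint64_t base,
                                                  uint64_t size) {
  uint64_t top = (base + size) & ~(uint64_t)15; /* 16-aligned stack end */
  struct ThreadStackLayout l;
  l.thunk_addr = top - 16; /* occupies [top-16, top-5): inside the stack */
  l.ret_slot = top - 24;   /* return address points at the thunk */
  l.sp = l.ret_slot;       /* RSP % 16 == 8 at entry */
  return l;
}
```

The old placement, by contrast, wrote the thunk at `sp + 8`, which equals `top`. All 11 bytes therefore landed past the end of the stack.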
After the fix, the GOMAXPROCS=4 atomic+mutex and channel/goroutine-churn
stress tests are 100% clean (previously ~30-67% crash) across interpreter,
JIT, linear and single-core configurations.
The single-CPU pin (kern.smp.maxcpus / hw.ncpu / cpuset_getaffinity = 1) is
kept for now: a separate residual corruption still appears under allocation/
GC-heavy multicore workloads (SIGSEGV in runtime heap-bitmap init), to be
fixed before multicore is enabled.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ifiers)
GCC 15 produced 387 warnings on a clean build; older GCC was silent.
- 370x -Wcast-align: blink reinterprets byte-addressed guest memory (u8 *) as wider atomic/vector types throughout the SSE and atomic-op fast paths (~230 sites). These are intentional and safe on the x86_64/aarch64 hosts we target. builtin.h already carried `#pragma GCC diagnostic error "-Wcast-align=strict"`, but older GCC silently ignored that pragma form (so the tree always built clean); GCC 15 honors it. Switch it to `ignored "-Wcast-align"` with a comment explaining why.
- 17x -Wdiscarded-qualifiers: all from passing string literals / const data to CopyToUser / CopyToUserWrite, whose `src` was non-const `void *` even though the to-user direction only reads it. Make `src` `const void *` (declarations in machine.h, definitions in memory.c).
A clean build now emits zero warnings; no behavior change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
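Both changes are sketched below. `CopyToUserSketch()` is a simplified, hypothetical stand-in, not blink's real CopyToUser signature, which takes a Machine and a guest address. Its `const void *src` lets callers pass string literals or const tables without a qualifier-discard warning.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Intentional: guest memory is byte-addressed and reinterpreted as wider
 * types, which is safe on the x86_64/aarch64 hosts targeted. */
#pragma GCC diagnostic ignored "-Wcast-align"

/* The to-user direction only reads src, so it can be const.
 * Returns -1 (think EFAULT) when [addr, addr+n) is out of bounds. */
static int CopyToUserSketch(uint8_t *guest, size_t guest_len, uint64_t addr,
                            const void *src, size_t n) {
  if (addr > guest_len || n > guest_len - addr) return -1;
  memcpy(guest + addr, src, n);
  return 0;
}
```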
…CPU pin)
Two -m-mode (software MMU) data races corrupted the guest under multiple threads: these were the residual SMP failures after the thr_new thunk fix. Both manifest only with >1 CPU, which is why the single-CPU pin masked them.
1. g_hostpages race. In -m mode TrackHostPage() records each committed host page in a global growable array and returns its index, which is stored in the PTE's PAGE_TA bits. FindHostPage() reads the array on every guest memory access to translate a PTE back to a host pointer. TrackHostPage() mutated the array (n++, realloc, p[entry]=ptr) with no lock, while page faults on other threads called it concurrently and FindHostPage() read it lock-free. Two faults racing n++ got the same index, aliasing two guest pages onto one host page, and a concurrent realloc moved the array out from under a reader. Guard writes with g_hostpages_lock, publish the (grown) array pointer with release ordering, and read it with an acquire load. Old arrays are not freed (a reader may hold one); growth is geometric, so this leak is negligible.
2. Page-fault CAS-loss double-free. When two threads fault the same anonymous page, the CAS loser freed (u8 *)(page & PAGE_TA). In -m mode those bits are the g_hostpages index, not the host pointer, so the allocator free list got a bogus entry and later handed out a broken page. Free FindHostPage(page) instead.
With these and the earlier thr_new thunk fix, multicore is correct, so drop the pin: cpuset_getaffinity / kern.smp.maxcpus / hw.ncpu report GetCpuCount() again (guest NumCPU = host count).
Verified under -m multicore, 20 runs each: GOMAXPROCS=4 atomic+mutex, channel, goroutine-churn, alloc/GC-heavy, and net/http stress all clean (gc and httptest were ~15-30% crash before); nginx serves; cpuburner saturates 4 CPUs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
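Below is a minimal sketch of the locked-writer, acquire-reader scheme in C11, assuming pthreads. `struct HostPages`, `g_hp` and the demo worker are hypothetical. Writers serialize on a mutex and publish a grown array with a release store. Readers take no lock and load the array pointer with acquire ordering. Old arrays are deliberately leaked because a reader may still hold one. The sketch assumes the PTE that carries an index is itself published with proper ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct HostPages {
  pthread_mutex_t lock;  /* serializes writers */
  _Atomic(uint8_t **) p; /* published array, read with acquire */
  size_t n, c;           /* used / capacity, guarded by lock */
};

static struct HostPages g_hp = {PTHREAD_MUTEX_INITIALIZER, NULL, 0, 0};

/* Returns the entry's index (the cookie stored in a PTE), or -1. */
static long TrackHostPage(uint8_t *page) {
  pthread_mutex_lock(&g_hp.lock);
  uint8_t **p = atomic_load_explicit(&g_hp.p, memory_order_relaxed);
  if (g_hp.n == g_hp.c) {
    size_t c2 = g_hp.c ? g_hp.c * 2 : 64; /* geometric growth */
    uint8_t **p2 = malloc(c2 * sizeof(*p2));
    if (!p2) {
      pthread_mutex_unlock(&g_hp.lock);
      return -1;
    }
    if (g_hp.n) memcpy(p2, p, g_hp.n * sizeof(*p2));
    p = p2; /* old array intentionally leaked: readers may hold it */
    g_hp.c = c2;
    atomic_store_explicit(&g_hp.p, p, memory_order_release);
  }
  long idx = (long)g_hp.n++; /* unique: n++ happens under the lock */
  p[idx] = page;
  pthread_mutex_unlock(&g_hp.lock);
  return idx;
}

static uint8_t *FindHostPage(long idx) {
  uint8_t **p = atomic_load_explicit(&g_hp.p, memory_order_acquire);
  return p[idx];
}

/* Demo worker: tracks 1000 fake page pointers and checks each lookup. */
static void *TrackMany(void *arg) {
  uintptr_t base = (uintptr_t)arg;
  for (uintptr_t i = 0; i < 1000; ++i) {
    uint8_t *fake = (uint8_t *)(base + i * 4096);
    long idx = TrackHostPage(fake);
    if (idx < 0 || FindHostPage(idx) != fake) return (void *)1;
  }
  return NULL;
}
```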
In software-MMU (-m) mode TrackHostPage() appends a host-page pointer to the global g_hostpages table and returns its index, stored in a PTE's PAGE_TA bits. It was called from AllocateAnonymousPage() for *every* returned page, including pages popped off the allocator free list, so a freed-then-reallocated page got a brand-new table entry each cycle. The table therefore grew ~8 bytes per page commit without bound: a slow leak in long-running -m guests.
Track each host page exactly once, when AllocateBig() first maps it, and cache its cookie (the PAGE_TA bits) on the free-list node, so reallocation reuses the existing slot instead of minting a new one. FreeAnonymousPage() and the internal FreePageTable() take the cookie (callers already hold the PTE, so they pass entry & PAGE_TA); FindHostPage() still yields the host pointer.
g_hostpages.n now tracks peak committed pages rather than cumulative allocations: the GC-churn stress test settles at ~2700 entries instead of growing past ~160000. Verified gc + atomic/mutex stress 12/12 clean in both -m and linear modes.
(The PAGE_MUG file-mapping path in ReserveVirtual still mints entries that are munmap'd rather than returned to the free list; that is bounded by mmap count, not per-fault, and is left as-is.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
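The cookie-caching idea is sketched below. `struct FreePage`, `AllocatePage()`, `FreeAnonPage()` and the counter standing in for `g_hostpages.n` are hypothetical, and the sketch is single-threaded. A freed page stores its own table index in the free-list node it becomes. Reallocating it therefore reuses that index, and only brand-new pages are tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Free-list node stored inside the freed page itself. */
struct FreePage {
  struct FreePage *next;
  uint64_t cookie; /* the page's existing table index */
};

static struct FreePage *g_free;
static uint64_t g_tracked; /* stands in for g_hostpages.n */

static uint64_t TrackOnce(void *page) {
  (void)page;
  return g_tracked++;
}

/* Reuse a freed page and its cookie; only new pages get tracked. */
static void *AllocatePage(uint64_t *cookie) {
  if (g_free) {
    struct FreePage *fp = g_free;
    g_free = fp->next;
    *cookie = fp->cookie;
    return fp;
  }
  void *page = aligned_alloc(4096, 4096);
  if (page) *cookie = TrackOnce(page);
  return page;
}

/* Callers pass the cookie they already hold (entry & PAGE_TA). */
static void FreeAnonPage(void *page, uint64_t cookie) {
  struct FreePage *fp = page;
  fp->cookie = cookie;
  fp->next = g_free;
  g_free = fp;
}
```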
The previous commit bounded g_hostpages for anonymous pages by caching each page's slot on the allocator free list. File/shared mmaps (PAGE_MUG) take a different free path: FreePage() munmap()s their backing memory rather than returning it to the allocator, so their g_hostpages slot was orphaned and the table still grew by one entry per munmap'd MUG page.
Add a free-index stack (freeidx/nfree/cfree, guarded by g_hostpages_lock). ReleaseHostPage() pushes a slot when its PAGE_MUG backing is unmapped, and TrackHostPage() pops a reusable slot before growing the table. This is safe: the freed page's PTE has been cleared and all TLBs are invalidated at the end of the enclosing FreeVirtual() before the slot can be handed out again, so no live PTE/TLB entry resolves to a reused slot (the same ordering the anonymous-page reuse already relies on).
Verified: 5000 shared-mmap/munmap cycles (~80k MUG page commits) hold g_hostpages.n flat at ~1290 instead of growing to ~80k; gc + atomic/mutex stress and nginx still clean under -m multicore.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
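A free-index stack beside a growable slot table can be sketched as below. `TrackSlot()`, `ReleaseSlot()` and the globals are hypothetical names for illustration. Both structures share one lock. This sketch has no lock-free readers, so a plain realloc of the slot table is acceptable here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static void **g_slots;              /* slot table */
static size_t g_n, g_c;             /* used / capacity */
static size_t *g_freeidx;           /* stack of released indices */
static size_t g_nfree, g_cfree;

/* Pops a released slot if any, else grows the table. Returns -1 on OOM. */
static long TrackSlot(void *page) {
  long idx = -1;
  pthread_mutex_lock(&g_lock);
  if (g_nfree) {
    idx = (long)g_freeidx[--g_nfree];
  } else {
    if (g_n == g_c) {
      size_t c2 = g_c ? g_c * 2 : 64;
      void **s2 = realloc(g_slots, c2 * sizeof(*s2));
      if (!s2) goto out;
      g_slots = s2;
      g_c = c2;
    }
    idx = (long)g_n++;
  }
  g_slots[idx] = page;
out:
  pthread_mutex_unlock(&g_lock);
  return idx;
}

/* Caller guarantees the PTE is cleared and TLBs invalidated, so nothing
 * still resolves to idx when it is handed out again. */
static int ReleaseSlot(long idx) {
  int rc = 0;
  pthread_mutex_lock(&g_lock);
  if (g_nfree == g_cfree) {
    size_t c2 = g_cfree ? g_cfree * 2 : 64;
    size_t *f2 = realloc(g_freeidx, c2 * sizeof(*f2));
    if (!f2) {
      rc = -1;
      goto out;
    }
    g_freeidx = f2;
    g_cfree = c2;
  }
  g_slots[idx] = NULL;
  g_freeidx[g_nfree++] = (size_t)idx;
out:
  pthread_mutex_unlock(&g_lock);
  return rc;
}
```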
Micro-op stitching leaves provably-redundant reg-reg moves in the final
instruction stream, especially res0<->arg0 round-trips that straddle the
Jitter-glue / inlined-micro-op boundary: the Jitter emits `mov %rdi,%rax` to
pass a result as the next op's argument, and the micro-op body immediately
begins `mov %rax,%rdi`. For a tight integer ALU loop the JIT output was ~9x
the size of the equivalent native code, and these cancelling moves sat on the
critical dependency chain.
Add a per-op peephole (PeepholeOp), run from AddPath_EndOp over the straight-
line host code emitted for a single guest op. It decodes that byte range with
blink's own decoder (for exact instruction boundaries) and removes:
- `mov A,A` no-op
- `mov A,B ; mov B,A`: the second is a no-op (after the first, A and B hold
  the same value, so the second changes nothing) -> drop the second
Both are provably semantics-preserving (no effect on flags or any register).
The pass bails out for the whole op if it emitted any control transfer, so
relative-branch displacements and recorded jump fixups are never perturbed;
the path's terminating jump is appended afterwards on post-peephole offsets.
Only adjacent moves are touched, and it runs before commit on the writable
staging buffer (gated by CanJitForImmediateEffect).
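The two rewrite rules and the bail-out are sketched below on an already-decoded instruction list. `struct Insn`, `enum Kind` and this `PeepholeOp()` are hypothetical simplifications; the real pass works on x86 bytes using blink's decoder. The sketch only considers 64-bit register moves. The reason is that a 32-bit `mov %eax,%eax` zero-extends the upper half of the register and is therefore not a no-op.

```c
#include <assert.h>
#include <stddef.h>

enum Kind { MOV64_RR, BRANCH, OTHER };
struct Insn {
  enum Kind kind;
  int src, dst; /* register numbers, meaningful for MOV64_RR */
};

/* Drops redundant adjacent moves in place; returns the new count. Leaves
 * the op untouched if it contains any control transfer. */
static size_t PeepholeOp(struct Insn *v, size_t n) {
  for (size_t i = 0; i < n; ++i)
    if (v[i].kind == BRANCH) return n;
  size_t out = 0;
  for (size_t i = 0; i < n; ++i) {
    struct Insn x = v[i];
    if (x.kind == MOV64_RR && x.src == x.dst) continue; /* mov A,A */
    if (out && x.kind == MOV64_RR && v[out - 1].kind == MOV64_RR &&
        v[out - 1].src == x.dst && v[out - 1].dst == x.src)
      continue; /* mov A,B ; mov B,A: the second changes nothing */
    v[out++] = x;
  }
  return out;
}
```

Comparing against the last *kept* instruction is safe because every dropped instruction is a no-op. The machine state just before `x` is therefore the same as in the original stream.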
Speedup: integer ALU benchmark 4.90s -> ~3.0s (~38% faster; native 0.64s).
Validated: JIT output byte-identical to the interpreter (-j) across integer,
floating-point, string and coreutils workloads; FreeBSD Go suite (atomic+mutex,
GC-churn) 15/15 each and a Linux Go binary that JITs all threads 20/20 clean
under -m multicore; nginx serves HTTP 200. Clean -Werror build.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This reverts commit 02237b7.
The peephole (02237b7) was reverted in db8338f on suspicion that it caused thunar to crash under JIT. A proper rate-based bisect (25 runs/config) disproved that. thunar JIT-SIGSEGV rate:
- pre-session 7ff84a2: 3/25
- HEAD single-core: 7/25
- HEAD multicore: 0/25
The crash is PRE-EXISTING (present before this session's first commit), occurs with the peephole reverted, and is rate-neutral w.r.t. the peephole (peephole-on 1/12 and 6/20 sit in the same noise band). It's a long-standing intermittent JIT bug in heavily threaded GUI apps (varying signatures point to state corruption; `-j` avoids it), unrelated to this peephole. The earlier revert was a misdiagnosis from bisecting an intermittent bug on too few runs.
Restoring the +44% codegen win. The pre-existing JIT bug is tracked separately (workaround: run such apps with -j).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
blink/path.c: the -m (software-MMU) stash-commit code emitted `cmpq $0,stashaddr(%rdi); jz +5; <call CommitStash>`, where `jz +5` hardcoded a 5-byte skip. AppendJitCall() only emits a 5-byte `call rel32` when the target is within +/-2GB; when it is farther, it emits a ~12-byte `movabs $addr,%rax; call *%rax`. If ASLR places the JIT mmap >2GB from the blink binary, `jz +5` lands mid-instruction and executes garbage -> guest-heap corruption -> SIGSEGV. Backpatch the jz to skip the ACTUAL emitted call length instead.
Proven by forcing the far-call path: thunar under -m crashed 4/5 without this fix and 0/8 with it. (The bug is latent in runs where the JIT mmap lands within 2GB, so it is real but not the sole cause of the -m GUI crash, which still reproduces intermittently and is under investigation. The interpreter and linear-mapping JIT are unaffected: under linear, the stash path is never emitted.)
Also fixes two genuine, separate JIT block-retirement concurrency bugs found along the way, plus a JIT memory-size bump:
- building flag (jit.c/jit.h): ForceJitBlocksToRetire() could retire a block another thread had leased and was appending into (leased blocks stay on agedblocks), double-unlinking it from jit->blocks and letting a second thread reuse the same storage. Skip blocks flagged building (set under jit->lock in StartJit, cleared in ReinsertJitBlock_).
- QSBR reclamation (jit.c, machine.c/.h, memorymalloc.c): a committed block could be retired and reused while a thread was still executing inside it (or blocked in a syscall called from it). Retired blocks now park on a draining list stamped with a reclaim epoch. Each thread publishes the epoch it last saw at the Actor() quiescent point (m->jitqso), and DrainReclaimable_ only releases a block once every thread has passed it. The quiescence floor is computed under machines_lock BEFORE jit->lock, matching the existing outer->inner lock order (fork() takes machines_lock then jit.lock), so there is no inversion.
- kJitMemorySize 31MB -> 128MB (lazily committed; costs only address space).
README.md: document running without -m and the thunar/xeyes examples.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
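The backpatch can be sketched with a tiny code buffer. `struct Code`, `Emit()`, `EmitCall()` and `EmitSkippableCall()` are hypothetical; the encodings used are standard x86-64: `jz rel8` is 74 xx, `call rel32` is E8 plus 4 bytes, `movabs $imm64,%rax` is 48 B8 plus 8 bytes, and `call *%rax` is FF D0. The sketch assumes a little-endian host and omits the preceding `cmpq`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct Code {
  uint8_t b[64];
  size_t n;
};

static void Emit(struct Code *c, const void *p, size_t k) {
  memcpy(c->b + c->n, p, k);
  c->n += k;
}

/* Near target: 5-byte call rel32. Far target: movabs + call *%rax (12). */
static void EmitCall(struct Code *c, uint64_t base, uint64_t target) {
  int64_t rel = (int64_t)(target - (base + c->n + 5));
  if (rel >= INT32_MIN && rel <= INT32_MAX) {
    int32_t r = (int32_t)rel;
    Emit(c, "\xe8", 1);
    Emit(c, &r, 4);
  } else {
    Emit(c, "\x48\xb8", 2);
    Emit(c, &target, 8);
    Emit(c, "\xff\xd0", 2);
  }
}

/* jz over the call, patched with the length actually emitted. */
static void EmitSkippableCall(struct Code *c, uint64_t base,
                              uint64_t target) {
  Emit(c, "\x74\x00", 2); /* jz rel8, displacement patched below */
  size_t after_jz = c->n;
  EmitCall(c, base, target);
  c->b[after_jz - 1] = (uint8_t)(c->n - after_jz);
}
```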
FreeBSD sigpending(sigset_t *set) had no translation entry and fell through to the UNMAPPED-syscall path. Add a SysFreeBSDSigpending handler that mirrors the Linux rt_sigpending handler (SysSigpending) but emits FreeBSD's 16-byte sigset_t: blink's pending mask m->signals in the low 8 bytes, upper words zeroed. Mask bits are treated as bit-compatible with Linux, consistent with the sibling SysFreeBSDSigprocmask.
Wire it up: translate FreeBSD 343 to the synthetic ordinal 0x251 and register the dispatch entry.
Tested: a FreeBSD program that blocks and raises SIGTERM, then calls sigpending, reports rc=0 with SIGTERM pending and SIGINT not, under both -m and linear.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
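The sigset widening is sketched below. `struct FreeBSDSigset` and `ToFreeBSDSigset()` are hypothetical names; FreeBSD's sigset_t is `uint32_t __bits[4]`, and in both systems signal n maps to bit n-1. Writing the 64-bit mask as two 32-bit words gives the same bytes as "low 8 bytes" on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct FreeBSDSigset {
  uint32_t bits[4]; /* 128 signal bits */
};

static void ToFreeBSDSigset(uint64_t mask, struct FreeBSDSigset *out) {
  memset(out, 0, sizeof(*out));          /* bits[2], bits[3] stay zero */
  out->bits[0] = (uint32_t)mask;         /* signals 1..32 */
  out->bits[1] = (uint32_t)(mask >> 32); /* signals 33..64 */
}
```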
Under -m (software MMU), a memory access that straddles a page boundary can't be served by a single native locked instruction on the real memory. Instead, blink copies both pages into a private per-machine stash (m->opcache->stash), the op does its read-modify-write there, and CommitStash() writes it back at end-of-op.
For LOCKed/atomic ops this silently broke atomicity. The op's own LockBus() keys on the stash buffer (per-machine, never contended), and the write-back happens later outside any lock, so concurrent crossing atomics to the same address lose updates (e.g. a GObject refcount that straddles a page -> lost decrement -> premature free). This affected both the interpreter and the JIT. Linear mode is fine because x86 LOCK is atomic across pages in hardware.
Fix: in ReserveAddress()'s page-overlap path, after both pages resolve, take a bus lock keyed by the thread-shared guest address and set m->crosslocked. CommitStash() does the write-back and then releases it, so the whole crossing read-modify-write is serialized against other threads. A belt-and-suspenders release in Blink()'s halt path covers a fault that unwinds past CommitStash(). Only crossing accesses pay this cost, and they are rare.
Verified with a multithreaded stress test that places an atomic counter across a page boundary: before, the crossing counter ended at ~12M of 25M; after, it is exact under -m JIT and -m interpreter, while aligned counters and linear mode were always exact. Single-thread page-overlap, md5sum, go, and shell all still pass.
(Note: this is a real, separate -m correctness bug; it is NOT the intermittent thunar -m crash, which is still under investigation and has been localized to infinite recursion in cairo's Bezier spline decomposition.)
README.md: typo/usage doc tweaks.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
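The fix can be sketched with striped mutexes keyed by the shared address. `BusLockFor()`, `CrossingAtomicAdd32()` and the demo worker are hypothetical; `g_mem` stands in for guest memory with a 4-byte counter straddling offset 4096. The sketch only shows the core point. Because the lock is taken from the guest address, which every thread shares, rather than from a per-thread stash, the whole copy-in, modify, write-back sequence is serialized.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define NLOCKS 64
static pthread_mutex_t g_bus[NLOCKS];

static void InitBusLocks(void) {
  for (int i = 0; i < NLOCKS; ++i) pthread_mutex_init(&g_bus[i], NULL);
}

static pthread_mutex_t *BusLockFor(uint64_t guest_addr) {
  return &g_bus[(guest_addr >> 6) % NLOCKS];
}

/* Emulated page-crossing "lock add" through a private stash. */
static void CrossingAtomicAdd32(uint8_t *mem, uint64_t addr, uint32_t d) {
  uint8_t stash[4];
  uint32_t v;
  pthread_mutex_t *l = BusLockFor(addr);
  pthread_mutex_lock(l);        /* taken when the crossing is reserved */
  memcpy(stash, mem + addr, 4); /* gather bytes from both pages */
  memcpy(&v, stash, 4);
  v += d;
  memcpy(stash, &v, 4);
  memcpy(mem + addr, stash, 4); /* commit the stash back */
  pthread_mutex_unlock(l);      /* released only after the commit */
}

static uint8_t g_mem[8192];

static void *Worker(void *arg) {
  (void)arg;
  for (int i = 0; i < 100000; ++i) CrossingAtomicAdd32(g_mem, 4094, 1);
  return NULL;
}
```

Call `InitBusLocks()` once before any thread uses the locks.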
whereis(1) and other tools call confstr(_CS_PATH), which does
sysctl({CTL_USER, USER_CS_PATH}) (MIB 8.1). blink's FreeBSD sysctl handler
had no CTL_USER branch, so it returned ENOSYS:
whereis: sysctl("user.cs_path"): Function not implemented
Add the case to SysFreeBSDSysctl, returning a standard utility search path
that includes /usr/local so pkg-installed binaries are findable:
/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/local/sbin. Callers size
the buffer via a first oldaddr=0 query, matching the other string sysctls.
Verified: `whereis mc` -> "mc: /usr/local/bin/mc" (was erroring);
whereis ls/sh resolve; uname unaffected.
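The two-step size protocol for string sysctls is sketched below. `SysctlString()` and `SysctlUser()` are hypothetical helpers, and CTL_USER=8 / USER_CS_PATH=1 are the FreeBSD MIB values. A NULL buffer reports the needed size, including the NUL, and a short buffer fails with ENOMEM. Real FreeBSD also copies a truncated prefix in that case; the sketch omits this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define FBSD_CTL_USER 8
#define FBSD_USER_CS_PATH 1

static const char kCsPath[] =
    "/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/local/sbin";

static int SysctlString(const char *s, void *oldp, size_t *oldlenp) {
  size_t need = strlen(s) + 1;
  if (!oldp) { /* size query */
    *oldlenp = need;
    return 0;
  }
  if (*oldlenp < need) {
    *oldlenp = need;
    errno = ENOMEM;
    return -1;
  }
  memcpy(oldp, s, need);
  *oldlenp = need;
  return 0;
}

static int SysctlUser(const int *mib, unsigned miblen, void *oldp,
                      size_t *oldlenp) {
  if (miblen == 2 && mib[0] == FBSD_CTL_USER && mib[1] == FBSD_USER_CS_PATH)
    return SysctlString(kCsPath, oldp, oldlenp);
  errno = ENOENT;
  return -1;
}
```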
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
blink backs guest ptys with real host ptys and runs guest processes as real host processes, but never handled TIOCSCTTY. FreeBSD TIOCSCTTY (0x20007461) fell through to the default einval(), so after a shell's setsid() it could not acquire the pty as its controlling terminal. Subsequent /dev/tty opens returned ENXIO and TIOCGPGRP/TIOCSPGRP returned ENOTTY, so shells printed "cannot set terminal process group" / "no job control in this shell".
Route FreeBSD TIOCSCTTY (and Linux TIOCSCTTY 0x540e) to the host TIOCSCTTY on the pty fd. After the guest setsid(), the host call makes the pty the controlling terminal, so tcgetpgrp/tcsetpgrp (TIOCGPGRP/TIOCSPGRP) work and job control comes up.
Verified: the job-control errors are gone, `jobs` works in an interactive bash under a pty, and md5sum/shell pipes are unaffected.
(This removes mc's "no job control" subshell warnings but does NOT by itself fix mc's ~10s subshell-init stall, which is a separate getcwd-in-subshell VFS issue still under investigation.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
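The host-side sequence this routing relies on is sketched below. A new session leader claims a pty slave with TIOCSCTTY, after which /dev/tty opens and tcgetpgrp() reports its process group. `AcquireControllingTty()` is a hypothetical helper meant to run in a forked child. It returns the number of the step that failed, with 1 meaning no pty is available on the host.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <unistd.h>

static int AcquireControllingTty(void) {
  int m = posix_openpt(O_RDWR | O_NOCTTY);
  if (m < 0 || grantpt(m) || unlockpt(m)) return 1;
  const char *name = ptsname(m);
  if (!name || setsid() < 0) return 2; /* become a session leader */
  int s = open(name, O_RDWR | O_NOCTTY);
  if (s < 0 || ioctl(s, TIOCSCTTY, 0) < 0) return 3; /* claim the pty */
  if (open("/dev/tty", O_RDWR) < 0) return 4; /* ENXIO without a ctty */
  if (tcgetpgrp(s) != getpgrp()) return 5;    /* job control is usable */
  return 0;
}
```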
getdents/getdirentries wrongly gated on the fd having been opened with
O_DIRECTORY ((fd->oflags & O_DIRECTORY) != O_DIRECTORY -> ENOTDIR). But
O_DIRECTORY is only an open-time hint; on FreeBSD/Linux you may open a directory
with plain O_RDONLY and read it. FreeBSD libc's physical getcwd opens ".." with
O_RDONLY and walks up; blink returned ENOTDIR, so getcwd failed ("cannot access
parent directories"). That broke mc's concurrent-subshell init: bash's
PROMPT_COMMAND `pwd >&pipe` errored instead of writing the cwd, so mc's sync
select timed out ~10s every startup. Gate on the fd's ACTUAL type (S_ISDIR via
the fstat already performed) instead; non-directories still get ENOTDIR. Fixes
getcwd for any program that opens a dir without O_DIRECTORY, and removes mc's
~10s subshell stall.
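The corrected gate is sketched below. `CheckReadableDir()` is a hypothetical helper. It asks fstat what the fd actually refers to and does not inspect the open flags at all, so a directory opened with plain O_RDONLY passes and a non-directory still gets ENOTDIR.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if fd is a directory, else a negative errno. */
static int CheckReadableDir(int fd) {
  struct stat st;
  if (fstat(fd, &st) == -1) return -errno;
  if (!S_ISDIR(st.st_mode)) return -ENOTDIR;
  return 0;
}
```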
Also revert the TIOCSCTTY change from af88aed. Making TIOCSCTTY succeed let
shells believe they had job control, but blink can't deliver the rest of it
(process-group SIGSTOP/SIGCONT + waitpid(WUNTRACED) across guest processes).
With the getdents fix the subshell sync now *succeeds*, so mc proceeded into the
full job-control handshake and deadlocked (hung forever, needed kill -9). Leave
TIOCSCTTY unimplemented (default einval) so shells stay in the honest "no job
control" mode, where mc's subshell works and starts fast. Revisit if real job
control is ever implemented.
Net: `mc` (with subshell) now starts quickly under -m instead of stalling 10-15s
or hanging. Verified: getcwd in O_RDONLY-opened dirs works, ls/find/md5sum/go
unaffected, mc no longer stalls or hangs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
31a3a2e relaxed getdents/getdirentries to read directories opened without O_DIRECTORY (correct in general, and it removed mc's ~10s subshell-init stall by letting bash's physical getcwd succeed). But making the subshell sync *succeed* caused mc to proceed into the full job-control handshake: its PROMPT_COMMAND runs `pwd>&pipe; kill -STOP $$`, then mc blocks in sigsuspend waiting for the subshell to stop. blink forwards SIGSTOP to the (real host) subshell process, so it does stop, but mc is never woken with a child-STOPPED SIGCHLD, so mc hangs forever (needs kill -9). That is strictly worse than the prior 10-15s-slow-but-working behavior, so restore the O_DIRECTORY requirement for now.
The clean fix is to deliver child-stop notifications (SIGCHLD on a WUNTRACED-style stop) to the guest parent so mc's sigsuspend/wait wakes. Until that exists, mc's concurrent subshell can't be driven and must run degraded (or with use_subshell=0). The general getdents/getcwd relaxation can return together with that work.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The FreeBSD->Linux syscall shim mapped kill(37) to the Linux kill ordinal but left its signal argument untranslated. FreeBSD and Linux signal numbers diverge in several places (7, 10 and 12, and most numbers above 15: FreeBSD SIGSTOP=17/SIGTSTP=18/SIGCONT=19/SIGCHLD=20/SIGUSR1=30 vs Linux 19/20/18/17/10). As a result, `kill -STOP` actually sent SIGCHLD, `kill -CONT` sent SIGSTOP, and kill(SIGUSR1) delivered the wrong signal. Run the signal arg through XlatFreeBSDSignal() (as thr_kill already does).
Concretely, this makes job-control signals work: a shell's `kill -STOP $$` now really stops the process, and the parent gets a child-stop SIGCHLD.
Verified: a program that catches SIGUSR1 via kill(getpid(),SIGUSR1) now sees SIGUSR1 (was a different/no signal); regressions (md5sum/go/ls/whereis) unaffected.
NB: this is necessary but not sufficient for mc's concurrent subshell to run fast. With the separate getcwd/getdents relaxation it gets further (the init stop/cont handshake now completes) but still hangs later in the job-control loop, so that getdents relaxation stays reverted and mc remains slow-but-working (use mc -u / use_subshell=0 for fast startup). Full concurrent-subshell support is a larger job-control effort.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
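A table-driven translation is sketched below. `XlatFreeBSDSignalSketch()` is hypothetical and is not blink's actual XlatFreeBSDSignal(). The array is indexed by the FreeBSD signal number and filled with the host's own <signal.h> constants. FreeBSD SIGEMT (7) and SIGINFO (29) have no Linux equivalent, so they map to -1.

```c
#include <assert.h>
#include <signal.h>

/* Index: FreeBSD signal number. 0 means "no Linux equivalent". */
static const int kFreeBSDToLinux[32] = {
    [1] = SIGHUP,     [2] = SIGINT,   [3] = SIGQUIT,   [4] = SIGILL,
    [5] = SIGTRAP,    [6] = SIGABRT,  [8] = SIGFPE,    [9] = SIGKILL,
    [10] = SIGBUS,    [11] = SIGSEGV, [12] = SIGSYS,   [13] = SIGPIPE,
    [14] = SIGALRM,   [15] = SIGTERM, [16] = SIGURG,   [17] = SIGSTOP,
    [18] = SIGTSTP,   [19] = SIGCONT, [20] = SIGCHLD,  [21] = SIGTTIN,
    [22] = SIGTTOU,   [23] = SIGIO,   [24] = SIGXCPU,  [25] = SIGXFSZ,
    [26] = SIGVTALRM, [27] = SIGPROF, [28] = SIGWINCH, [30] = SIGUSR1,
    [31] = SIGUSR2,
};

static int XlatFreeBSDSignalSketch(int sig) {
  if (sig == 0) return 0; /* kill(pid, 0) only probes for existence */
  if (sig < 0 || sig >= 32 || !kFreeBSDToLinux[sig]) return -1;
  return kFreeBSDToLinux[sig];
}
```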
blink didn't handle the FreeBSD line-discipline ioctls, so TIOCGETD
(0x4004741a) fell through to the default einval(). stty(1) queries TIOCGETD
during init, so it failed outright ("stty: TIOCGETD: Invalid argument") and
couldn't set any terminal modes. Route TIOCGETD/TIOCSETD to the host ioctls
(line discipline 0 = TTYDISC/N_TTY for a normal tty). Verified: `stty -a` now
reports the terminal settings instead of erroring.
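The request-number translation is sketched below. `XlatLineDisciplineIoctl()` is a hypothetical helper. The FreeBSD values are the encoded forms `_IOR('t', 26, int)` and `_IOW('t', 27, int)`, while the host values come from the Linux <sys/ioctl.h>. The argument (an int line discipline) has the same shape on both sides, so only the request number changes.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define FBSD_TIOCGETD 0x4004741a
#define FBSD_TIOCSETD 0x8004741b

/* Returns the host request, or -1 if this helper doesn't handle it. */
static long XlatLineDisciplineIoctl(uint32_t req) {
  switch (req) {
    case FBSD_TIOCGETD:
      return TIOCGETD;
    case FBSD_TIOCSETD:
      return TIOCSETD;
    default:
      return -1;
  }
}
```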
(Found while diagnosing an mc panel key issue; unrelated to that, but a real
standalone fix for stty and other line-discipline users.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…y keylog
getdents/getdirentries rejected any fd not opened with O_DIRECTORY, but O_DIRECTORY is only an open-time hint: a directory may be opened O_RDONLY and read. FreeBSD libc's physical getcwd() opens ".." with plain O_RDONLY and then calls getdirentries, so blink returned ENOTDIR and getcwd() failed in every non-root directory. Gate on the fd's ACTUAL type (S_ISDIR via the fstat already done) instead of the open flag.
This was the real cause of mc -u "Enter doesn't enter directories". Pressing Enter descends into the dir, and mc canonicalizes the new cwd via getcwd. When that failed, mc bailed and reloaded the panel with the cursor reset to the top, which looked like Enter acting as PageUp. The Enter byte (0d) was always correct.
Verified: pwd -P / realpath now resolve in subdirs; ls still works; reading a non-dir as a dir still ENOTDIRs.
(Earlier this relaxation was reverted because it unblocked mc's concurrent subshell into a job-control hang. Safe to re-land now: use_subshell=0 / mc -u means no subshell, so that path is never engaged.)
Also add an env-gated tty keylog debug aid: when BLINK_KEYLOG=<path> is set, SysRead appends a hex line for every byte read from a tty (isatty fd, rc>0). The log fd is dup'd into the blink-reserved high range to avoid guest fd collisions. Zero overhead when unset. Used to capture exactly what bytes a TUI receives per keypress.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
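Below is a minimal sketch of the log-line format. `LogKeyBytes()` is a hypothetical helper, and capping a line at 64 bytes is an assumption. In this sketch the read path would call it only when the keylog fd is open, the read returned rc > 0, and the guest fd is a tty.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Appends one hex line per tty read, e.g. "1b 5b 41\n" for Up-arrow.
 * Logs at most 64 bytes per read. Returns 0 on success, -1 on error. */
static int LogKeyBytes(int logfd, const unsigned char *buf, size_t n) {
  char line[3 * 64 + 1];
  size_t k = 0;
  if (n > 64) n = 64;
  if (n == 0) return 0;
  for (size_t i = 0; i < n; ++i)
    k += (size_t)snprintf(line + k, sizeof(line) - k, "%02x%c", buf[i],
                          i + 1 < n ? ' ' : '\n');
  return write(logfd, line, k) == (ssize_t)k ? 0 : -1;
}
```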
Co-authored-by: Antigravity/Gemini3pro
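The issue was that Blink set RDX to the program name (a pointer to it) at process entry. Under the x86-64 psABI, RDX at entry holds a function pointer that the program should register with atexit, or 0 for none; glibc's _start passes it to __libc_start_main as rtld_fini. Static binaries therefore treated the string as a destructor and crashed when calling it at exit. I changed Blink to set RDX to 0 for non-Cosmo binaries, which fixes the crash on exit.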
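Both sides of the fix are sketched below. `struct EntryRegs`, `SetEntryRdx()` and `RegisterRtldFini()` are hypothetical names. The loader keeps the old RDX value only for Cosmopolitan binaries and passes 0 otherwise. The libc side mirrors what a glibc-style startup does: a zero RDX registers nothing, while any non-zero value gets called at exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct EntryRegs {
  uint64_t rdx;
};

/* Loader side: only Cosmo binaries keep the program-name pointer. */
static void SetEntryRdx(struct EntryRegs *r, bool is_cosmo,
                        uint64_t progname) {
  r->rdx = is_cosmo ? progname : 0;
}

/* libc side (simplified): register RDX with atexit() if non-zero. */
static int RegisterRtldFini(uint64_t rdx) {
  if (!rdx) return 0;
  return atexit((void (*)(void))(uintptr_t)rdx);
}
```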