Skip to content

Harden IPC framing for perfect concurrent output; add stress test; upgrade toolchain#19

Open
limeytexan wants to merge 3 commits into
mainfrom
code-review-hardening
Open

Harden IPC framing for perfect concurrent output; add stress test; upgrade toolchain#19
limeytexan wants to merge 3 commits into
mainfrom
code-review-hardening

Conversation

@limeytexan

@limeytexan limeytexan commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes occasional garbled/dropped output when streaming highly concurrent command output through t3, adds a stress test that reproduces the failure, and reworks the worker→parent IPC so it is both correct and substantially cheaper per line. Also refreshes the toolchain, fixes CI, and corrects the docs.

Changes

  • IPC framing — The stdout/stderr worker processes forward each line to the parent over a pipe. The old fixed-size frame could corrupt or drop lines under load (struct-sized pointer arithmetic on partial writes, a signed/unsigned comparison that dropped messages, and whole-frame read assumptions) and shipped ~4 KB per line regardless of length. Replaced with length-prefixed variable frames sent via write_full(), assembled in a growable buffer (so lines are no longer split at 4096 bytes), and read back through a per-pipe buffered reader that amortizes syscalls and applies back-pressure instead of losing data.
  • Performance — For typical build output this roughly quarters t3's marginal per-line overhead versus the old fixed-size framing. The make bench harness measures one binary's per-line cost; the fixed-vs-variable comparison was done by running it against a build of the previous version as well. Very long (multi-KB) lines cost slightly more than the old format but are now reproduced intact.
  • Stress test — New make test regression test runs many threads writing identifiable lines through t3 and verifies every line is reproduced exactly once, intact, in per-stream/per-thread order, on the correct stream. (Cross-stream interleaving in the merged log is ordered by capture timestamp and is not a deterministically assertable property for racing producers, so it is intentionally not asserted.)
  • Docs--help/README usage corrections, document --outcolor, and replace the man page's non-existent --nocolor note and stale 4096-byte limit with the current long-line behavior.
  • Tooling/CIflox upgrade (gcc 15, clang-tools 21, gdb 17) with a Darwin-only apple-sdk/SDKROOT hook so make lint finds the macOS SDK; add coreutils so make bench's timer uses GNU date from the environment instead of a host fallback; pin GitHub Actions to commit SHAs; make the CI gcc matrix actually build with each version and bound the jobs with timeout-minutes; ignore build outputs and remove stray files.

Environment / Flox

These manifest changes ride along (partly from the bundled flox upgrade / publish chores) and are called out here since they are environment-level:

  • Schema migration version = 1schema-version = "1.12.0" (and the [build][build.t3] table reformat). This may raise the minimum Flox CLI version needed to use the environment.
  • [options] systems (the explicit 4-system list) is no longer present; system support now relies on flox defaults plus the Darwin-scoped apple-sdk.systems. The lock still resolves all four systems (aarch64/x86_64 × darwin/linux), so builds are unaffected, but the declaration is no longer explicit.
  • Build metadata: flox build now stamps the artifact with a git describe-derived version and a description.

Verification

Clean make build, make test (incl. stress) passing on each gcc version, make lint exit 0, and flox build passing end-to-end. CI green.

Release

Intended as the v1.1.0 release (minor bump from v1.0.9). Tag v1.1.0 on main once merged.

🤖 Generated with Claude Code

@limeytexan limeytexan force-pushed the code-review-hardening branch 2 times, most recently from eae5194 to f2151d7 Compare June 13, 2026 13:35

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens t3’s worker→parent IPC framing to prevent corrupted/dropped output under highly concurrent streaming, adds a regression stress test (plus a micro-benchmark harness), and refreshes docs + CI/tooling to match the new behavior.

Changes:

  • Replace fixed-size IPC frames with length-prefixed variable frames and introduce a buffered frame reader in the parent to reduce syscall overhead and prevent framing corruption.
  • Add a concurrency stress generator + stream verifier, wire it into make test, and add optional throughput benchmarking helpers.
  • Update usage docs/man text, adjust CI gcc matrix behavior, and refresh Flox environment/toolchain metadata.

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
t3.c Reworks worker IPC framing + parent-side buffered frame parsing; updates help text/options.
tests/stress.c Adds a highly concurrent stdout/stderr line generator for regression testing.
tests/run-stress Adds a driver script that runs the stress generator under t3 and validates outputs.
tests/check-stream.awk Adds an awk validator to detect dropped/garbled/duplicated/out-of-order lines.
tests/bench.c Adds a fast stdout line generator for measuring per-line overhead.
tests/bench-overhead Adds a timing harness to estimate marginal per-line overhead (manual target).
tests/true.args Adds a simple true test invocation.
tests/false.args Adds a simple false test invocation.
t3.x Updates manpage include text to reflect long-line reassembly/capping behavior.
README.md Fixes usage text and documents --outcolor/--errcolor flags.
Makefile Adds stress (included in test) and bench targets; builds new test helpers.
.gitignore Ignores built artifacts (t3, manpage, flox results) and compiled test helpers.
.github/workflows/ci.yml Pins runners/actions, fixes gcc matrix installs/builds, and reduces fail-fast behavior.
.flox/env/manifest.toml Upgrades schema/tooling and adds a Darwin SDKROOT hook for lint tooling.
.flox/env/manifest.lock Updates locked toolchain/package versions to match the manifest changes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread t3.c
Comment thread t3.c
Comment thread tests/run-stress Outdated
Comment thread tests/stress.c Outdated
@limeytexan limeytexan force-pushed the code-review-hardening branch 2 times, most recently from 2c119e0 to 9167384 Compare June 14, 2026 07:21
@limeytexan limeytexan requested a review from Copilot June 14, 2026 14:35

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.

Comment thread t3.c
Comment thread t3.c
Comment thread t3.c
Comment thread tests/bench.c

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 2 comments.

Comment thread tests/stress.c Outdated
Comment thread t3.c Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated no new comments.

@limeytexan

Copy link
Copy Markdown
Contributor Author

Domain Summary

The domain-relevant content of this PR is the Flox environment
definition, not the C application that happens to live in the repo.
The PR's stated work — hardening IPC framing for correct concurrent
output, adding a stress harness, and refreshing the toolchain/CI — is
accurately reflected in the diff. The changes a Flox reviewer should
weigh are environment-level and under-described.

Description Consistency

Consistent with the diff: the IPC fix (fixed 4 KB → length-prefixed
variable frames via a write-everything helper plus a buffered
back-pressure reader), the named old-code hazards (partial-write pointer
arithmetic, signed/unsigned comparison, whole-frame read assumptions),
the stress-test guarantee, the doc corrections, and the tooling/CI
changes (catalog upgrade gcc15/clang-tools21/gdb17, Darwin-only Apple
SDK + SDKROOT hook for make lint, action SHA pinning, gcc matrix) all
match the diff.

Inconsistent: nothing material.

Missing from the description (environment-level changes):

  • The manifest schema migration version=1schema-version="1.12.0"
    [.flox/env/manifest.toml] — may raise the minimum Flox CLI version.
  • Removal of top-level options.systems (all four systems) in favor of
    per-package system scoping on the Apple SDK — may narrow declared
    system support.
  • The build now stamps the artifact with a git describe-derived
    version + description — a real change to what flox build produces.

Overclaimed: nothing. If anything, the env-schema and
build-metadata changes are undersold.

Code Review

The C rewrite is high quality. Security and correctness were verified
safe: handshake bounds, the 16 MiB frame-length cap preventing huge
allocations, padding zeroed before send, full pipe drain at EOF, EINTR
retries throughout, partial read/write cursors, a clean FD-leak audit,
correct flexible-array-member sizing, and EOF-vs-truncation distinguished
via errno=0. No Critical code-correctness findings.

Important

1. Stress test does not assert cross-stream interleaving order.
[tests/check-stream.awk:50-53, tests/run-stress:32-42]
The harness validates no loss, no duplication, and no garbling, but it
does not verify the "perfect concurrent output" interleaving order the
PR implies. This is the headline gap. It is not a code defect — either
narrow the claim to "no loss/dup/garble" or add an interleaving-order
assertion so the stated guarantee is actually tested.

2. Makefile test: target ordering is fragile, especially under
-j.
[Makefile:135-136, 159, 170]
Two test: targets exist, and the rm -rf TESTTMPDIR cleanup recipe is
split across stanzas. Under parallel make this ordering is not
guaranteed and can race. Consolidate to a single test: target with a
well-ordered cleanup.

3. Benchmark overstates what it isolates and is not self-reproducing.
[tests/bench-overhead:8-10, 48, 50-53, t3.c:62]
The single-stream run with MESSAGE_HOLD_MS=100 measures the cheapest
batched-drain path, not steady-state per-line overhead, yet the header
oversells it. The benchmark only runs the new binary, so the
"~three-quarters cheaper" fixed-vs-variable framing claim is not
reproducible from make bench alone. Either compare both framings or
soften the header.

4. Stress test has no timeout guard in CI. [.github/workflows/ci.yml:35-37,
Makefile:159]
The stress run executes on every CI matrix leg, and the blocking
write_full means a hang would run to GitHub's job cap. Wrap the stress
invocation in a timeout.

Minor

5. framereader_fill doubles the buffer unboundedly at fill time.
[t3.c:464-489, 502]
The growth looks unbounded, but the parse-time MAX_LINE_SIZE check in
framereader_next caps it; the buffer simply never shrinks after a huge
frame. With a single trusted writer this is defensive only. Add a
cross-referencing comment pointing to the parse-time cap, and optionally
a defensive shrink/cap. (Raised by all three code lenses.)

6. Single read() per POLLIN iteration leaves data unserviced
until the next poll.
[t3.c:850-871]
Correct as written for fairness across streams. Add a comment so it is
not later "optimized" into a blocking drain.

7. timespec_ms_delta can overflow and narrows longint.
[t3.c:351-355]
The long * 1000 arithmetic can overflow and the result is truncated to
int. Benign at expected magnitudes; widen the type for safety.

8. Per-line fflush is the dominant per-line syscall cost.
[t3.c:576]
Intentional for interactivity, but worth noting this — not framing — is
the real per-line performance lever.

9. Parent enqueues unboundedly into the in-memory list. [t3.c:855-888]
Back-pressure protects the wire and the workers but not parent RAM. The
100 ms hold self-limits this in practice, but an adversarial
fast-producer / slow-consumer (slow terminal) could balloon memory.

10. make bench is undocumented in the README/man page given the
performance claims attached to it.

11. bench-overhead hard-codes Perl (Time::HiRes), an undeclared
toolchain dependency.
[tests/bench-overhead:26]

12. echo -n in the Makefile is non-portable; prefer printf.
[Makefile:61] (pre-existing)

13. TESTTMPDIR is created at parse time on every make run, but only
the test target cleans it.
[Makefile:53]

14. waitpid(WNOHANG) worker reap can leave transient zombies.
[t3.c:877, 906, 991]
Workers are not reaped after the main loop exits.

15. Signed/unsigned in snprintf truncation checks. [t3.c:544, 560]
Already NOLINT'd; noted for completeness.

16. .gitignore lists tests/midline-flush, which builds via the
implicit rule without $(CFLAGS)
— inconsistent with how stress/bench
build. [.gitignore:58, Makefile:143]

17. Oversized 64-byte handshake buffers plus a needless
snprintf/strcmp in await_worker.
[t3.c:393-394, 415]

Confirmed correct (worth keeping)

  • write_full/read_full EINTR + partial-IO handling.
  • memset of the header before send (no uninitialized padding leak).
  • stress.c PIPE_BUF-atomicity reasoning and fatal-on-short-write.
  • The old signed/unsigned comparison bug is fully eliminated.
  • CI action SHA pinning.
  • check-stream.awk correctly keeps its .awk suffix (it is a source
    file processed by awk -f, not an executable entrypoint);
    run-stress and bench-overhead are correctly unsuffixed. Compliant
    with the no-language-suffix-for-entrypoints convention.

Verdict

This is a high-quality PR with no Critical code-correctness findings.
The two things to weigh before merge are the test-claim gap (the stress
harness checks loss/dup/garble but not the interleaving-order guarantee
the description implies) and the undocumented environment-schema
migration (version=1schema-version="1.12.0" plus the removal of
top-level options.systems). Resolving the claim wording and surfacing
the env-schema change in the description would clear the path to merge.


Via Forge (interactive) • 3ca4fd79

@limeytexan limeytexan force-pushed the code-review-hardening branch from e331f6a to a249593 Compare June 14, 2026 18:51
@limeytexan

Copy link
Copy Markdown
Contributor Author

Thanks — this is an excellent, careful review. Verified each claim against the tree; dispositions below. Addressed in a249593, plus PR-description updates.

Verdict items

  • Env-schema migration — Good catch; it was undescribed. The PR now has an Environment / Flox section calling out the version=1schema-version="1.12.0" migration, the dropped explicit [options] systems, and the new git describe build-metadata. On the systems point specifically: the lock still resolves all four systems (aarch64/x86_64 × darwin/linux), so builds are unaffected, but the explicit declaration is gone — documented rather than restored, by maintainer preference.
  • Interleaving claim (feat: provide support for all tee options #1) — The description now scopes the stress guarantee to per-stream/per-thread order and explicitly notes that cross-stream interleaving in the merged log is ordered by capture timestamp and is not a deterministically assertable property for racing producers, so it isn't asserted. The harness verifies what is deterministic (completeness, uniqueness, integrity, per-stream monotonic order).

Important

  • chore: remove outdated TODO comment and fix fibonacci test #2 (Makefile test: race) — Respectfully, not a race: there is a single recipe for test: (rm -rf $(TESTTMPDIR)); GNU make runs all prerequisites — even under -j — before that recipe, so cleanup is correctly ordered after every test. Left as-is.
  • chore: add lint target #3 (benchmark overstated / not self-reproducing) — Agreed. Softened the claim in the commit message and PR description (the fixed-vs-variable comparison requires building the previous version; make bench measures one binary), and rewrote the tests/bench-overhead header to say it measures the whole per-line cost — including the per-line fflush — of one binary, in batched-drain steady state.
  • fix: replace sprintf with snprintf for improved security #4 (no CI timeout) — Fixed: timeout-minutes: 15 on both CI jobs.

Minor

Thanks again for the depth here.

@limeytexan limeytexan force-pushed the code-review-hardening branch from a249593 to 0da5935 Compare June 14, 2026 19:13
@limeytexan

Copy link
Copy Markdown
Contributor Author

Correction on minor item #11 (and my earlier dismissal of it): I was wrong. tests/bench-overhead used perl -MTime::HiRes, and although perl is in the dev-env closure (via help2man) it is not exposed on PATH, so the script was silently falling back to the host /usr/bin/perl — exactly the kind of out-of-environment tool use the new build sandbox is meant to surface. My "left undeclared deliberately" rationale was doubly wrong (it doesn't affect the runtime closure either, since runtime-packages = []).

Fixed in 0da5935: rather than pull a full Perl interpreter into the environment, added coreutils (already in the closure) and switched the timer to date +%s.%N. GNU date now comes from the environment on all systems (the env provides $FLOX_ENV/bin/date); the macOS/BSD date that lacks %N is no longer relied upon. Verified make bench, the full suite, lint, and a pure-sandbox flox build all pass.

@limeytexan

Copy link
Copy Markdown
Contributor Author

Reviewed the dispositions and verified each against 0da5935. Everything lands — and on the two points where you pushed back, you're right; our reviewers overreached on both.

Conceded pushbacks

  • Makefile test: ordering (Important item 2) — Not a race. test: has a single recipe (rm -rf $(TESTTMPDIR)); GNU make brings all prerequisites up to date before running a target's recipe, even under -j, so cleanup is always ordered after stress and the per-test targets. Withdrawn.
  • midline-flush $(CFLAGS) (Minor item 16) — The built-in %: %.c implicit rule applies $(CFLAGS), so it builds with -Wall -g -DVERSION=... like the others. No inconsistency. Withdrawn.

Verified fixes (against 0da5935)

  • Stress claim scoped to per-stream/per-thread order; cross-stream interleaving documented as non-deterministic and intentionally unasserted. Good resolution.
  • Benchmark header rewritten and the comparison claim softened.
  • timeout-minutes: 15 on both CI jobs.
  • Cross-ref comments at t3.c:48 and t3.c:853.
  • timespec_ms_delta now returns long (no narrowing).
  • Timer switched to date +%s.%N with coreutils in the manifest.
  • printf replaces echo -n.
  • Blocking waitpid(..., 0) for both workers after the loop (t3.c:996-997).
  • Env-schema migration, dropped [options] systems, and the git describe build metadata are now documented in the description.

Nice catch on the Perl timer yourself — a host /usr/bin/perl fallback for an in-env tool is exactly what the build sandbox is meant to surface.

The by-design dispositions (per-line fflush, parent-queue bound, the pre-existing/NOLINT items) are all reasonable. The unbounded parent queue under a slow consumer is the only residual worth a future note, but a real fix needs worker-side back-pressure — correctly out of scope here.

No remaining review blockers from my side.


Via Forge (interactive) • 3ca4fd79

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 1 comment.

Comment thread t3.c

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

t3.c:924

  • Same as the stdout side: if the worker dies mid-frame and the remaining partial frame bytes are already buffered in stderr_reader, the next poll can deliver POLLHUP without POLLIN and this branch closes the fd without warning. That makes a truncated final frame silent again in this edge case.
        } else if (pfds[1].revents & POLLHUP) {
          _debug(2, "closing stderr_msg_pipe[0]");
          close(stderr_msg_pipe[0]);
          pfds[1].fd = -1; // Ignore this file descriptor in future polls
          num_open_fds--;
          waitpid(stderr_worker, NULL, WNOHANG);
        }

Comment thread t3.c

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 2 comments.

Comment thread Makefile Outdated
Comment thread t3.x Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 2 comments.

Comment thread t3.c
Comment thread t3.c

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 3 comments.

Comment thread tests/bench.c Outdated
Comment thread tests/bench.c
Comment thread tests/stress.c Outdated
t3 timestamps stdout and stderr in separate worker processes that forward
each line to the parent over a pipe. Under highly concurrent load (e.g.
parallel builds) the original fixed-size struct framing could corrupt or
drop lines, producing garbled output, and it shipped a full ~4 KB frame for
every line regardless of length. Three framing bugs caused the corruption:

- Partial writes resumed from `msg_payload + written`, which is pointer
  arithmetic on a `struct payload *` and so advanced by whole structs rather
  than bytes, corrupting any message split across writes.
- The resend loop guard compared a signed -1 against a size_t, so the first
  short write was treated as complete and the rest of the message dropped.
- The parent assumed a single read() returned a whole frame, but the frame
  exceeded PIPE_BUF, so a short read misaligned every subsequent frame.

Replace the wire format with length-prefixed variable frames: a small header
(timestamp + length) followed by exactly the line's bytes. Workers assemble
each line in a growable, header-reserved buffer and send it with one
write_full() that always transfers the whole frame; lines now grow up to a
generous cap instead of being split at 4096 bytes. The parent reads through a
per-pipe buffered reader that pulls a large chunk per read() and parses every
whole frame out of it, carrying partial frames over - so the per-line syscall
cost is amortized and a busy parent simply applies back-pressure rather than
dropping or misaligning data. read_full() still backs the worker handshake;
write_full() backs every send.

For typical build output this roughly quarters t3's marginal per-line overhead
versus the old fixed-size framing (measured by running the `make bench` harness
against both versions); very long (multi-kilobyte) lines cost a little more but
are now reproduced intact.

Add a highly concurrent regression test (tests/stress.c, tests/run-stress,
tests/check-stream.awk), wired into `make test`: many threads write uniquely
identifiable lines to both streams through t3, and the checker verifies every
line is reproduced exactly once, intact, in per-thread order, on the correct
stream, across stdout, stderr, and the merged log. Add tests/bench.c and
tests/bench-overhead behind a manual `make bench` target for overhead
measurement.

Documentation:
- Correct the --help text (a run-together sentence and a blank --errcolor
  description) and document --outcolor.
- Sync the usage block in README.md.
- Replace the man page's note about a non-existent --nocolor option and the
  old 4096-byte line limit with the current long-line behavior.

Tooling and CI:
- Upgrade the Flox toolchain (gcc 15.2.0, clang-tools 21.1.8, gdb 17.1) and
  add a Darwin-only apple-sdk plus an on-activate hook that exports SDKROOT,
  so the clang-tidy in `make lint` can find the macOS SDK headers without
  changing the portable Makefile.
- Pin both GitHub Actions to full-length commit SHAs as the org now requires.
- Make the CI gcc matrix real: it declared 9/10/11 but always built with the
  default compiler, so install and pass CC=gcc-<ver> for each leg, pin the
  job to ubuntu-22.04 where those compilers are packaged, and set
  fail-fast: false.
- Ignore build outputs, remove stray prototype/build files, and track the
  false/true test inputs alongside their committed expected output.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants