Harden IPC framing for perfect concurrent output; add stress test; upgrade toolchain#19
Harden IPC framing for perfect concurrent output; add stress test; upgrade toolchain#19limeytexan wants to merge 3 commits into
Conversation
eae5194 to
f2151d7
Compare
f2151d7 to
e76f97e
Compare
There was a problem hiding this comment.
Pull request overview
This PR hardens t3’s worker→parent IPC framing to prevent corrupted/dropped output under highly concurrent streaming, adds a regression stress test (plus a micro-benchmark harness), and refreshes docs + CI/tooling to match the new behavior.
Changes:
- Replace fixed-size IPC frames with length-prefixed variable frames and introduce a buffered frame reader in the parent to reduce syscall overhead and prevent framing corruption.
- Add a concurrency stress generator + stream verifier, wire it into
make test, and add optional throughput benchmarking helpers. - Update usage docs/man text, adjust CI gcc matrix behavior, and refresh Flox environment/toolchain metadata.
Reviewed changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
t3.c |
Reworks worker IPC framing + parent-side buffered frame parsing; updates help text/options. |
tests/stress.c |
Adds a highly concurrent stdout/stderr line generator for regression testing. |
tests/run-stress |
Adds a driver script that runs the stress generator under t3 and validates outputs. |
tests/check-stream.awk |
Adds an awk validator to detect dropped/garbled/duplicated/out-of-order lines. |
tests/bench.c |
Adds a fast stdout line generator for measuring per-line overhead. |
tests/bench-overhead |
Adds a timing harness to estimate marginal per-line overhead (manual target). |
tests/true.args |
Adds a simple true test invocation. |
tests/false.args |
Adds a simple false test invocation. |
t3.x |
Updates manpage include text to reflect long-line reassembly/capping behavior. |
README.md |
Fixes usage text and documents --outcolor/--errcolor flags. |
Makefile |
Adds stress (included in test) and bench targets; builds new test helpers. |
.gitignore |
Ignores built artifacts (t3, manpage, flox results) and compiled test helpers. |
.github/workflows/ci.yml |
Pins runners/actions, fixes gcc matrix installs/builds, and reduces fail-fast behavior. |
.flox/env/manifest.toml |
Upgrades schema/tooling and adds a Darwin SDKROOT hook for lint tooling. |
.flox/env/manifest.lock |
Updates locked toolchain/package versions to match the manifest changes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2c119e0 to
9167384
Compare
9167384 to
902e670
Compare
902e670 to
e331f6a
Compare
Domain SummaryThe domain-relevant content of this PR is the Flox environment Description ConsistencyConsistent with the diff: the IPC fix (fixed 4 KB → length-prefixed Inconsistent: nothing material. Missing from the description (environment-level changes):
Overclaimed: nothing. If anything, the env-schema and Code ReviewThe C rewrite is high quality. Security and correctness were verified Important1. Stress test does not assert cross-stream interleaving order. 2. Makefile 3. Benchmark overstates what it isolates and is not self-reproducing. 4. Stress test has no timeout guard in CI. [.github/workflows/ci.yml:35-37, Minor5. 6. Single 7. 8. Per-line 9. Parent enqueues unboundedly into the in-memory list. [t3.c:855-888] 10. 11. 12. 13. 14. 15. Signed/unsigned in 16. 17. Oversized 64-byte handshake buffers plus a needless Confirmed correct (worth keeping)
VerdictThis is a high-quality PR with no Critical code-correctness findings. Via Forge (interactive) • 3ca4fd79 |
e331f6a to
a249593
Compare
|
Thanks — this is an excellent, careful review. Verified each claim against the tree; dispositions below. Addressed in Verdict items
Important
Minor
Thanks again for the depth here. |
a249593 to
0da5935
Compare
|
Correction on minor item #11 (and my earlier dismissal of it): I was wrong. Fixed in 0da5935: rather than pull a full Perl interpreter into the environment, added |
|
Reviewed the dispositions and verified each against Conceded pushbacks
Verified fixes (against
Nice catch on the Perl timer yourself — a host The by-design dispositions (per-line No remaining review blockers from my side. Via Forge (interactive) • 3ca4fd79 |
0da5935 to
3e82c90
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 15 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
t3.c:924
- Same as the stdout side: if the worker dies mid-frame and the remaining partial frame bytes are already buffered in
stderr_reader, the next poll can deliver POLLHUP without POLLIN and this branch closes the fd without warning. That makes a truncated final frame silent again in this edge case.
} else if (pfds[1].revents & POLLHUP) {
_debug(2, "closing stderr_msg_pipe[0]");
close(stderr_msg_pipe[0]);
pfds[1].fd = -1; // Ignore this file descriptor in future polls
num_open_fds--;
waitpid(stderr_worker, NULL, WNOHANG);
}
3e82c90 to
bd4577d
Compare
t3 timestamps stdout and stderr in separate worker processes that forward each line to the parent over a pipe. Under highly concurrent load (e.g. parallel builds) the original fixed-size struct framing could corrupt or drop lines, producing garbled output, and it shipped a full ~4 KB frame for every line regardless of length. Three framing bugs caused the corruption: - Partial writes resumed from `msg_payload + written`, which is pointer arithmetic on a `struct payload *` and so advanced by whole structs rather than bytes, corrupting any message split across writes. - The resend loop guard compared a signed -1 against a size_t, so the first short write was treated as complete and the rest of the message dropped. - The parent assumed a single read() returned a whole frame, but the frame exceeded PIPE_BUF, so a short read misaligned every subsequent frame. Replace the wire format with length-prefixed variable frames: a small header (timestamp + length) followed by exactly the line's bytes. Workers assemble each line in a growable, header-reserved buffer and send it with one write_full() that always transfers the whole frame; lines now grow up to a generous cap instead of being split at 4096 bytes. The parent reads through a per-pipe buffered reader that pulls a large chunk per read() and parses every whole frame out of it, carrying partial frames over - so the per-line syscall cost is amortized and a busy parent simply applies back-pressure rather than dropping or misaligning data. read_full() still backs the worker handshake; write_full() backs every send. For typical build output this roughly quarters t3's marginal per-line overhead versus the old fixed-size framing (measured by running the `make bench` harness against both versions); very long (multi-kilobyte) lines cost a little more but are now reproduced intact. Add a highly concurrent regression test (tests/stress.c, tests/run-stress, tests/check-stream.awk), wired into `make test`: many threads write uniquely identifiable lines to both streams through t3, and the checker verifies every line is reproduced exactly once, intact, in per-thread order, on the correct stream, across stdout, stderr, and the merged log. Add tests/bench.c and tests/bench-overhead behind a manual `make bench` target for overhead measurement. Documentation: - Correct the --help text (a run-together sentence and a blank --errcolor description) and document --outcolor. - Sync the usage block in README.md. - Replace the man page's note about a non-existent --nocolor option and the old 4096-byte line limit with the current long-line behavior. Tooling and CI: - Upgrade the Flox toolchain (gcc 15.2.0, clang-tools 21.1.8, gdb 17.1) and add a Darwin-only apple-sdk plus an on-activate hook that exports SDKROOT, so the clang-tidy in `make lint` can find the macOS SDK headers without changing the portable Makefile. - Pin both GitHub Actions to full-length commit SHAs as the org now requires. - Make the CI gcc matrix real: it declared 9/10/11 but always built with the default compiler, so install and pass CC=gcc-<ver> for each leg, pin the job to ubuntu-22.04 where those compilers are packaged, and set fail-fast: false. - Ignore build outputs, remove stray prototype/build files, and track the false/true test inputs alongside their committed expected output. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bd4577d to
667ce68
Compare
| const size_t header_size = sizeof(struct msg_header); | ||
| size_t capacity = BUFFER_SIZE; | ||
| char *line = xmalloc(capacity); | ||
| size_t line_length = 0; // text bytes accumulated (excludes the header) | ||
| struct timespec timestamp = {0, 0}; |
| if (header_size + line_length + 1 > capacity) { | ||
| if (capacity >= MAX_LINE_SIZE) { | ||
| send_line(pipe_fd, line, line_length, ×tamp); | ||
| line_length = 0; | ||
| } else { | ||
| capacity *= 2; | ||
| if (capacity > MAX_LINE_SIZE) { | ||
| capacity = MAX_LINE_SIZE; | ||
| } | ||
| line = xrealloc(line, capacity); | ||
| } | ||
| } |
Summary
Fixes occasional garbled/dropped output when streaming highly concurrent command output through
t3, adds a stress test that reproduces the failure, and reworks the worker→parent IPC so it is both correct and substantially cheaper per line. Also refreshes the toolchain, fixes CI, and corrects the docs.Changes
write_full(), assembled in a growable buffer (so lines are no longer split at 4096 bytes), and read back through a per-pipe buffered reader that amortizes syscalls and applies back-pressure instead of losing data.make benchharness measures one binary's per-line cost; the fixed-vs-variable comparison was done by running it against a build of the previous version as well. Very long (multi-KB) lines cost slightly more than the old format but are now reproduced intact.make testregression test runs many threads writing identifiable lines throught3and verifies every line is reproduced exactly once, intact, in per-stream/per-thread order, on the correct stream. (Cross-stream interleaving in the merged log is ordered by capture timestamp and is not a deterministically assertable property for racing producers, so it is intentionally not asserted.)--help/README usage corrections, document--outcolor, and replace the man page's non-existent--nocolornote and stale 4096-byte limit with the current long-line behavior.flox upgrade(gcc 15, clang-tools 21, gdb 17) with a Darwin-onlyapple-sdk/SDKROOThook somake lintfinds the macOS SDK; addcoreutilssomake bench's timer uses GNUdatefrom the environment instead of a host fallback; pin GitHub Actions to commit SHAs; make the CI gcc matrix actually build with each version and bound the jobs withtimeout-minutes; ignore build outputs and remove stray files.Environment / Flox
These manifest changes ride along (partly from the bundled
flox upgrade/ publish chores) and are called out here since they are environment-level:version = 1→schema-version = "1.12.0"(and the[build]→[build.t3]table reformat). This may raise the minimum Flox CLI version needed to use the environment.[options] systems(the explicit 4-system list) is no longer present; system support now relies on flox defaults plus the Darwin-scopedapple-sdk.systems. The lock still resolves all four systems (aarch64/x86_64 × darwin/linux), so builds are unaffected, but the declaration is no longer explicit.flox buildnow stamps the artifact with agit describe-derived version and a description.Verification
Clean
makebuild,make test(incl. stress) passing on each gcc version,make lintexit 0, andflox buildpassing end-to-end. CI green.Release
Intended as the v1.1.0 release (minor bump from v1.0.9). Tag
v1.1.0onmainonce merged.🤖 Generated with Claude Code