Phase 2 of the Tracker quiescence-handshake optimization. Phase 1 (the waiting-flag gate) takes per-cycle futex_wake from O(N) to O(1) by only notifying when the WaitSet thread is actually parked. This issue tracks the deferred phase 2: a bounded spin in wait_for_quiescence before parking, so that in the common case (jobs finish in microseconds) the dispatch thread never parks at all — waiting stays false, no futex_wait, no futex_wake, zero syscalls per cycle.
This was deliberately split out of phase 1 because it is the only piece carrying real-time scheduling risk and its entire justification is empirical. It must clear the acceptance criteria below before merging.
Goal
Eliminate the residual O(1) futex_wait (waiter parks) + futex_wake (last completer notifies) per cycle by spinning briefly on the submitted/completed counters before taking the lock and parking.
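The spin-then-park shape described in the goal might look roughly like the sketch below. It is a minimal illustration, not the code in pool.rs: the Tracker fields, the submit helper, the Path enum, and the SPIN_BOUND value are all assumptions made for the example, and a Mutex/Condvar pair stands in for whatever the real park handshake uses.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};
use std::sync::{Condvar, Mutex};

/// Hypothetical iteration cap. The issue requires deriving the real value
/// from measured job-completion latency on target hardware.
const SPIN_BOUND: u32 = 1_000;

/// Illustrative stand-in for the real Tracker; field names are assumptions.
#[derive(Default)]
pub struct Tracker {
    submitted: AtomicUsize,
    completed: AtomicUsize,
    waiting: AtomicBool,
    lock: Mutex<()>,
    cv: Condvar,
}

/// Which path wait_for_quiescence returned through (for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum Path {
    /// Quiescence was observed during the spin: no lock, no park, no notify.
    FastPath,
    /// The spin was exhausted; the waiter took the lock and may have parked.
    SlowPath,
}

impl Tracker {
    pub fn submit(&self) {
        self.submitted.fetch_add(1, SeqCst);
    }

    fn quiescent(&self) -> bool {
        self.completed.load(SeqCst) == self.submitted.load(SeqCst)
    }

    /// Called by a worker when a job finishes.
    pub fn complete(&self) {
        self.completed.fetch_add(1, SeqCst);
        // Phase-1 gate: only touch the lock and notify if the waiter has
        // announced that it is about to park.
        if self.waiting.load(SeqCst) {
            let _guard = self.lock.lock().unwrap();
            self.cv.notify_one();
        }
    }

    /// Called by the dispatch thread after submitting a cycle's jobs.
    pub fn wait_for_quiescence(&self) -> Path {
        // Phase 2: bounded spin on the counters. `waiting` stays false here,
        // so completers skip the notify entirely.
        for _ in 0..SPIN_BOUND {
            if self.quiescent() {
                return Path::FastPath;
            }
            hint::spin_loop();
        }
        // Spin exhausted: set the flag, recheck, then park under the lock.
        let mut guard = self.lock.lock().unwrap();
        self.waiting.store(true, SeqCst);
        while !self.quiescent() {
            guard = self.cv.wait(guard).unwrap();
        }
        self.waiting.store(false, SeqCst);
        Path::SlowPath
    }
}
```

In this sketch the flag is set only after the spin gives up, so a cycle whose jobs finish within the spin window performs no lock acquisition and no condition-variable traffic. The recheck after setting the flag, done while holding the lock, is what keeps a completion that races with spin exhaustion from being missed.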
Acceptance criteria
- CycleObservation histogram, A/B against phase-1 main. No hardware proof → no merge. (This is the Pi A/B that was declined for phase 1; for the spin it is mandatory, because the spin trades CPU for syscalls and its only justification is the measured result.)
- SCHED_FIFO worker: affinity and FIFO priority are caller-configurable per crates/taktora-executor/src/thread_attrs.rs, and a spinning higher-or-equal-priority thread on a shared core can prevent the very worker it is waiting on from being scheduled. Gate the spin on core topology (or skip it entirely when topology is unknown/shared).
- SPIN_BOUND derived from measured job-completion latency on target hardware, not picked by feel.
- waiting is false and completers correctly skip notify; only on spin exhaustion does the waiter set waiting + recheck + park.
Context
- Phase 1 handshake: crates/taktora-executor/src/pool.rs (Tracker::complete / wait_for_quiescence).
- The two-hazard ordering rationale (Dekker SeqCst flag-decision race + lock-protected park handshake) is documented in-code and enforced by the loom model; the spin must preserve both.
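For the starvation and bound items in the acceptance criteria, one possible shape for the gate and the bound derivation is sketched below. Everything here is hypothetical: the function names, the idea of passing core IDs as plain integers, the percentile rule, and the per-iteration cost parameter are assumptions for illustration, not the interface in thread_attrs.rs.

```rust
/// Hypothetical gate: allow spinning only when the waiter's core is known
/// and is not shared with any worker core. Unknown topology means no spin,
/// matching "skip it entirely when topology is unknown/shared".
pub fn spin_allowed(waiter_core: Option<usize>, worker_cores: Option<&[usize]>) -> bool {
    match (waiter_core, worker_cores) {
        (Some(core), Some(workers)) => !workers.contains(&core),
        _ => false,
    }
}

/// Hypothetical derivation of a spin bound from measured job-completion
/// latencies: pick the latency at `percentile` (0.0..=1.0), convert it to
/// spin iterations using a measured per-iteration cost, and cap the result.
pub fn derive_spin_bound(
    latencies_ns: &[u64],
    percentile: f64,
    ns_per_spin: u64,
    max_iters: u32,
) -> u32 {
    if latencies_ns.is_empty() || ns_per_spin == 0 {
        return 0; // no data: do not spin at all
    }
    let mut sorted = latencies_ns.to_vec();
    sorted.sort_unstable();
    let p = percentile.clamp(0.0, 1.0);
    // Nearest-rank index into the sorted samples.
    let idx = ((sorted.len() - 1) as f64 * p).round() as usize;
    let iters = sorted[idx] / ns_per_spin;
    iters.min(u64::from(max_iters)) as u32
}
```

A bound of zero makes the spin loop a no-op, so the same code path degrades to the phase-1 behaviour when the gate says no or when no measurements are available.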