perf(executor): spin before parking in Tracker::wait_for_quiescence (O(1)→O(0) futex syscalls/cycle) #108

@patdhlk

Description

Phase 2 of the Tracker quiescence-handshake optimization. Phase 1 (the waiting-flag gate) takes per-cycle futex_wake from O(N) to O(1) by only notifying when the WaitSet thread is actually parked. This issue tracks the deferred phase 2: a bounded spin in wait_for_quiescence before parking, so that in the common case (jobs finish in microseconds) the dispatch thread never parks at all — waiting stays false, no futex_wait, no futex_wake, zero syscalls per cycle.

This was deliberately split out of phase 1 because it is the only piece carrying real-time scheduling risk and its entire justification is empirical. It must clear the acceptance criteria below before merging.

Goal

Eliminate the residual O(1) futex_wait (waiter parks) + futex_wake (last completer notifies) per cycle by spinning briefly on the submitted/completed counters before taking the lock and parking.
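A minimal sketch of the spin-then-park shape described above, not the code in pool.rs. It assumes the counters are plain atomics and the park handshake is a Mutex plus Condvar. The field layout, the `submit` and `is_quiescent` helpers, the `bool` return value, and the `SPIN_BOUND` value are all hypothetical and chosen for illustration only. For simplicity any completer that observes `waiting` notifies, rather than only the last one.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};
use std::sync::{Condvar, Mutex};

/// Placeholder spin budget. The issue requires the real value to be
/// derived from measured job-completion latency on target hardware.
const SPIN_BOUND: u32 = 1_000;

/// Hypothetical layout; the real Tracker may differ.
#[derive(Default)]
pub struct Tracker {
    submitted: AtomicUsize,
    completed: AtomicUsize,
    waiting: AtomicBool,
    lock: Mutex<()>,
    cv: Condvar,
}

impl Tracker {
    pub fn submit(&self) {
        self.submitted.fetch_add(1, SeqCst);
    }

    /// Worker side: publish the completion, then notify only if the
    /// waiter has announced that it is (about to be) parked.
    pub fn complete(&self) {
        self.completed.fetch_add(1, SeqCst);
        if self.waiting.load(SeqCst) {
            // Taking the lock orders the notify after the waiter's
            // recheck-and-park, which it performs under the same lock.
            let _guard = self.lock.lock().unwrap();
            self.cv.notify_one();
        }
    }

    pub fn is_quiescent(&self) -> bool {
        self.completed.load(SeqCst) == self.submitted.load(SeqCst)
    }

    /// Dispatch side. Returns true if the thread parked at least once.
    pub fn wait_for_quiescence(&self) -> bool {
        // Fast path: `waiting` stays false, so completers skip notify
        // and no futex syscall is made on either side.
        for _ in 0..SPIN_BOUND {
            if self.is_quiescent() {
                return false;
            }
            hint::spin_loop();
        }
        // Slow path on spin exhaustion: set the flag, recheck, park.
        let mut guard = self.lock.lock().unwrap();
        self.waiting.store(true, SeqCst);
        let mut parked = false;
        while !self.is_quiescent() {
            guard = self.cv.wait(guard).unwrap();
            parked = true;
        }
        self.waiting.store(false, SeqCst);
        parked
    }
}
```

The SeqCst store to `waiting` followed by the recheck is what closes the flag-decision race: a completer that read `waiting == false` must have incremented `completed` before the waiter's recheck, so the waiter sees the completion and does not park on it.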

Acceptance criteria

  • Hard gate — measured jitter win on hardware. Must demonstrate a dispatch-jitter improvement on the Pi5 via the existing CycleObservation histogram, A/B against phase-1 main. No hardware proof → no merge. (This is the Pi A/B that was declined for phase 1; for the spin it is mandatory, because the spin trades CPU for syscalls and its only justification is the measured result.)
  • Safety — no priority inversion. Bounded spin only. Do not spin when the dispatch thread shares a core with a SCHED_FIFO worker: affinity and FIFO priority are caller-configurable per crates/taktora-executor/src/thread_attrs.rs, and a spinning higher-or-equal-priority thread on a shared core can prevent the very worker it is waiting on from being scheduled. Gate the spin on core topology (or skip it entirely when topology is unknown/shared).
  • Tuning — no magic constant. SPIN_BOUND derived from measured job-completion latency on target hardware, not picked by feel.
  • Loom model still passes (the spin path must not introduce a lost wakeup): during the spin waiting is false and completers correctly skip notify; only on spin exhaustion does the waiter set waiting + recheck + park.
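The safety and tuning criteria above could be expressed as two small pure functions, sketched here under stated assumptions. `CoreSet`, `cores`, `spin_allowed`, and `spin_bound` are hypothetical names, not part of thread_attrs.rs. The sketch assumes affinity is available as a set of core indices (or unknown), and that both the job-completion latencies and the cost of one spin iteration have been measured on the target hardware.

```rust
use std::collections::BTreeSet;
use std::time::Duration;

/// Cores a thread may run on; `None` means unpinned or unknown.
pub type CoreSet = Option<BTreeSet<usize>>;

/// Convenience constructor for a known core set.
pub fn cores(ids: &[usize]) -> CoreSet {
    Some(ids.iter().copied().collect())
}

/// Spin only when the dispatch thread provably shares no core with any
/// worker. Unknown topology on either side disables the spin.
pub fn spin_allowed(dispatch: &CoreSet, workers: &[CoreSet]) -> bool {
    let d = match dispatch {
        Some(d) => d,
        None => return false,
    };
    workers.iter().all(|w| match w {
        Some(w) => d.is_disjoint(w),
        None => false,
    })
}

/// Derive a spin budget (in loop iterations) from measured completion
/// latencies: take the `pct`-th percentile sample, divide by the
/// measured cost of one spin iteration, and clamp to `cap`.
pub fn spin_bound(samples: &[Duration], pct: usize, per_iter: Duration, cap: u32) -> u32 {
    if samples.is_empty() || per_iter.is_zero() {
        return 0; // no data: do not spin
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let idx = (sorted.len() * pct / 100).min(sorted.len() - 1);
    let iters = sorted[idx].as_nanos() / per_iter.as_nanos();
    iters.min(cap as u128) as u32
}
```

A bound of 0 makes the waiter go straight to the park path, so returning 0 for missing data or a disallowed topology falls back to phase-1 behavior.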

Context

  • Phase 1 handshake: crates/taktora-executor/src/pool.rs (Tracker::complete / wait_for_quiescence).
  • The two-hazard ordering rationale (Dekker SeqCst flag-decision race + lock-protected park handshake) is documented in-code and enforced by the loom model — the spin must preserve both.

Metadata

Labels: enhancement