Dispatcher + claims, Phases A and B#99

Merged
wmwolf merged 3 commits into master from feature-claims-schema
May 28, 2026

wmwolf commented May 28, 2026

Summary

Lands the data foundation and the claim creation/fulfillment lifecycle for the dispatcher + claims feature described in docs/dispatcher-and-claims.md. Phase C (dispatcher endpoint) and Phase D (mesa_test client work) are still upcoming; this PR is shippable on its own and dormant until those land.

What's in

Phase A — schema and CI flag parsing (commit `ebf04bc`)

  • `claims` table with FKs, composite-status indexes, partial pending-`expires_at` index, and a `claims_scope_fk_coherence` CHECK constraint.
  • Boolean flag columns on `commits` (`ci_skip`, `wants_full_inlists`, `wants_fpe`, `wants_converge`) plus `*_satisfied_at` datetime companions, populated at ingest by a new `CommitMessageFlags.parse` module wired into `Commit.hash_from_github`. The parser scans only the first line of the commit message, so squash/merge commits don't inherit every directive listed in their bodies.
  • Three `use_*` columns and a nullable `claim_id` FK on `submissions`.
  • Migration backfills the four flag columns on all existing commits via Postgres regex. Verified row-for-row against the parser on the 10,087-commit prod snapshot: zero mismatches.
  • `Commit#ci_*?` predicates rewritten as thin pass-throughs to the new columns; the backfill keeps their answers identical for existing commits.
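
To illustrate the first-line restriction, here is a minimal plain-Ruby sketch of a subject-line-only parser. It is not the PR's `CommitMessageFlags` code: the module name `CommitMessageFlagsSketch` and the exact directive spellings in `DIRECTIVES` are placeholders assumed for illustration. Only the flag names and the "first line only" rule come from the description above.

```ruby
# Sketch only. Directive spellings below are assumed placeholders,
# not the patterns used by the real CommitMessageFlags module.
module CommitMessageFlagsSketch
  DIRECTIVES = {
    ci_skip:            /\[ci skip\]/i,
    wants_full_inlists: /\[ci full_inlists\]/i,
    wants_fpe:          /\[ci fpe\]/i,
    wants_converge:     /\[ci converge\]/i
  }.freeze

  # Returns a hash of flag name => true/false, looking at the subject line only.
  def self.parse(message)
    subject = message.to_s.lines.first.to_s.chomp
    DIRECTIVES.transform_values { |pattern| subject.match?(pattern) }
  end
end

# A squash commit whose body lists directives from its constituent commits:
squash = "Merge feature branch\n\n* fix solver [ci skip]\n* tweak inlist [ci fpe]"
CommitMessageFlagsSketch.parse(squash)
# Every flag is false: the directives appear only in the body.

CommitMessageFlagsSketch.parse("Tune solver tolerances [ci fpe]")
# Only :wants_fpe is true: the directive is on the subject line.
```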

Phase B — endpoint, sweeper, fulfillment (commit `2ae1604`)

  • `POST /api/v1/claims` — new endpoint at the start of a v1 API namespace. Auth mirrors `SubmissionsController` (submitter block, bcrypt-verified, computer scoped to user). Build claims expire in 15 min; test claims in 12 h. Test-scope claims look up the matching TCC by (module, name).
  • `Claim#fulfill!` + `Claim.sweep_expired!` + `rake claims:sweep` — bulk UPDATE backed by the partial index, exposed for the eventual Solid Queue recurring job (see #98).
  • Submissions endpoint accepts an optional `claim:` block (`id`, `started_at`, `use_*` flags) and an `after_create_commit :fulfill_claim` on Submission flips the matching claim to fulfilled. Backwards-compatible — legacy mesa_test versions that don't send the block work unchanged.
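
The claim lifecycle described in this list (TTL by scope, sweep, late fulfillment) can be sketched without ActiveRecord. The class below is an in-memory stand-in, not the PR's `Claim` model: the class name `ClaimSketch`, the string values for scope and status, the `expire!` helper, and the choice to treat a repeat `fulfill!` as a no-op are all assumptions. The 15 min / 12 h TTLs and the rule that both pending and expired claims may be fulfilled come from the PR text.

```ruby
# Sketch only: plain-Ruby stand-in for the Claim model (no database).
class ClaimSketch
  # TTLs in seconds: 15 min for build claims, 12 h for test claims.
  TTL_FOR_SCOPE = { "build" => 15 * 60, "test" => 12 * 60 * 60 }.freeze

  attr_reader :scope, :status, :expires_at, :fulfilled_at

  def initialize(scope:, now: Time.now)
    @scope = scope
    @status = "pending"
    @expires_at = self.class.default_expires_at(scope, now: now)
    @fulfilled_at = nil
  end

  def self.default_expires_at(scope, now: Time.now)
    now + TTL_FOR_SCOPE.fetch(scope)
  end

  # Pending and expired claims may both be fulfilled.
  # Assumption: fulfilling an already-fulfilled claim changes nothing.
  def fulfill!(now: Time.now)
    return self if status == "fulfilled"
    @status = "fulfilled"
    @fulfilled_at = now
    self
  end

  def expire!
    @status = "expired"
  end

  # In-memory analogue of the bulk UPDATE: expire pending claims whose
  # deadline has passed, and return how many were expired.
  def self.sweep_expired!(claims, now: Time.now)
    due = claims.select { |c| c.status == "pending" && c.expires_at <= now }
    due.each(&:expire!)
    due.size
  end
end

t0 = Time.at(0)
build = ClaimSketch.new(scope: "build", now: t0)
ClaimSketch.sweep_expired!([build], now: t0 + 16 * 60) # build claim is now expired
build.fulfill!(now: t0 + 20 * 60)                      # a late submission still credits it
```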

Phase B (extended) — "pending" wired into aggregations (commit `5c7f4a9`)

  • `Commit#has_pending_claims?` + matching `TestCaseCommit#has_pending_claims?` / `#pending?` predicates with pre-filtered `pending_claims` associations.
  • `CommitState#tests_status` keys `:pending` / `:not_run` on `has_pending_claims?` rather than on "TCC has no submissions."
  • `CommitState#commit_state` row-classification loop gates the `no_data → pending_tests` fallback on real claim presence.
  • `commits#index` extends its eager-load to keep query count flat across 25 rows.

Risk and rollout

The schema migration is metadata-only on Postgres 18 (Railway's version) for all column adds, plus a single backfill UPDATE on ~10k commits (~0.3s locally, well inside Railway's deploy-restart envelope).

One user-visible behavior change worth noting: `tests_status`'s `:pending` / `:not_run` distinction now requires real claim presence. Until Phase D ships mesa_test client work, no commits will ever read as `:pending`. That means:

  • The hero "Pending" tile on every commit page reads `0` instead of "number of untested TCCs"
  • The pending banner stops firing on fresh commits
  • Subway map colors are unchanged (both `:pending` and `:not_run` render gray; only `:pending_partial` is blue)

This corrects the existing leaky proxy — "fresh commit looks pending" was never accurate. The tiles light up correctly once Phase D ships real claim data.

Everything else is dormant:

  • `POST /api/v1/claims` is a new route; no client knows it exists yet
  • `claims:sweep` rake task isn't wired to cron (the SQ migration in #98 covers that)
  • The submissions endpoint extension is opt-in via the new `claim:` payload key

Test plan

  • Full RSpec suite green (421 examples, 0 failures; +85 new specs across the three commits)
  • Migration rollback + re-migrate tested locally
  • Backfill verified row-for-row against the parser on the 10,087-commit prod snapshot
  • `rake claims:sweep` smoke-tested against the dev DB
  • After merge: monitor production deploy logs for the migration timing

wmwolf added 3 commits May 28, 2026 10:22
Lay the data foundation for the dispatcher + claims feature (see
docs/dispatcher-and-claims.md). No new API surface yet — Phase B
adds the claim endpoint, Phase C the dispatcher.

  * `claims` table with FKs, composite-status indexes, a partial
    index on (expires_at) restricted to pending rows, and a
    `claims_scope_fk_coherence` CHECK constraint that enforces
    "build-scope rows carry no TCC; test-scope rows must."
  * Boolean columns on `commits` (ci_skip, wants_full_inlists,
    wants_fpe, wants_converge) plus their `*_satisfied_at`
    datetime companions. Populated at ingest via a new
    `CommitMessageFlags.parse` module wired into
    `Commit.hash_from_github`, the single chokepoint both
    `insert_all` and `create_or_update_from_github_hash` flow
    through.
  * Parser scans ONLY the first line of the commit message.
    Squash/merge commits routinely list every constituent
    commit's subject in their body; a whole-message scan would
    pull every `[ci ...]` directive from every squashed commit
    into the merge. MESA convention places directives in the
    subject line of the actual commit they apply to.
  * Backfill of the four flag columns on every existing commit
    via Postgres regex against `split_part(message, E'\n', 1)`
    inside the migration's `up`. Verified row-by-row against the
    parser on the 10,087-commit prod snapshot in local dev —
    zero mismatches. The first-line restriction filters ~140
    false-positive triggers vs. the whole-message scan.
  * `Commit#ci_*?` predicates rewritten as thin pass-throughs to
    the new columns so `test_candidate` and the submissions API
    keep working without any change to their contract.
    `ci_optional_n` still parses the message at read time — the
    integer in `[ci optional 1234]` isn't stored as a scalar.
  * Three nullable columns on `submissions` (use_fpe,
    use_full_inlists, use_converge) and a nullable `claim_id`
    foreign key — Phase B fills these in, but landing the
    schema now keeps the migration list ordered cleanly.
  * `Claim` model with the basic associations, scope/status
    inclusion validations, and a model-level validation that
    test-scope claims' TCC belongs to the claim's commit (the
    friendly version of the CHECK constraint). 55 new specs
    cover the parser (including first-line restriction +
    squash/merge body cases), the model validation matrix, the
    DB-level coherence constraint, and the ingest column
    population path.
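
The two coherence rules in this commit (the CHECK constraint's "build-scope rows carry no TCC; test-scope rows must", and the model-level check that a test-scope claim's TCC belongs to the claim's commit) can be sketched as one plain-Ruby function. The function name `claim_errors`, the scope strings, the hash shape used for the TCC, and the error wording are hypothetical; this is not the `Claim` model's validation code.

```ruby
# Sketch only: returns a list of human-readable problems for a prospective claim.
# `tcc` is nil or a hash like { commit_id: 1 } (a hypothetical stand-in for a
# TestCaseCommit row).
def claim_errors(scope:, commit_id:, tcc: nil)
  errors = []
  case scope
  when "build"
    errors << "build-scope claims must not reference a TCC" unless tcc.nil?
  when "test"
    if tcc.nil?
      errors << "test-scope claims must reference a TCC"
    elsif tcc[:commit_id] != commit_id
      errors << "TCC must belong to the claim's commit"
    end
  else
    errors << "unknown scope"
  end
  errors
end

claim_errors(scope: "test", commit_id: 1, tcc: { commit_id: 2 })
# The TCC points at a different commit, so one error is returned.
```
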
Phase A landed the schema; this lights it up. Claims can now be
created via API, expire on schedule, and be fulfilled by an
incoming submission.

  * POST /api/v1/claims — new endpoint at the start of a v1 API
    namespace (existing flat routes stay where they are). Auth
    mirrors SubmissionsController exactly: `submitter` block with
    email/password/computer, password verified by bcrypt against
    the User, computer scoped to the authenticated user's
    computers. Build-scope claims set expires_at 15 min out; test-
    scope claims look up the matching TCC by (module, name) on the
    claimed commit and get 12 h. Distinct error paths for unknown
    SHA (404), missing test case on the commit (404), unknown
    scope (422), and missing test-case identifier on test scope
    (422), so a misbehaving client gets a precise signal rather
    than a generic validation dump.
  * Claim.default_expires_at + TTL_FOR_SCOPE constants: the TTL
    values live on the model; the controller stays at the HTTP
    boundary. 15 min / 12 h matches the V1 design in
    docs/dispatcher-and-claims.md.
  * Claim#fulfill! — flips a claim to `fulfilled` and stamps
    fulfilled_at. Idempotent across both legal starting states
    (pending OR expired) — a late submission that arrives after
    the sweeper has expired its claim still legitimately credits
    it. Uses update_columns so a stale `updated_at` race against
    the sweeper can't reject the write.
  * Claim.sweep_expired! + `rake claims:sweep`: bulk UPDATE
    backed by the partial index `index_claims_on_expires_at_pending`
    on (expires_at) scoped to pending rows. Cheap enough to run
    every ~5 min once a scheduler is wired up. Logic lives on the
    model so it's unit-testable without rake plumbing.
  * Submission `after_create_commit :fulfill_claim` — when a
    submission POST carries a `claim:` block in the payload
    (claim_id + started_at + use_* flags), the matching claim is
    fulfilled as the create commit settles. Backwards-compatible:
    legacy mesa_test versions that don't send the block continue
    to work unchanged; the resulting submission has claim_id NULL
    and the callback short-circuits via its `if: claim_id.present?`
    guard.
  * 23 new specs covering: claim endpoint happy + sad paths (build
    + test, unauthenticated, wrong password, foreign computer,
    unknown commit, bad scope, missing TCC, dispatched_at echo);
    TTL helper output; fulfill! across pending + expired starting
    states + the stale-row race; sweep_expired! transitions +
    idempotency + leaves-fulfilled-alone + updated_at refresh;
    submissions integration (claim_id fulfills pending claim,
    claim_id fulfills expired claim for late-submission case, no
    claim block is a no-op).

403 specs total, all green.
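
The distinct error paths listed for `POST /api/v1/claims` can be sketched as a pure function from request parameters to a status code. This is illustrative only: the function name, the parameter keys (`:scope`, `:sha`, `:module`, `:name`), the in-memory `commits` lookup, and the order in which checks run are assumptions. Only the four status codes and their triggers come from the commit message.

```ruby
# Sketch only: returns the HTTP status of the first failing check,
# or nil when the claim request may proceed.
VALID_SCOPES = %w[build test].freeze

def claim_error_status(params, commits)
  scope = params[:scope]
  return 422 unless VALID_SCOPES.include?(scope) # unknown scope

  commit = commits[params[:sha]]
  return 404 if commit.nil?                      # unknown SHA
  return nil if scope == "build"                 # build claims need no test case

  mod, name = params.values_at(:module, :name)
  return 422 if mod.nil? || name.nil?            # test scope without an identifier
  return 404 unless commit[:test_cases].include?([mod, name]) # not on this commit

  nil
end

# Hypothetical data: one commit carrying one test case.
commits = { "abc123" => { test_cases: [["star", "example_case"]] } }
claim_error_status({ scope: "test", sha: "abc123", module: "star", name: "missing" }, commits)
# The commit exists but has no such test case, so the result is 404.
```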
Pre-claims, the only signal we had for "work is in flight on this
commit" was "the TCC has no submissions yet" — which fires the
instant a commit is ingested, well before any human has touched
the SHA. That made every freshly-ingested commit look like
something was actively running on it, and made the all-builds-
failed case lump unretried tests into pending. Both were wrong;
nobody was working on those commits.

Now that claims exist, this turns out to be a one-line gate at
each aggregation site:

  * `Commit#has_pending_claims?` and the matching pre-filtered
    `has_many :pending_claims` association — cheap presence check
    that plays nicely with `.includes(:pending_claims)` on the
    commits-index render path (no N+1 across 25 rows). Same shape
    on TestCaseCommit, plus `TCC#pending?` for the
    "untested-but-claimed" view-level question.
  * `CommitState#tests_status` — the `:pending` / `:not_run`
    branch now keys on `has_pending_claims?`, not on
    "`counts[:untested] > 0`." A commit with TCCs sitting at
    status=-1 and no live claim reads as `:not_run` (truly
    untouched). The instant a build or test claim lands, it
    flips to `:pending` (or `:pending_partial` if some tests
    already passed).
  * `CommitState#commit_state` row-classification loop — the
    `no_data ⇒ pending_tests += 1` fallback now requires the
    TCC to actually have a pending test-scope claim. Cell-level
    pending (computer built but no test result yet) still counts
    regardless of claims; that's a different signal.
  * `CommitState#_tccs_for_matrix` now eager-loads
    `pending_claims` so per-TCC `has_pending_claims?` reads from
    the loaded association.
  * `commits#index` extends the include set to cover both
    `commit.pending_claims` and `tcc.pending_claims` so a single
    page render stays at the same query count it had before.
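
The `:pending` / `:not_run` gate described above can be sketched as a small pure function. It covers only the branch for commits that still have untested TCCs; the function name, its keyword arguments, and the `nil` return for "branch not reached" are hypothetical, and the real `CommitState#tests_status` handles other statuses this sketch ignores.

```ruby
# Sketch only: the untested-TCC branch of a tests_status-style classifier.
# Returns nil when there are no untested TCCs (other statuses are out of scope).
def untested_branch_status(untested:, passed:, has_pending_claims:)
  return nil if untested.zero?

  if has_pending_claims
    # A live claim means work is in flight; partial if some tests already passed.
    passed.positive? ? :pending_partial : :pending
  else
    # Untested TCCs and no live claim: nobody is working on this commit.
    :not_run
  end
end

untested_branch_status(untested: 5, passed: 0, has_pending_claims: false)
# A freshly ingested commit with no claims reads as :not_run.
```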

19 new specs cover: tests_status :pending vs :not_run across
claim states; the `pending_tests` hero count over the same
states; Commit and TCC `has_pending_claims?` and `pending?`
predicates including the load-vs-query-shape check that protects
the index from N+1; revert-on-fulfill / revert-on-expire
transitions. One pre-existing spec that asserted the leaky
behavior ("counts tests with no built-computer results as
pending") was updated in place to test the new semantics —
`pending_tests = 0` without claims, `= 2` with two test-scope
claims out.

421 specs total, all green.
@wmwolf wmwolf merged commit a839322 into master May 28, 2026
1 check passed