Dispatcher + claims, Phases A and B #99
Merged
Lay the data foundation for the dispatcher + claims feature (see
docs/dispatcher-and-claims.md). No new API surface yet — Phase B
adds the claim endpoint, Phase C the dispatcher.
* `claims` table with FKs, composite-status indexes, a partial
index on (expires_at) restricted to pending rows, and a
`claims_scope_fk_coherence` CHECK constraint that enforces
"build-scope rows carry no TCC; test-scope rows must."
* Boolean columns on `commits` (ci_skip, wants_full_inlists,
wants_fpe, wants_converge) plus their `*_satisfied_at`
datetime companions. Populated at ingest via a new
`CommitMessageFlags.parse` module wired into
`Commit.hash_from_github`, the single chokepoint both
`insert_all` and `create_or_update_from_github_hash` flow
through.
* Parser scans ONLY the first line of the commit message.
Squash/merge commits routinely list every constituent
commit's subject in their body; a whole-message scan would
pull every `[ci ...]` directive from every squashed commit
into the merge. MESA convention places directives in the
subject line of the actual commit they apply to.
* Backfill of the four flag columns on every existing commit
via Postgres regex against `split_part(message, E'\n', 1)`
inside the migration's `up`. Verified row-by-row against the
parser on the 10,087-commit prod snapshot in local dev —
zero mismatches. The first-line restriction filters ~140
false-positive triggers vs. the whole-message scan.
* `Commit#ci_*?` predicates rewritten as thin pass-throughs to
the new columns so `test_candidate` and the submissions API
keep working without any change to their contract.
`ci_optional_n` still parses the message at read time — the
integer in `[ci optional 1234]` isn't stored as a scalar.
* Three nullable columns on `submissions` (use_fpe,
use_full_inlists, use_converge) and a nullable `claim_id`
foreign key — Phase B fills these in, but landing the
schema now keeps the migration list ordered cleanly.
* `Claim` model with the basic associations, scope/status
inclusion validations, and a model-level validation that
test-scope claims' TCC belongs to the claim's commit (the
friendly version of the CHECK constraint). 55 new specs
cover the parser (including first-line restriction +
squash/merge body cases), the model validation matrix, the
DB-level coherence constraint, and the ingest column
population path.
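The first-line restriction described above can be illustrated with a minimal sketch. This is not the `CommitMessageFlags` module from this PR: the module name `CommitMessageFlagsSketch` is hypothetical, and the directive spellings other than the `[ci ...]` bracket form are assumptions made purely for illustration.

```ruby
# Hypothetical sketch of a first-line-only directive parser.
# The real CommitMessageFlags module is not reproduced here.
module CommitMessageFlagsSketch
  # ASSUMPTION: these directive spellings are placeholders; only the
  # `[ci ...]` bracket form is taken from the description above.
  PATTERNS = {
    ci_skip:            /\[ci skip\]/i,
    wants_full_inlists: /\[ci full_inlists\]/i,
    wants_fpe:          /\[ci fpe\]/i,
    wants_converge:     /\[ci converge\]/i
  }.freeze

  # Returns a hash of flag name => true/false, looking only at the
  # subject (first) line so squash/merge bodies cannot leak directives.
  def self.parse(message)
    subject = message.to_s.lines.first.to_s
    PATTERNS.transform_values { |pattern| subject.match?(pattern) }
  end
end

squash = "Merge feature branch\n\n* fix solver [ci fpe]\n* docs only [ci skip]"
body_flags    = CommitMessageFlagsSketch.parse(squash)
subject_flags = CommitMessageFlagsSketch.parse("Fix solver [ci fpe]")
```

Under these assumptions every value in `body_flags` is false, because the directives sit in the squash body, while `subject_flags[:wants_fpe]` is true.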
Phase A landed the schema; this lights it up. Claims can now be
created via API, expire on schedule, and be fulfilled by an
incoming submission.
* POST /api/v1/claims — new endpoint at the start of a v1 API
namespace (existing flat routes stay where they are). Auth
mirrors SubmissionsController exactly: `submitter` block with
email/password/computer, password verified by bcrypt against
the User, computer scoped to the authenticated user's
computers. Build-scope claims set expires_at 15 min out; test-
scope claims look up the matching TCC by (module, name) on the
claimed commit and get 12 h. Distinct error paths for unknown
SHA (404), missing test case on the commit (404), unknown
scope (422), and missing test-case identifier on test scope
(422), so a misbehaving client gets a precise signal rather
than a generic validation dump.
* Claim.default_expires_at + TTL_FOR_SCOPE constants: TTL
values live on the model, and the controller stays at the HTTP
boundary. The 15 min / 12 h values match the V1 design in
docs/dispatcher-and-claims.md.
* Claim#fulfill! — flips a claim to `fulfilled` and stamps
fulfilled_at. Idempotent across both legal starting states
(pending OR expired) — a late submission that arrives after
the sweeper has expired its claim still legitimately credits
it. Uses update_columns so a stale `updated_at` race against
the sweeper can't reject the write.
* Claim.sweep_expired! + `rake claims:sweep` — bulk UPDATE
backed by the partial index `index_claims_on_expires_at_pending`
on (expires_at) scoped to pending rows. Cheap; Railway cron
fires it every ~5 min. Logic lives on the model so it's
unit-testable without rake plumbing.
* Submission `after_create_commit :fulfill_claim` — when a
submission POST carries a `claim:` block in the payload
(claim_id + started_at + use_* flags), the matching claim is
fulfilled as the create commit settles. Backwards-compatible:
legacy mesa_test versions that don't send the block continue
to work unchanged; the resulting submission has claim_id NULL
and the callback short-circuits via its `if: claim_id.present?`
guard.
* 23 new specs covering: claim endpoint happy + sad paths (build
+ test, unauthenticated, wrong password, foreign computer,
unknown commit, bad scope, missing TCC, dispatched_at echo);
TTL helper output; fulfill! across pending + expired starting
states + the stale-row race; sweep_expired! transitions +
idempotency + leaves-fulfilled-alone + updated_at refresh;
submissions integration (claim_id fulfills pending claim,
claim_id fulfills expired claim for late-submission case, no
claim block is a no-op).
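The distinct error paths listed for `POST /api/v1/claims` can be sketched as a plain function. This is only an illustration: the parameter names (`:scope`, `:sha`, `:test_case`), the order of the checks, and the 201 success status are all assumptions, and authentication is left out entirely.

```ruby
VALID_SCOPES = %w[build test].freeze

# Hypothetical reduction of the endpoint's status-code decisions.
# `commits` is an assumed lookup: SHA => array of [module, name] pairs.
def claim_response_status(params, commits)
  scope = params[:scope]
  return 422 unless VALID_SCOPES.include?(scope)       # unknown scope

  test_case = params[:test_case]
  return 422 if scope == "test" && test_case.nil?      # missing identifier

  known_cases = commits[params[:sha]]
  return 404 if known_cases.nil?                       # unknown SHA
  return 404 if scope == "test" && !known_cases.include?(test_case)

  201 # ASSUMPTION: the success status is not stated in the description
end
```

Keeping each failure on its own early return is one way to give a misbehaving client the precise signal described above, rather than a single catch-all validation error.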
403 specs total, all green.
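The claim lifecycle described in this commit (TTL by scope, sweep of expired pending claims, fulfillment from either pending or expired) can be sketched in plain Ruby. `ClaimSketch` is a hypothetical in-memory stand-in, not the ActiveRecord `Claim` model: the real sweeper is a bulk UPDATE, and treating a repeated `fulfill!` as a no-op is an assumption about what "idempotent" means here.

```ruby
# Hypothetical in-memory stand-in for the Claim model's lifecycle.
class ClaimSketch
  # TTLs from the description: 15 minutes (build), 12 hours (test).
  TTL_FOR_SCOPE = { "build" => 15 * 60, "test" => 12 * 60 * 60 }.freeze

  attr_reader :scope, :status, :expires_at, :fulfilled_at

  # Raises KeyError for an unknown scope.
  def self.default_expires_at(scope, now: Time.now)
    now + TTL_FOR_SCOPE.fetch(scope)
  end

  # Stand-in for the bulk UPDATE: expires every pending claim whose
  # deadline has passed and returns how many were changed.
  def self.sweep_expired!(claims, now: Time.now)
    claims.count do |claim|
      claim.expire! if claim.status == "pending" && claim.expires_at <= now
    end
  end

  def initialize(scope:, now: Time.now)
    @scope = scope
    @status = "pending"
    @expires_at = self.class.default_expires_at(scope, now: now)
    @fulfilled_at = nil
  end

  def expire!
    @status = "expired"
    true
  end

  # Legal from pending OR expired, so a late submission still credits
  # its claim. ASSUMPTION: calling it again leaves fulfilled_at as is.
  def fulfill!(now: Time.now)
    return self if status == "fulfilled"

    @status = "fulfilled"
    @fulfilled_at = now
    self
  end
end
```

In this sketch a build claim created at time `t` is swept once `now` reaches `t + 900` seconds, and a later `fulfill!` still moves it from `expired` to `fulfilled`; a second sweep leaves fulfilled claims alone.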
Pre-claims, the only signal we had for "work is in flight on this
commit" was "the TCC has no submissions yet" — which fires the
instant a commit is ingested, well before any human has touched
the SHA. That made every freshly-ingested commit look like
something was actively running on it, and made the all-builds-
failed case lump unretried tests into pending. Both were wrong;
nobody was working on those commits.
Now that claims exist, this turns out to be a one-line gate at
each aggregation site:
* `Commit#has_pending_claims?` and the matching pre-filtered
`has_many :pending_claims` association — cheap presence check
that plays nicely with `.includes(:pending_claims)` on the
commits-index render path (no N+1 across 25 rows). Same shape
on TestCaseCommit, plus `TCC#pending?` for the
"untested-but-claimed" view-level question.
* `CommitState#tests_status` — the `:pending` / `:not_run`
branch now keys on `has_pending_claims?`, not on
"`counts[:untested] > 0`." A commit with TCCs sitting at
status=-1 and no live claim reads as `:not_run` (truly
untouched). The instant a build or test claim lands, it
flips to `:pending` (or `:pending_partial` if some tests
already passed).
* `CommitState#commit_state` row-classification loop — the
`no_data ⇒ pending_tests += 1` fallback now requires the
TCC to actually have a pending test-scope claim. Cell-level
pending (computer built but no test result yet) still counts
regardless of claims; that's a different signal.
* `CommitState#_tccs_for_matrix` now eager-loads
`pending_claims` so per-TCC `has_pending_claims?` reads from
the loaded association.
* `commits#index` extends the include set to cover both
`commit.pending_claims` and `tcc.pending_claims` so a single
page render stays at the same query count it had before.
19 new specs cover: tests_status :pending vs :not_run across
claim states; the `pending_tests` hero count over the same
states; Commit and TCC `has_pending_claims?` and `pending?`
predicates including the load-vs-query-shape check that protects
the index from N+1; revert-on-fulfill / revert-on-expire
transitions. One pre-existing spec that asserted the leaky
behavior ("counts tests with no built-computer results as
pending") was updated in place to test the new semantics —
`pending_tests = 0` without claims, `= 2` with two test-scope
claims out.
421 specs total, all green.
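The claim gate on the `pending_tests` count can be illustrated with a small sketch. `TccSketch` and `pending_tests_sketch` are hypothetical names, "no data" is modeled as `status == -1` as an assumption, and cell-level pending (computer built, no result yet) is deliberately left out because the description says it is counted separately.

```ruby
# Hypothetical stand-in for a TestCaseCommit row and its pending claims.
# Build-scope claims carry no TCC, so every claim here is test-scope.
TccSketch = Struct.new(:name, :status, :pending_claims, keyword_init: true) do
  def has_pending_claims?
    pending_claims.any?
  end
end

# Counts a no-data TCC as pending only when a live claim backs it.
# ASSUMPTION: status -1 stands for "no results yet".
def pending_tests_sketch(tccs)
  tccs.count { |tcc| tcc.status == -1 && tcc.has_pending_claims? }
end

tccs = [
  TccSketch.new(name: "a", status: -1, pending_claims: []),
  TccSketch.new(name: "b", status: -1, pending_claims: [])
]
without_claims = pending_tests_sketch(tccs)  # 0
tccs.each { |tcc| tcc.pending_claims << :test_claim }
with_claims = pending_tests_sketch(tccs)     # 2
```

This mirrors the updated spec above: zero pending tests without claims, two once both TCCs have a test-scope claim out.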
Summary
Lands the data foundation and the claim creation/fulfillment lifecycle for the dispatcher + claims feature described in
docs/dispatcher-and-claims.md. Phase C (dispatcher endpoint) and Phase D (mesa_test client work) are still upcoming; this PR is shippable on its own and dormant until those land.
What's in
Phase A — schema and CI flag parsing (commit `ebf04bc`)
Phase B — endpoint, sweeper, fulfillment (commit `2ae1604`)
Phase B (extended) — "pending" wired into aggregations (commit `5c7f4a9`)
Risk and rollout
The schema migration is metadata-only on Postgres 18 (Railway's version) for all column adds, plus a single backfill UPDATE on ~10k commits (~0.3s locally, well inside Railway's deploy-restart envelope).
One user-visible behavior change worth noting: `tests_status`'s `:pending` / `:not_run` distinction now requires real claim presence. Until Phase D ships mesa_test client work, no commits will ever read as `:pending`. That means:
This corrects the existing leaky proxy — "fresh commit looks pending" was never accurate. The tiles light up correctly once Phase D ships real claim data.
Everything else is dormant:
Test plan