feat(e2e): typed cross-backend Rust harness (rise-e2e) + SA token-exchange scenario#390

Open
NiklasRosenstein wants to merge 26 commits into develop from feat/rust-e2e-harness

NiklasRosenstein commented Jun 13, 2026

Summary

Replaces the drift-prone bash E2E suites with a typed, cross-backend Rust harness (crate rise-e2e, a standalone workspace under tests/e2e/). Scenarios are written once against a Backend driver seam and run on both backends; an unsupported combo is a logged Skip(reason) — a declared parity gap, never silent drift. Both bash e2e suites (e2e-docker.sh, e2e-minikube.sh) are deleted — the harness is now the sole e2e mechanism.
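The driver seam and the declared-skip rule can be pictured with a minimal sketch. Every name here (`BackendKind`, `Applicability`, `run_one`, the unit-struct backends) is a hypothetical stand-in for illustration; the real rise-e2e trait carries many more hooks and its signatures may differ.

```rust
// Hypothetical sketch of a backend seam with explicit, logged skips.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    Docker,
    Minikube,
}

#[derive(Debug, PartialEq, Eq)]
enum Applicability {
    Run,
    Skip(String), // the reason is always recorded, never implied
}

trait Backend {
    fn kind(&self) -> BackendKind;
}

trait Scenario {
    fn name(&self) -> &'static str;
    fn applies_to(&self, backend: &dyn Backend) -> Applicability;
    fn run(&self, backend: &dyn Backend) -> Result<(), String>;
}

struct Docker;
impl Backend for Docker {
    fn kind(&self) -> BackendKind {
        BackendKind::Docker
    }
}

struct Minikube;
impl Backend for Minikube {
    fn kind(&self) -> BackendKind {
        BackendKind::Minikube
    }
}

struct LokiLogRetention;
impl Scenario for LokiLogRetention {
    fn name(&self) -> &'static str {
        "loki-log-retention"
    }
    fn applies_to(&self, backend: &dyn Backend) -> Applicability {
        match backend.kind() {
            BackendKind::Minikube => Applicability::Run,
            BackendKind::Docker => Applicability::Skip("no Loki".to_string()),
        }
    }
    fn run(&self, _backend: &dyn Backend) -> Result<(), String> {
        Ok(()) // the real assertions would live here
    }
}

/// Runs one scenario and reports a skip explicitly instead of dropping it.
fn run_one(scenario: &dyn Scenario, backend: &dyn Backend) -> String {
    match scenario.applies_to(backend) {
        Applicability::Skip(reason) => format!("SKIP {} ({reason})", scenario.name()),
        Applicability::Run => match scenario.run(backend) {
            Ok(()) => format!("PASS {}", scenario.name()),
            Err(e) => format!("FAIL {}: {e}", scenario.name()),
        },
    }
}

fn main() {
    println!("{}", run_one(&LokiLogRetention, &Docker));
    println!("{}", run_one(&LokiLogRetention, &Minikube));
}
```

Because the skip carries its reason as data, a parity gap shows up in the run report rather than as a missing test.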

Both backends self-provision their own stack

  • DockerBackend: docker compose standalone stack (CLI via docker cp, Traefik reach, container/label/env inspection, Traefik API, identity compose overlay).
  • MinikubeBackend: a full Rust port of the minikube bring-up: minikube start + helm upgrade --install + (jfrog-vault mode) JFrog/Vault via compose + Vault role wait + --insecure-registry + containerd certs; background kubectl port-forwards; minikube delete on teardown.

Scenario matrix

| scenario | docker | minikube |
| --- | --- | --- |
| public-deploy | Run | Run |
| sa-token-exchange | Run | Run |
| private/forwardAuth | Run | Skip (nginx auth path) |
| health-rolling cutover | Run | Skip (Traefik-specific) |
| loki log-retention | Skip (no Loki) | Run |
| helm idempotency | Skip (no chart) | Run |
| workload-identity | Run | Run (jfrog-vault) |

  • sa-token-exchange closes the WS2 Phase 2 gap end-to-end on both backends (SA trusting Dex → Dex id_token via password grant → RISE_IDENTITY → project list returns the SA's project; + negative for the un-exchanged token).
  • health-rolling cutover ports the Traefik serverStatus no-5xx-gap regression probe; workload-identity builds the fixture from source and asserts the /identity JSON + in-place token re-mint.

CI

  • e2e-docker, e2e-minikube-harness, e2e-minikube-harness-jfrog-vault (all RISE_E2E_BACKEND-gated) + a fast rise-e2e harness quality lint/test job.
  • Removed: the bash e2e-minikube / e2e-minikube-jfrog-vault jobs and the e2e-docker bash smoke step.
  • dev/dex/config.yaml + helm/rise/values-ci.yaml: Dex password grant (CI-only); values-ci relaxes the SSRF guard for the in-cluster http issuer.

Status

All harness jobs pass in CI: public-deploy, sa-token-exchange (both backends), private/forwardAuth, health-rolling cutover, loki-log-retention, helm-idempotency, and workload-identity (docker + minikube jfrog-vault). tests/e2e-build/run.sh (CLI build backends) is a separate domain, intentionally out of scope.

🤖 Generated with Claude Code

…hange scenario

Replace drift-prone bash E2E scripts with a typed Rust harness. Scenarios are
written once against a `Backend` driver seam and run on either backend; an
unsupported combo is a logged `Skip(reason)` (a declared parity gap, not silent
drift). Gated on `RISE_E2E_BACKEND`, so `cargo test --workspace` skips it.

Includes the SA token-exchange scenario that closes the WS2 Phase 2 E2E gap:
creates an SA trusting the Docker stack's Dex, mints a Dex id_token via the
resource-owner password grant, sets `RISE_IDENTITY`, and asserts `project list`
returns the SA's bound project (plus a negative for the un-exchanged token).

- crate rise-e2e (tests/e2e): Backend trait + Docker/Minikube drivers, scenario
  matrix, cli/http/token/dex helpers (token mint reuses rise-backend-auth)
- scenarios: public-deploy (both backends), sa-token-exchange (Docker Run /
  minikube Skip)
- dev/dex: enable the `password` grant on the rise-backend client (Dex v2.45
  gates it per-client; dev/CI-only)
- ROADMAP: mark the SA-exchange E2E item done; record the harness as the chosen
  mechanism with the remaining ports as declared follow-ups

The e2e-docker CI job wiring (.github/workflows/ci.yml) is applied separately —
the push token lacks `workflow` scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions Bot commented Jun 13, 2026

Docs preview: /preview/pr-390/

Updated for commit 0f4d2ec.

Adds a Rust toolchain + cache and a harness step (RISE_E2E_BACKEND=docker) to the
existing e2e-docker job, after the bash smoke tests. The harness stands up its own
self-contained compose stack and runs serially (single shared stack).

Split from the crate commit because that push token lacked `workflow` scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- scenario.rs: sa-token-exchange now passes `--claim aud=rise-backend` plus the
  email claim — `service-account create` hard-requires an `aud` claim and >=2
  claims (exit 1 otherwise), so the scenario previously failed on every run;
  `aud=rise-backend` matches the Dex id_token audience (the client_id) so the
  exchange claim-matcher still passes
- dev/dex/config.yaml: make the rise-backend grantTypes a strict superset of
  Dex's default set (restore device_code + token-exchange) plus password, so
  `rise login --device` keeps working (setting grantTypes replaces the default)
- docker.rs reach_app: keep polling on 5xx (not just 404) so a transient Traefik
  response during route wiring doesn't fail the 200 assertion
- e2e.rs: run bring_up/scenarios under catch_unwind so tear_down always runs,
  preventing a leaked stack on a bring_up panic
- http.rs: reuse a single blocking client (OnceLock) instead of building one per
  request; add a 5s connect_timeout and tighten the request timeout to 10s
- remove dead API (Backend::public_url/ci_token accessors, token::decode_claims);
  token test now verifies the minted bearer via RiseTokenSigner::verify_user_jwt
- scenario.rs unique(): pid + atomic counter instead of a 6-digit time bucket
- Cargo.toml: correct the inaccurate reqwest "unifies TLS" comment
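The e2e.rs item above (tear_down must run even when bring_up or a scenario panics) could look roughly like the following sketch. `Stack` and `run_guarded` are hypothetical names, and the sketch assumes the default `panic = "unwind"` strategy; it is not the harness's actual code.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical stand-in for a provisioned backend stack.
#[derive(Default)]
struct Stack {
    up: bool,
    torn_down: bool,
}

impl Stack {
    fn bring_up(&mut self) {
        self.up = true;
    }
    fn tear_down(&mut self) {
        self.up = false;
        self.torn_down = true;
    }
}

/// Runs bring_up + scenarios under catch_unwind, then always tears down.
fn run_guarded(stack: &mut Stack, scenarios: impl FnOnce(&Stack)) -> Result<(), String> {
    let outcome = catch_unwind(AssertUnwindSafe(|| {
        stack.bring_up();
        scenarios(stack);
    }));
    // Reached whether or not the closure panicked.
    stack.tear_down();
    outcome.map_err(|payload| {
        payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "non-string panic payload".to_string())
    })
}

fn main() {
    let mut stack = Stack::default();
    let result = run_guarded(&mut stack, |s| assert!(s.up, "stack should be up"));
    println!("result={result:?} torn_down={}", stack.torn_down);
}
```

`AssertUnwindSafe` is needed because the closure holds a mutable borrow; that is acceptable here since the only thing done with the stack after a panic is tearing it down.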

Deferred (design follow-ups, not blind-fixed): DockerBackend re-provisions its own
stack rather than attaching to the bash-provisioned one (doubles bring-up per CI
job) and per-scenario teardown/isolation — both noted for the ROADMAP migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 1 of porting the minikube e2e into the rise-e2e harness: the harness needs
to mint a Dex OIDC id_token non-interactively to exercise sa-token-exchange on
Kubernetes. Override the chart's rise-backend Dex client in values-ci.yaml to add
`grantTypes` including `password` (the full default set + password, since setting
grantTypes replaces rather than extends Dex's default). CI-only — the chart
default is untouched, so operators using the bundled Dex are unaffected.

enablePasswordDB / staticPasswords (user@example.com) / oauth2.passwordConnector
come from the chart defaults and deep-merge in; verified via `helm template`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…etes

Phase 2: MinikubeBackend now stands up its own cluster — a Rust port of
scripts/ci/e2e-minikube.sh's default (oci-client-auth) bring-up — instead of
attaching to a bash-provisioned one, giving Docker/K8s symmetric self-provision.

- bring_up: minikube delete/start (--driver=docker, cpus/memory) + ingress addon
  + helm upgrade --install (values-ci.yaml + image/auth_backend_url flags) +
  kubectl wait Available/Ready (10m) + background kubectl port-forward of the
  server (3000) and Dex (5556); health/discovery polled with connection-error
  tolerance during forward warmup
- PortForward: kubectl port-forward child killed on Drop and in tear_down;
  tear_down also runs `minikube delete`
- reach_app: per-project `kubectl port-forward svc/<app>` (discovered by name +
  port in namespace rise-<project>), closing the prior Ok(None) gap so
  public-deploy asserts HTTP 200 on K8s too
- dex(): returns the in-cluster Dex (token_url 127.0.0.1:5556, iss = cluster DNS)
- RegistryMode enum; jfrog-vault bring_up bails until Phase 4
- scenario sa-token-exchange: applies_to now Run on both backends
- ci: new e2e-minikube-harness job (RISE_E2E_BACKEND=minikube), alongside the
  bash job until parity
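A kill-on-Drop guard like the PortForward item describes can be sketched with the standard library alone. The struct layout and the `pid` accessor are assumptions for illustration, and `sleep` stands in for the `kubectl port-forward` child so the sketch runs without a cluster.

```rust
use std::process::{Child, Command, Stdio};

/// Hypothetical guard: owns a long-lived child and kills it when dropped.
struct PortForward {
    child: Child,
}

impl PortForward {
    fn spawn(mut cmd: Command) -> std::io::Result<Self> {
        let child = cmd
            .stdin(Stdio::null())
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .spawn()?;
        Ok(Self { child })
    }

    fn pid(&self) -> u32 {
        self.child.id()
    }
}

impl Drop for PortForward {
    fn drop(&mut self) {
        // Best effort: the child may already have exited.
        let _ = self.child.kill();
        // Reap it so no zombie process is left behind.
        let _ = self.child.wait();
    }
}

fn main() -> std::io::Result<()> {
    // Stand-in for a `kubectl port-forward ...` invocation.
    let mut cmd = Command::new("sleep");
    cmd.arg("30");
    let forward = PortForward::spawn(cmd)?;
    println!("forwarder running as pid {}", forward.pid());
    Ok(()) // `forward` is dropped here, which kills and reaps the child
}
```

Tying the kill to Drop means an early return or a propagated error cannot leave a forwarder holding the local port.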

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The image build (cargo-chef over the root workspace) failed deterministically on
every commit of this branch: adding `tests/e2e` to the root [workspace].members
made the Dockerfile planner/builder unable to resolve the workspace (they copy
each member manifest explicitly and never copy tests/), so `cargo chef prepare`
errored and — because every e2e job needs package-image — all e2e jobs were
skipped. The harness had therefore never actually run in CI.

Fix at the right altitude: the test-only harness shouldn't be in the production
image's build graph at all. Make tests/e2e its own standalone Cargo workspace
(own [workspace] + Cargo.lock, concrete dep versions), excluded from the root.
The root build/image are now untouched by it, and the root Cargo.lock drops the
second reqwest stack the harness pulled in.

- Cargo.toml: drop tests/e2e from members; add `exclude = ["tests/e2e"]`
- tests/e2e/Cargo.toml: `[workspace]` + pinned deps (no `workspace = true`)
- ci.yml: harness steps use `--manifest-path tests/e2e/Cargo.toml`; new
  rise-e2e-quality job (fmt/clippy/gated-test) since root quality jobs no longer
  cover the detached crate
- tests/e2e/.gitignore for its own target/; README updated for the new invocation
  and the now self-provisioning minikube backend

Verified locally: standalone build/clippy/fmt/gated-test green; root metadata no
longer lists rise-e2e.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The e2e-minikube-harness CI job failed with `helm: path "./helm/rise" not found`:
`cargo test` runs the test binary with CWD = the crate dir (tests/e2e), not the
repo root, so the relative chart/values paths didn't resolve. Resolve repo_root
from CARGO_MANIFEST_DIR (as the docker backend already does) and pass absolute
paths to `helm upgrade` (chart + --values); also mount the repo root as the CLI
container's workdir so deploy-from-source paths resolve later.
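As a rough sketch of the path fix: derive the repo root from the crate directory and hand helm absolute paths. The function names and the release name `rise` are hypothetical; the two-levels-up assumption follows from the crate living at tests/e2e.

```rust
use std::path::{Path, PathBuf};

/// Walks up from the crate dir (`<repo>/tests/e2e`) to the repo root.
fn repo_root_from(manifest_dir: &Path) -> Option<PathBuf> {
    // ancestors(): the dir itself, then tests/, then the repo root.
    manifest_dir.ancestors().nth(2).map(Path::to_path_buf)
}

/// Builds `helm upgrade --install` arguments with absolute chart/values paths.
/// The release name "rise" is an assumption for illustration.
fn helm_upgrade_args(repo_root: &Path) -> Vec<String> {
    let chart = repo_root.join("helm/rise");
    let values = chart.join("values-ci.yaml");
    vec![
        "upgrade".to_string(),
        "--install".to_string(),
        "rise".to_string(),
        chart.display().to_string(),
        "--values".to_string(),
        values.display().to_string(),
    ]
}

fn main() {
    // Under cargo the caller could pass Path::new(env!("CARGO_MANIFEST_DIR")).
    let root = repo_root_from(Path::new("/repo/tests/e2e")).expect("crate dir too shallow");
    println!("helm {}", helm_upgrade_args(&root).join(" "));
}
```

With absolute paths the command no longer depends on the working directory `cargo test` picks for the test binary.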

(The Docker harness job passed in CI — this only affected the minikube path.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two real cross-backend differences the minikube harness surfaced once it ran:

- public-deploy failed with CreateContainerConfigError (runAsNonRoot vs root):
  K8s app pods run non-root, but traefik/whoami runs as root. Add a per-backend
  `sample_app()` to the Backend trait — Docker keeps traefik/whoami:80 ("Hostname:"
  marker, unchanged/green), minikube uses nginxinc/nginx-unprivileged:alpine:8080
  ("nginx" marker), matching the bash minikube suite. PublicDeploy now asserts the
  body marker on whichever backend provides one (no more Docker-only special case).

- sa-token-exchange failed at `service-account create` with "Invalid OIDC issuer
  URL: URL must use HTTPS": the create handler runs the issuer through
  ssrf::validate_url, which requires HTTPS unless ssrf.allow_http /
  allow_private_networks. The in-cluster Dex issuer is plain-HTTP on a private IP.
  Set both in values-ci.yaml (CI-only, mirrors the docker standalone overlay).

The minikube bring-up itself (minikube start + helm + 10m wait + port-forwards)
worked — these were application-level config gaps, not harness plumbing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New minikube-only scenario (Docker Skips — the compose stack has no Loki),
porting scripts/ci/e2e-minikube.sh's logging assertion: deploy the sample app,
generate a request, stop the deployment, wait for the workload to be gone, then
prove logs are still served by Loki (not live kubelet) via the log-volume API and
the SSE log stream.

- Backend: api_base()/ci_bearer() + default authenticated api_get(); default
  wait_workload_removed() (CLI-based), overridden on minikube with a stronger
  zero-pods kubectl check
- http: get_auth() (Bearer) for authenticated Rise API calls
- scenario asserts /logs/volume total>0 and >=1 level after pod removal, and that
  `rise deployment logs` returns backlog
- chrono dep for the RFC3339 query window
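One way to picture the zero-pods check is as a pure predicate over the kubectl result. This is a hedged sketch with a hypothetical function name; it treats an empty listing as "gone" only when the command itself succeeded, so a failed call is never mistaken for an empty namespace.

```rust
/// Returns true only when the listing command succeeded AND printed nothing.
/// A failed call (non-zero exit) proves nothing about the pods.
fn pods_gone(exit_ok: bool, stdout: &str) -> bool {
    exit_ok && stdout.trim().is_empty()
}

fn main() {
    // Successful, empty listing: the workload is gone.
    println!("{}", pods_gone(true, "\n"));
    // Successful listing with a pod name: still running.
    println!("{}", pods_gone(true, "pod/web-abc123\n"));
    // Transient failure with empty stdout: must not count as gone.
    println!("{}", pods_gone(false, ""));
}
```

In a caller built on `std::process::Output`, the two inputs would come from `output.status.success()` and a lossy UTF-8 view of `output.stdout`.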

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New minikube-only scenario (Docker Skips — no Helm release): re-run
`helm upgrade` with the same args and assert it applies cleanly a second time
(no immutable-field/diff errors), porting e2e-minikube.sh's idempotency check.

Adds Backend::reapply_chart() (default bails; minikube re-runs helm upgrade).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ports the remaining minikube e2e capabilities into the harness:

- MinikubeBackend jfrog-vault mode: bring_up starts JFrog+Vault via docker
  compose, waits for the Vault Artifactory role, starts minikube with
  --insecure-registry, wires the node's containerd to the JFrog registry over the
  shared docker network, and adds the jfrog/vault helm overrides. public_url +
  CI-token iss/aud switch to the in-cluster FQDN so deployed pods can reach the
  issuer. tear_down compose-rm's the services.
- rise_cli_build (docker-socket mount) for `deploy --backend docker:build`.
- workload-identity scenario (minikube + jfrog-vault only; Skips otherwise):
  builds & deploys tests/e2e-identity-fixture from source, then asserts the
  fixture's /identity JSON (credential present; file + exchanged tokens
  signature-valid with the right aud; project-bound sub; matching iss) and that
  the controller re-mints the file token in place (new jti within the window).
- Scenario::applies_to now receives &dyn Backend so applicability can depend on
  backend capability (supports_source_build), not just kind.
- http::get_auth_header (Vault token); new e2e-minikube-harness-jfrog-vault CI job.

All CI-only-verifiable (no cluster/registry locally); the prior minikube phases
are green, so this builds on a validated foundation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The rise-e2e harness now covers everything the bash minikube suite did (public
deploy, Loki retention, helm idempotency, workload identity in jfrog-vault mode)
plus SA token exchange, and both harness minikube jobs are green. Remove the
superseded bash:

- delete scripts/ci/e2e-minikube.sh
- remove the e2e-minikube and e2e-minikube-jfrog-vault CI jobs (the
  e2e-minikube-harness* jobs replace them)
- drop now-dangling references to the deleted script in docs/comments

scripts/ci/lib/identity.sh is kept — still sourced by e2e-docker.sh until the
Docker scenarios are ported (Phase 7).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (Phase 7a)

Ports two of the three remaining e2e-docker.sh scenarios into the harness:

- private-forward-auth (Docker only; minikube uses nginx annotations): deploy a
  private project, assert Traefik forwardAuth + router-middlewares labels on the
  app container, an unauthenticated same-host 302 to /.rise/auth/signin, that
  /.rise/* is backend-served (200, not the app), and that a rise_jwt cookie is
  allowed through.
- workload-identity now also runs on Docker: supports_source_build() is true (the
  host has docker), and a new prepare_workload_identity() hook recreates the
  backend with the identity compose overlay (no-op on minikube). The assertions
  already reach /identity via the backend-appropriate ingress, so the scenario is
  shared.

New Backend hooks (defaults bail/no-op; Docker implements): traefik_base,
app_container_labels, traefik_api, ingress_get_once, prepare_workload_identity.
http: get_no_redirect (status + Location) and get_with_cookie. WorkloadIdentity
runs last (it mutates the Docker backend via the overlay). e2e-docker.sh stays
until the cutover scenario lands (Phase 7b).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ports the last e2e-docker.sh scenario — the durable serverStatus regression
defense. Docker only (Traefik-specific); minikube Skips.

Generates a health_check + 2-replica rise.toml, deploys it (REV=1), reads the
group-scoped Traefik service name off the app container label (with the
sanitized-base + 16-hex drift guard), and asserts the Traefik API exposes a
correctly-shaped non-empty top-level serverStatus (http://host:port → UP/DOWN,
>=1 UP). Then redeploys (REV=2) and asserts NO 5xx gap across the rollout
(single un-retried ingress GETs), and that REV=2 became the sole live revision
(every running app container carries REV=2, none REV=1).

Adds Backend::app_container_envs (docker inspect Config.Env per container).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ite (Phase 7c/8)

All Docker scenarios (public deploy, SA token exchange, private/forwardAuth,
health-rolling cutover, workload identity) now run green in the rise-e2e harness,
so the bash docker suite is fully superseded:

- delete scripts/ci/e2e-docker.sh and scripts/ci/lib/identity.sh (no consumers
  left — both bash suites are gone; the harness has its own assertions)
- e2e-docker CI job: drop the bash smoke step, keep the harness step; rename to
  "Docker backend end-to-end (rise-e2e harness)"
- ROADMAP: mark the cross-backend E2E migration done with the final scenario
  matrix; docs: fix the docker quick-start reference
- drop now-dangling references to the deleted script in code comments

The drift-prone bash e2e suites are fully replaced by the typed cross-backend
rise-e2e harness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove references to the prior bash e2e suites from comments/docs — the rise-e2e
harness is the only way e2e tests are run, so docs describe the current design
rather than contrasting with a removed version (per the repo comment guideline).

Touches doc comments in lib.rs/token.rs/docker.rs/minikube.rs, the e2e README,
the CI job comments, the values-ci.yaml header, and the ROADMAP section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stness)

- health-rolling-cutover (#1, the important one): `rise deploy` follows the new
  revision to a terminal state, so the old after-the-fact 5xx probe sampled an
  already-converged stack and never witnessed the cutover — a vacuous assertion.
  Run a background probe thread that hammers the ingress (Host-routed GETs,
  100ms) concurrently with the blocking deploy and assert the worst status seen
  was < 500, so the scenario actually spans the rollout window it defends.
- minikube wait_workload_removed (#2): only treat empty kubectl stdout as
  "pods gone" when kubectl exited 0 — a transient error (empty stdout, non-zero
  exit, message on stderr) must not read as removed and let a Loki query hit a
  still-live pod.
- minikube jfrog-vault (#3): tear down with `docker compose down -v` (removes the
  named volumes + network) instead of `rm -fsv`, and `down -v` before bring-up, so
  no Vault/Artifactory state leaks across runs.
- CI bearer TTL (#4): mint with a 6h TTL (was 1h) so a long minikube + jfrog-vault
  run can't expire the token mid-run.
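The concurrent probe in item #1 might be structured like this sketch: a background thread samples a status source until the blocking deploy returns, and the caller asserts on the worst status seen. `worst_status_during` is a hypothetical name and the probe is a plain closure here; in the harness it would be a Host-routed ingress GET.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Samples `probe` on a background thread for as long as `deploy` blocks,
/// and returns the numerically highest status observed.
fn worst_status_during<P, D>(probe: P, interval: Duration, deploy: D) -> u16
where
    P: Fn() -> u16 + Send + 'static,
    D: FnOnce(),
{
    let stop = Arc::new(AtomicBool::new(false));
    let stop_flag = Arc::clone(&stop);
    let prober = thread::spawn(move || {
        let mut worst = 0u16;
        loop {
            // Sample first, so at least one probe always happens.
            worst = worst.max(probe());
            if stop_flag.load(Ordering::Relaxed) {
                break;
            }
            thread::sleep(interval);
        }
        worst
    });

    deploy(); // blocks until the rollout reaches a terminal state
    stop.store(true, Ordering::Relaxed);
    prober.join().expect("probe thread panicked")
}

fn main() {
    let worst = worst_status_during(
        || 200, // stand-in for a real ingress GET
        Duration::from_millis(100),
        || thread::sleep(Duration::from_millis(300)), // stand-in for the deploy
    );
    assert!(worst < 500, "saw a 5xx during the rollout: {worst}");
    println!("worst status during rollout: {worst}");
}
```

The point of the design is that sampling overlaps the deploy, so the assertion covers the cutover itself rather than the already-converged stack.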

The "wait_healthy matches Unhealthy" claim was refuted (Display renders
"Unhealthy" with a lowercase h; the capital-H "Healthy" substring can't match).
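The refutation is easy to check in isolation. In this sketch the `Status` enum and its `Display` impl are hypothetical reconstructions of the rendering described above; only the case-sensitivity of `str::contains` is being demonstrated.

```rust
use std::fmt;

/// Hypothetical status type rendered the way the commit describes.
enum Status {
    Healthy,
    Unhealthy,
}

impl fmt::Display for Status {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Status::Healthy => write!(f, "Healthy"),
            Status::Unhealthy => write!(f, "Unhealthy"),
        }
    }
}

/// Case-sensitive substring oracle, as in the old CLI-output check.
fn looks_healthy(rendered: &str) -> bool {
    rendered.contains("Healthy")
}

fn main() {
    println!("{}", looks_healthy(&Status::Healthy.to_string()));
    println!("{}", looks_healthy(&Status::Unhealthy.to_string()));
}
```

"Unhealthy" contains "healthy" but not "Healthy", so the capital-H substring test does not produce a false positive.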

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The harness should drive actions via the CLI (it's a product surface under test)
but assert results via the machine-readable API. Convert the two oracles that
parsed human CLI output:

- wait_healthy / wait_workload_removed: poll GET /api/v1/projects/{p}/deployments
  and check the latest deployment's `status` field (== / != "Healthy") via a
  shared `latest_deployment_status` helper, instead of substring-matching the
  `deployment list` prettytable. (The `rise` CLI has no --output json.)
- sa-token-exchange: read the SA's synthetic email from
  GET /api/v1/projects/{p}/service-accounts instead of scraping
  `service-account create` stdout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the docker-only private-forward-auth scenario with a shared
private-ingress-auth that verifies the real auth contract (block unauthenticated,
allow authenticated) on Docker AND Kubernetes — closing the parity gap where the
K8s nginx-auth ingress path had no coverage.

The behavioral assertions are written once; only the wiring + reach differ behind
the Backend trait:
- ingress_get(project, path, follow, cookie): one GET through the REAL ingress
  (Traefik on docker; the nginx ingress controller on minikube via `minikube ip`
  + Host header) — unlike reach_app, which port-forwards straight to the Service
  and would bypass ingress auth.
- assert_ingress_auth_configured: Traefik forwardAuth labels on docker; nginx auth
  annotations (kubectl get ingress -o json) on minikube.
- app_host(): {project}.rise.localhost (docker) / {project}.apps.rise.local (k8s).

Scenario: deploy a private (Member) project, assert auth is wired, assert
unauthenticated → 302 to the project's /.rise/auth/signin, and authenticated
(rise_jwt cookie) → 200 reaching the app. Adds a `private` (Member) access class
to values-ci.yaml and fetches the minikube node IP in bring_up.

http: consolidate get_no_redirect + get_with_cookie into one `request` (status +
Location + body, with Host/cookie/redirect control); drop ingress_get_once.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make a harness run self-explanatory and trustworthy:

- new `report` module: titled sections, timed `step`/`step_value` (label … ok/200
  (Xs)), and a `human` duration formatter. Plain text (no ANSI) so CI logs and
  terminals stay clean.
- bring_up steps are now reported with timing + outcome on both backends (docker
  compose down/up, /health → 200, CLI extract; minikube delete/start, ingress
  addon, node ip, helm, kubectl waits, port-forwards, /health → 200, dex
  discovery) — you can see the stack actually came up and is online.
- run_all reports each scenario with its duration and prints a summary
  (N passed / M failed / K skipped in T), plus bring-up / teardown / total timings.
- e2e.rs documents why this is a single `cargo test` (every scenario shares one
  expensive backend bring_up, so they run as one in-order suite) and prints a
  run header (backend + registry mode).

Output-only; no scenario/behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…me window)

The jfrog-vault workload-identity refresh assertion flaked ("file token did not
refresh") because the per-iteration reach_app churned a fresh kubectl
port-forward each sample (erratic, sparse sampling) and the fixed 36-iteration
window was borderline for the kubelet projected-volume refresh lag.

Add Backend::poll_app: hold ONE ingress route open for the whole window and
sample densely (every 5s, up to 240s), exiting as soon as the new still-valid
jti appears. Docker reaches via Traefik (no forward); minikube holds a single
port-forward. The refresh poll uses it instead of looping reach_app — fixing both
the flake and the port-forward churn the review flagged.
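The sampling loop behind such a poll can be sketched generically: sample at a fixed interval inside a bounded window and return as soon as the predicate yields a value. `poll_until` is a hypothetical helper, not the actual `Backend::poll_app` signature, and holding the ingress route open is left to the caller.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Calls `sample` every `interval` for up to `window`, returning the first
/// `Some` value, or `None` if the window closes first.
fn poll_until<T>(
    interval: Duration,
    window: Duration,
    mut sample: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + window;
    loop {
        if let Some(value) = sample() {
            return Some(value); // early exit on the first hit
        }
        if Instant::now() + interval > deadline {
            return None; // another sleep would overshoot the window
        }
        thread::sleep(interval);
    }
}

fn main() {
    let mut attempts = 0;
    // Stand-in for "fetch /identity and check for a new jti".
    let hit = poll_until(Duration::from_millis(10), Duration::from_secs(1), || {
        attempts += 1;
        if attempts >= 3 { Some(attempts) } else { None }
    });
    println!("found after {hit:?} attempts");
}
```

With the values from the commit, the call would use a 5s interval and a 240s window.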

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…test test

A single suite-style #[test] gained nothing from libtest (the "running 1 test /
test result: ok. 1 passed" just wrapped our own report). Make the e2e suite a
plain binary: `harness = false` + a `main() -> ExitCode` that runs the suite and
exits non-zero on failure. `cargo test` still builds & runs it (CI unchanged
minus the now-irrelevant libtest args); the real `token` unit test stays under
libtest. Output is now purely our sectioned report.
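With `harness = false` on the test target, the entry point is an ordinary `main`. The following is a minimal sketch of that shape; `Summary`, `run_suite`, and `suite_failed` are hypothetical names and the counts are placeholders rather than real results.

```rust
use std::process::ExitCode;

/// Hypothetical run summary.
struct Summary {
    passed: usize,
    failed: usize,
    skipped: usize,
}

/// Placeholder for the real suite; the counts are made up for the sketch.
fn run_suite() -> Summary {
    Summary { passed: 2, failed: 0, skipped: 1 }
}

fn suite_failed(summary: &Summary) -> bool {
    summary.failed > 0
}

fn main() -> ExitCode {
    let summary = run_suite();
    println!(
        "{} passed / {} failed / {} skipped",
        summary.passed, summary.failed, summary.skipped
    );
    if suite_failed(&summary) {
        ExitCode::FAILURE // non-zero exit makes `cargo test` report failure
    } else {
        ExitCode::SUCCESS
    }
}
```

Since cargo only looks at the process exit status for a `harness = false` target, the report printed by the binary is the whole output.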

Also harden the workload-identity jti-refresh further: bump the observation window
to 360s and, on failure, emit a decisive diagnostic (first vs last jti + exp) so a
future failure shows whether the token is rotating-but-slow-to-project or not
re-minting at all.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review follow-ups closing false-green gaps in the new oracles:

- private-ingress-auth: the authed check followed redirects, so a
  rejected cookie (302 -> signin, which serves 200) could satisfy the
  status==200 assertion. Stop following redirects: a 200 now means the
  ingress genuinely let the cookie through.
- workload-identity: gate the file-token "refreshed" predicate on the
  expiry advancing, not just the jti changing — a re-mint that reused
  the old exp no longer counts as a refresh.
- workload-identity: make the failure-path diagnostic reach best-effort
  so it can't mask the jti/exp bail! it exists to print.
- reach_app: use app_host(project) instead of re-inlining the host
  literal (single source for the Traefik host).
- docker-e2e.local.yaml: repoint the stale scripts/ci/lib/identity.sh
  comment at the rise-e2e workload-identity scenario.
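The tightened "refreshed" predicate from the workload-identity item can be written as a small pure function. This sketch uses a hypothetical `FileToken` struct holding only the two claims involved, plus a validity check against the current time.

```rust
/// Hypothetical view of the two claims the predicate looks at.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FileToken {
    jti: String,
    exp: u64, // expiry, seconds since the Unix epoch
}

/// A refresh counts only if the token is new (jti changed), its expiry
/// moved forward, and it is still valid at `now`.
fn refreshed(first: &FileToken, latest: &FileToken, now: u64) -> bool {
    latest.jti != first.jti && latest.exp > first.exp && latest.exp > now
}

fn main() {
    let first = FileToken { jti: "a".into(), exp: 1_000 };
    let reminted = FileToken { jti: "b".into(), exp: 1_600 };
    let same_exp = FileToken { jti: "c".into(), exp: 1_000 };
    println!("{}", refreshed(&first, &reminted, 900));
    println!("{}", refreshed(&first, &same_exp, 900));
}
```

The second case is the false green the review closed: a new jti with the old exp no longer passes.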

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gone check

Follow-ups agreed in review:

- wait_workload_removed: drop the status-string default (a not-Healthy
  deployment doesn't prove the workload is gone — it may be terminating)
  and make it a required trait method. minikube keeps its zero-pods
  check; docker gains a real "app container gone" check.
- PortForward::spawn (minikube): block until the local port actually
  accepts a connection (or kubectl exits), so a bind conflict — e.g. a
  port just released by a dropped forward — surfaces as a clear infra
  error instead of a window of connection-refused misread as the
  workload failing.
- Dedup app reach: minikube reach_app/poll_app now share one
  with_app_forward helper (single forward-setup + liveness point);
  docker poll_app delegates to ingress_get so there's one definition of
  "reach this app through the ingress".
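The readiness wait in the PortForward::spawn item could be sketched as below: keep trying to connect to the local port until it accepts, and give up early if the forwarder has exited. `wait_until_listening` is a hypothetical helper; liveness is passed in as a closure so the sketch stays independent of how the child is tracked.

```rust
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Blocks until `addr` accepts a TCP connection, the forwarder dies,
/// or `timeout` elapses.
fn wait_until_listening(
    addr: SocketAddr,
    timeout: Duration,
    mut still_running: impl FnMut() -> bool,
) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    loop {
        if !still_running() {
            return Err(format!("forwarder exited before {addr} came up"));
        }
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(format!("{addr} not accepting connections after {timeout:?}"));
        }
        thread::sleep(Duration::from_millis(100));
    }
}

fn main() {
    // A local listener stands in for the forwarded port.
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind loopback");
    let addr = listener.local_addr().expect("local addr");
    let ready = wait_until_listening(addr, Duration::from_secs(2), || true);
    println!("ready: {ready:?}");
}
```

For a real child process, the closure could wrap `Child::try_wait` and report `false` once it returns an exit status.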

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When a run fails, the failing command's own stdout/stderr rarely
explains the root cause (e.g. why a pod won't start), and teardown then
deletes the cluster / compose stack — so there's no post-mortem in CI or
locally.

Add Backend::dump_diagnostics (no-op default), called from the harness
on any failure (scenario error or panic) *before* tear_down, while the
stack is still up:

- minikube: minikube status, pods -A, recent events, describe of the
  rise pods, rise server logs, minikube logs, and (jfrog-vault) the
  registry compose ps + logs.
- docker: compose ps + tailed compose logs.

cli::dump runs each command best-effort and prints its output to stderr
under a header, so the diagnostics land inline in the run log.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address the brutal-review findings on the diagnostics/reach changes:

- cli::dump now enforces a 30s wall-clock cap per command (drains both
  pipes on threads, kills on timeout). The diagnostics path runs when the
  cluster is most likely unhealthy, so an unbounded kubectl/minikube call
  could otherwise hang the whole job until the CI timeout. kubectl calls
  also get --request-timeout=20s for a graceful client-side bound.
- PortForward::spawn captures kubectl's stderr (instead of nulling it) and
  includes it in the failure message, and no longer hard-codes "port in
  use" as the cause. It now waits for the local port to be free *before*
  binding, which both gives an accurate distinct error and closes the
  race where a stale listener could mask a failed bind. stderr is drained
  on a thread in the success path so a long-lived forward can't block.
- minikube dump_diagnostics now also describes + logs the app pods in the
  rise-<project> namespaces (the control-plane dump alone won't show why a
  deployed app failed).
- Note docker wait_workload_removed is parity-provided (no docker scenario
  exercises it yet).
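A wall-clock cap of the kind described for cli::dump can be sketched with std only: drain both pipes on threads so the child never blocks on a full pipe, poll for exit, and kill on timeout. `Dump`, `drain`, and `run_capped` are hypothetical names; one known limitation is that a grandchild keeping the pipes open would still delay the joins.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical result of one capped diagnostic command.
struct Dump {
    timed_out: bool,
    stdout: String,
    stderr: String,
}

/// Reads a pipe to the end on its own thread.
fn drain<R: Read + Send + 'static>(mut pipe: R) -> thread::JoinHandle<String> {
    thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = pipe.read_to_end(&mut buf);
        String::from_utf8_lossy(&buf).into_owned()
    })
}

/// Runs `cmd`, killing it if it outlives `cap`.
fn run_capped(mut cmd: Command, cap: Duration) -> std::io::Result<Dump> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    let out = drain(child.stdout.take().expect("stdout was piped"));
    let err = drain(child.stderr.take().expect("stderr was piped"));

    let deadline = Instant::now() + cap;
    let mut timed_out = false;
    while child.try_wait()?.is_none() {
        if Instant::now() >= deadline {
            timed_out = true;
            let _ = child.kill();
            let _ = child.wait(); // reap after the kill
            break;
        }
        thread::sleep(Duration::from_millis(50));
    }

    Ok(Dump {
        timed_out,
        stdout: out.join().unwrap_or_default(),
        stderr: err.join().unwrap_or_default(),
    })
}

fn main() -> std::io::Result<()> {
    let mut cmd = Command::new("sh");
    cmd.args(["-c", "echo diagnostics"]);
    let dump = run_capped(cmd, Duration::from_secs(30))?;
    eprintln!("timed_out={} stderr={:?}", dump.timed_out, dump.stderr);
    eprint!("{}", dump.stdout);
    Ok(())
}
```

Whatever was captured before the kill is still returned, so a hung command yields partial diagnostics instead of stalling the job.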

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread tests/e2e/README.md
RISE_E2E_BACKEND=docker \
RISE_IMAGE_TAG=<tag> \
RISE_IMAGE_REPOSITORY=ghcr.io/rise-deploy/rise \
cargo test --manifest-path tests/e2e/Cargo.toml

We don't use libtest anymore
