ci: shard tests and reduce redundant work#37618
Conversation
- e2e and checks-backend: enable build-cache restore so they hit the seeded gobuild cache. - pgsql/sqlite/mysql/mssql: replace `make backend` with `make generate-go`. Integration tests build their own test binary and never invoke the gitea executable; only bindata generation is needed. - unit-tests-gogit: narrow to packages with gogit/nogogit-tagged files via a new `test-backend-gogit` Makefile target. Other packages produce identical compiled output regardless of the gogit tag, so retesting them was busywork. - cache-seeder: stop the lint job from competing with the gobuild job on the shared non-rotated gobuild key. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
The previous commit replaced `make backend` with `make generate-go` on the assumption that the gitea executable was unused by the integration tests. It is used: integration tests install git pre-receive hooks that shell out to the binary, so `git push` operations during tests fail with "No such file or directory". Reverts that part of the previous change; the other cache-reuse tweaks remain. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR optimizes PR CI runtime by improving Go build-cache reuse across jobs and reducing redundant -tags gogit unit test work.
Changes:
- Enable Go build-cache restore in
test-e2eandchecks-backendby removingbuild-cache: "false". - Add
make test-backend-gogitto rungo testover a smaller set of packages for thegogitpass. - Prevent the cache-seeder’s lint job from contending on the shared non-rotated
gobuildcache key by disabling build-cache in that job.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| Makefile | Adds GO_GOGIT_TEST_PACKAGES and test-backend-gogit target for a reduced gogit unit-test pass. |
| .github/workflows/pull-e2e-tests.yml | Enables restoring the Go build cache for the e2e workflow job. |
| .github/workflows/pull-db-tests.yml | Switches unit-tests-gogit to the reduced test-backend-gogit make target. |
| .github/workflows/pull-compliance.yml | Enables restoring the Go build cache for the backend checks job. |
| .github/workflows/cache-seeder.yml | Disables build-cache in seeder lint job to avoid cache-key contention with seeder gobuild job. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Splits each database integration job (pgsql/sqlite/mysql/mssql) into two parallel matrix shards. Test names are enumerated from the integration source and partitioned round-robin (~301/302 of 603 tests per shard); names that don't match the shard are filtered out via -test.run. Migration tests (~50-90 s, fast, sequential) only run on shard 1. The original job names (test-pgsql, test-sqlite, test-mysql, test-mssql) are kept as one-step aggregator jobs that depend on the shards job and report success only when all shards passed. This keeps any branch-protection rule referencing those names valid. Source-based enumeration is used because the test binary's -test.list calls TestMain, which boots the full Gitea environment and panics without a configured database. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Replaces the hardcoded list of three top-level paths (modules/git, modules/gitrepo, modules/lfs) with an at-build-time derivation. tools/find-gogit-test-pkgs.sh queries `go list` for every package whose own code or test code imports any of the gogit-affected modules — currently 72 packages — and feeds that list into `make test-backend-gogit`. This restores coverage that the static list dropped: callers like models/git, services/repository, modules/repository, etc., whose unit tests transitively exercise gogit/nogogit code paths via the modules/git API. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
- pgsql/sqlite/mysql/mssql: increase shards from 2 to 3, dropping per-job wall time from ~13 min to ~9-10 min on the slowest (pgsql). - test-unit: split into test-unit-bindata and test-unit-gogit running in parallel, with a test-unit aggregator preserving the existing check name. Each gets its own rotated cache (cache-name unit and unit-gogit) so testcache accumulates per variant. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
…t pass Two CI failures from sharding/splitting: 1. TestAPILFSNotStarted and TestAPILFSLocksNotStarted set setting.LFS.StartServer = false but never restore it. In the pre-sharding sequential run a later test happened to set it back to true, but with sharding TestLFSRender and TestChangeRepoFilesForUpdateWithFileRename can land in a different shard and run with LFS disabled. Use test.MockVariableValue so the change is scoped to the test that makes it. 2. find-gogit-test-pkgs.sh enumerated every package whose tests import the gogit modules, including tests/integration, tests/integration/migration-test, and models/migrations/... These need a real DB / dedicated harness and have to be tested separately; mirror GO_TEST_PACKAGES' filter to exclude them. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Rename: - test-unit-bindata -> test-unit-nogogit. Both jobs run with the bindata tag; the meaningful difference is the gogit codepath, and the codebase already uses the nogogit/gogit suffix on its build-tagged source files. test-unit-gogit cleanup: - The gogit-affected package set (modules/git/gitrepo/lfs and direct importers) doesn't touch elasticsearch/meilisearch/redis/minio/ azurite, so drop those services and the matching /etc/hosts step. Shard runner robustness (Copilot review feedback): - Validate TEST_SHARD/TEST_TOTAL_SHARDS are positive ints with shard in [1, total]; exit 2 on bad input. - Tighten the test-name grep to require `*testing.T` or `*testing.TB` arg, dropping the TestMain false-positive. - Force LC_ALL=C sort so the partition is deterministic regardless of the runner's locale. - Empty assignment now exits 1 instead of silently passing. find-gogit-test-pkgs.sh: - Mirror the Makefile's GO_TEST_PACKAGES exclusions (drop models/migrations/..., tests/integration, tests/integration/migration-test). - Tighten the comment header — the script returns packages WITH TESTS, not all callers. - Drop dead alternatives from the import-match regex. Makefile test-backend-gogit: - Fail when the script fails or returns no packages instead of silently running `go test` with no args. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Replaces test-unit-nogogit + test-unit-gogit with a single test-unit-shards matrix (3-shard). Each shard runs both the bindata and bindata-gogit test subsets — round-robin partition of GO_TEST_PACKAGES (123 each) and find-gogit-test-pkgs.sh (22-23 each). Combined work split 3 ways gives each shard ~7:46 wall (vs 10:55/7:38 today). PRs no longer write rotated unit caches: build-cache-rotate is dropped, so the shared seeded gobuild key is restored but not re-saved per PR push. Trade-off: cold testcache on every push (vs warm-on-rerun before). Frees ~3 GB of rotated-cache pressure on the 10 GB cap. Unit shards swap `make backend` for `make generate-go` — only the bindata codegen is needed; the gitea executable's link step (~10-15s) is wasted on unit tests since they don't shell out to the binary (db integration tests do — those keep `make backend`). New shared tools/partition-by-shard.sh handles validation + round-robin partitioning; tools/test-integration-shard.sh now uses it. New Makefile targets: test-backend-shard, test-backend-gogit-shard. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
The 3-shard combined unit run came in at 11:42 wall (slowest shard), slightly worse than the prior parallel-jobs setup (10:55). The per-shard race-instrumented compile happens cold for each job and doesn't shard, so per-shard test work was ~67% higher than a clean 3-way split would suggest. N=4 amortizes that overhead across more runners. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
This comment was marked as low quality.
This comment was marked as low quality.
- partition-by-shard.sh: fix off-by-one. `NR % t == r` placed item 1
in shard `total`, item 2 in shard 1 — shifted by one. Switch to
`(NR-1) % t == r` so shard 1 owns items 1, 1+t, 1+2t, ... as
documented.
- test-integration-shard.sh: the `[A-Z]` after `Test` rejected
`Test_Foo`-style names (28 such tests in tests/integration). Drop
the [A-Z] and broaden the comment to match the actual regex.
- test-integration-shard.sh: drop `*testing.TB` alternative — there
are zero matching tests in this codebase, so the simpler regex is
exact.
- find-gogit-test-pkgs.sh: merge the `grep -vE` exclusion into the
awk filter. Avoids a pipefail edge case where empty grep output
would abort the pipeline before the friendly empty-list message
fires; also one fewer process per invocation.
- Makefile test-integration: fail fast at Make level if TEST_SHARD
is set without TEST_TOTAL_SHARDS, instead of letting the script's
${X:?...} surface a less obvious error.
All locals in the new bash scripts use UPPER_CASE to match the
existing tools/test-e2e.sh convention.
Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
|
Sharding is standard practice for big repos, long list is inconsequential. What exactly is "horrible" here? |
I never said that, I just said useless skip info is noisy. |
Revert the cosmetic rename "unit tests (bindata)" / "unit tests (bindata gogit)" back to "unit-tests" / "unit-tests-gogit" to match main. The structural change — separate steps with a trailing make test-check — stays. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
ce29d29 to
1c1de6a
Compare
|
lint fix extracted to #37761. |
* origin/main: chore(deps): update action dependencies (go-gitea#37751) # Conflicts: # .github/workflows/pull-e2e-tests.yml
|
this needs then a new required status check set? |
Yes, I will update required checks right before merge so they go in effect immediately after this is merged. |
You have the perms? |
|
Yes, recently got them, specifically for this reason. |
Congrats 🎉 |
|
There is one problem, which is arguably a GitHub bug: Skipped matrix runs do not report with their matrix names but with the job name and required checks can never match both variants. I will temporarily drop the psql from branch protection and make a PR to fix it. |
Do we have the same bug in Gitea 🤔 |
|
Might be worth an investigation. I have not used matrix+required checks in gitea yet. What is clearly lacking in GitHub at least is that one can not pattern-match on required checks, this would be very nice to have here to be able to do |
|
followup: #37802 |
* origin/main: style: misc UI fixes (go-gitea#37691) ci: shard tests and reduce redundant work (go-gitea#37618) chore: simplify issue and pull request templates (go-gitea#37799) # Conflicts: # web_src/js/components/ActionRunJobView.vue
* main: (70 commits) fix(api): handle partial failures in push mirror synchronization gracefully (go-gitea#37782) fix(deps): update module golang.org/x/crypto to v0.52.0 [security] (go-gitea#37806) fix(build): swagger css import (go-gitea#37801) feat: add copy button to action step header, improve other copy buttons (go-gitea#37744) style: misc UI fixes (go-gitea#37691) ci: shard tests and reduce redundant work (go-gitea#37618) chore: simplify issue and pull request templates (go-gitea#37799) chore: Update 1.26.2 changelog in main (go-gitea#37796) fix(actions): make artifact signature payloads unambiguous (go-gitea#37707) chore: Update giteabot to fix failure when backport (go-gitea#37789) fix(deps): update module github.com/go-git/go-git/v5 to v5.19.1 [security] (go-gitea#37786) fix(pull): handle empty pull request files view to allow reviews (go-gitea#37783) fix(markup): make RenderString never fail (go-gitea#37779) fix(markup): wrap indented code blocks for the code-copy button (go-gitea#37748) fix(permissions): Fix reading permission (go-gitea#37769) fix: add natural sort to sortTreeViewNodes (go-gitea#37772) fix: package creation unique conflict (go-gitea#37774) fix(deps): update npm dependencies (go-gitea#37768) fix(deps): update module gitlab.com/gitlab-org/api/client-go/v2 to v2.26.0 (go-gitea#37771) ci: split giteabot workflow (go-gitea#37770) ...
Critical path ~25:42 → ~19:56 (−22%), ~0% CI minutes.
test-pgsqlshards 2-way. Branch protection: replacetest-pgsqlwithtest-pgsql-shards (1)+test-pgsql-shards (2);test-unit, sqlite/mysql/mssql unchanged — pgsql dominates the critical path.test-unitrunsbindatathenbindata gogitsequentially. cache-seeder pre-warms the race-instrumented test compile cache and the integration test binary so PR jobs warm-start.actions/cache/restore. Defends against PR cache poisoning and frees the 10 GB cap from PR churn.go-cacheaction: dropped thecache-nameinput. One gobuild cache, one golangci-lint cache. Seeder lint job restores but doesn't save gobuild, so only one writer populates it.tools/test-integration.shshards the integration binary via-test.list;TestMainshort-circuits DB init in list mode.TestAPILFSNotStarted/TestAPILFSLocksNotStartedswitched totest.MockVariableValue— latentsetting.LFS.StartServerglobal-state leak uncovered by sharding.This PR was written with the help of Claude Opus 4.7