Skip to content

ci: shard tests and reduce redundant work#37618

Merged
silverwind merged 46 commits into
go-gitea:mainfrom
silverwind:ci-cache-improvements
May 21, 2026
Merged

ci: shard tests and reduce redundant work#37618
silverwind merged 46 commits into
go-gitea:mainfrom
silverwind:ci-cache-improvements

Conversation

@silverwind

@silverwind silverwind commented May 8, 2026

Copy link
Copy Markdown
Member

Critical path ~25:42 → ~19:56 (−22%), ~0% CI minutes.

  • test-pgsql shards 2-way. Branch protection: replace test-pgsql with test-pgsql-shards (1) + test-pgsql-shards (2); test-unit, sqlite/mysql/mssql unchanged — pgsql dominates the critical path.
  • test-unit runs bindata then bindata gogit sequentially. cache-seeder pre-warms the race-instrumented test compile cache and the integration test binary so PR jobs warm-start.
  • Cache writes restricted to cache-seeder; PR jobs use actions/cache/restore. Defends against PR cache poisoning and frees the 10 GB cap from PR churn.
  • go-cache action: dropped the cache-name input. One gobuild cache, one golangci-lint cache. Seeder lint job restores but doesn't save gobuild, so only one writer populates it.
  • tools/test-integration.sh shards the integration binary via -test.list; TestMain short-circuits DB init in list mode.

TestAPILFSNotStarted / TestAPILFSLocksNotStarted switched to test.MockVariableValue — latent setting.LFS.StartServer global-state leak uncovered by sharding.


This PR was written with the help of Claude Opus 4.7

- e2e and checks-backend: enable build-cache restore so they hit the
  seeded gobuild cache.
- pgsql/sqlite/mysql/mssql: replace `make backend` with
  `make generate-go`. Integration tests build their own test binary
  and never invoke the gitea executable; only bindata generation is
  needed.
- unit-tests-gogit: narrow to packages with gogit/nogogit-tagged
  files via a new `test-backend-gogit` Makefile target. Other
  packages produce identical compiled output regardless of the gogit
  tag, so retesting them was busywork.
- cache-seeder: stop the lint job from competing with the gobuild
  job on the shared non-rotated gobuild key.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label May 8, 2026
The previous commit replaced `make backend` with `make generate-go`
on the assumption that the gitea executable was unused by the
integration tests. It is used: integration tests install git
pre-receive hooks that shell out to the binary, so `git push`
operations during tests fail with "No such file or directory".

Reverts that part of the previous change; the other cache-reuse
tweaks remain.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes PR CI runtime by improving Go build-cache reuse across jobs and reducing redundant -tags gogit unit test work.

Changes:

  • Enable Go build-cache restore in test-e2e and checks-backend by removing build-cache: "false".
  • Add make test-backend-gogit to run go test over a smaller set of packages for the gogit pass.
  • Prevent the cache-seeder’s lint job from contending on the shared non-rotated gobuild cache key by disabling build-cache in that job.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
Makefile Adds GO_GOGIT_TEST_PACKAGES and test-backend-gogit target for a reduced gogit unit-test pass.
.github/workflows/pull-e2e-tests.yml Enables restoring the Go build cache for the e2e workflow job.
.github/workflows/pull-db-tests.yml Switches unit-tests-gogit to the reduced test-backend-gogit make target.
.github/workflows/pull-compliance.yml Enables restoring the Go build cache for the backend checks job.
.github/workflows/cache-seeder.yml Disables build-cache in seeder lint job to avoid cache-key contention with seeder gobuild job.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/pull-db-tests.yml Outdated
silverwind and others added 2 commits May 9, 2026 01:35
Splits each database integration job (pgsql/sqlite/mysql/mssql) into
two parallel matrix shards. Test names are enumerated from the
integration source and partitioned round-robin (~301/302 of 603
tests per shard); names that don't match the shard are filtered out
via -test.run.

Migration tests (~50-90 s, fast, sequential) only run on shard 1.

The original job names (test-pgsql, test-sqlite, test-mysql,
test-mssql) are kept as one-step aggregator jobs that depend on the
shards job and report success only when all shards passed. This
keeps any branch-protection rule referencing those names valid.

Source-based enumeration is used because the test binary's
-test.list calls TestMain, which boots the full Gitea environment
and panics without a configured database.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Replaces the hardcoded list of three top-level paths
(modules/git, modules/gitrepo, modules/lfs) with an at-build-time
derivation. tools/find-gogit-test-pkgs.sh queries `go list` for
every package whose own code or test code imports any of the
gogit-affected modules — currently 72 packages — and feeds that
list into `make test-backend-gogit`.

This restores coverage that the static list dropped: callers like
models/git, services/repository, modules/repository, etc., whose
unit tests transitively exercise gogit/nogogit code paths via the
modules/git API.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
@silverwind silverwind changed the title ci: improve cache reuse and drop redundant build work ci: shard integration tests and reduce redundant work May 8, 2026
silverwind and others added 2 commits May 9, 2026 01:49
- pgsql/sqlite/mysql/mssql: increase shards from 2 to 3, dropping
  per-job wall time from ~13 min to ~9-10 min on the slowest (pgsql).
- test-unit: split into test-unit-bindata and test-unit-gogit running
  in parallel, with a test-unit aggregator preserving the existing
  check name. Each gets its own rotated cache (cache-name unit and
  unit-gogit) so testcache accumulates per variant.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
…t pass

Two CI failures from sharding/splitting:

1. TestAPILFSNotStarted and TestAPILFSLocksNotStarted set
   setting.LFS.StartServer = false but never restore it. In the
   pre-sharding sequential run a later test happened to set it back
   to true, but with sharding TestLFSRender and
   TestChangeRepoFilesForUpdateWithFileRename can land in a different
   shard and run with LFS disabled. Use test.MockVariableValue so
   the change is scoped to the test that makes it.

2. find-gogit-test-pkgs.sh enumerated every package whose tests
   import the gogit modules, including tests/integration,
   tests/integration/migration-test, and models/migrations/...
   These need a real DB / dedicated harness and have to be tested
   separately; mirror GO_TEST_PACKAGES' filter to exclude them.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Comment thread tools/test-integration-shard.sh Outdated
Comment thread tools/test-integration-shard.sh Outdated
Comment thread tools/find-gogit-test-pkgs.sh Outdated
Comment thread Makefile Outdated
silverwind and others added 3 commits May 9, 2026 05:45
Rename:
- test-unit-bindata -> test-unit-nogogit. Both jobs run with the
  bindata tag; the meaningful difference is the gogit codepath, and
  the codebase already uses the nogogit/gogit suffix on its
  build-tagged source files.

test-unit-gogit cleanup:
- The gogit-affected package set (modules/git/gitrepo/lfs and direct
  importers) doesn't touch elasticsearch/meilisearch/redis/minio/
  azurite, so drop those services and the matching /etc/hosts step.

Shard runner robustness (Copilot review feedback):
- Validate TEST_SHARD/TEST_TOTAL_SHARDS are positive ints with
  shard in [1, total]; exit 2 on bad input.
- Tighten the test-name grep to require `*testing.T` or `*testing.TB`
  arg, dropping the TestMain false-positive.
- Force LC_ALL=C sort so the partition is deterministic regardless
  of the runner's locale.
- Empty assignment now exits 1 instead of silently passing.

find-gogit-test-pkgs.sh:
- Mirror the Makefile's GO_TEST_PACKAGES exclusions (drop
  models/migrations/..., tests/integration, tests/integration/migration-test).
- Tighten the comment header — the script returns packages WITH TESTS,
  not all callers.
- Drop dead alternatives from the import-match regex.

Makefile test-backend-gogit:
- Fail when the script fails or returns no packages instead of
  silently running `go test` with no args.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Replaces test-unit-nogogit + test-unit-gogit with a single test-unit-shards
matrix (3-shard). Each shard runs both the bindata and bindata-gogit test
subsets — round-robin partition of GO_TEST_PACKAGES (123 each) and
find-gogit-test-pkgs.sh (22-23 each). Combined work split 3 ways gives
each shard ~7:46 wall (vs 10:55/7:38 today).

PRs no longer write rotated unit caches: build-cache-rotate is dropped, so
the shared seeded gobuild key is restored but not re-saved per PR push.
Trade-off: cold testcache on every push (vs warm-on-rerun before). Frees
~3 GB of rotated-cache pressure on the 10 GB cap.

Unit shards swap `make backend` for `make generate-go` — only the bindata
codegen is needed; the gitea executable's link step (~10-15s) is wasted on
unit tests since they don't shell out to the binary (db integration tests
do — those keep `make backend`).

New shared tools/partition-by-shard.sh handles validation + round-robin
partitioning; tools/test-integration-shard.sh now uses it. New Makefile
targets: test-backend-shard, test-backend-gogit-shard.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
The 3-shard combined unit run came in at 11:42 wall (slowest shard),
slightly worse than the prior parallel-jobs setup (10:55). The
per-shard race-instrumented compile happens cold for each job and
doesn't shard, so per-shard test work was ~67% higher than a clean
3-way split would suggest. N=4 amortizes that overhead across more
runners.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
@silverwind silverwind changed the title ci: shard integration tests and reduce redundant work ci: shard tests and reduce redundant work May 9, 2026
@silverwind silverwind requested a review from Copilot May 9, 2026 04:39

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comment thread tools/partition-by-shard.sh Outdated
Comment thread Makefile Outdated
Comment thread tools/test-integration-shard.sh Outdated
@wxiaoguang

This comment was marked as low quality.

@wxiaoguang wxiaoguang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Horrible hacky tricks, didn't see real benefit.

@GiteaBot GiteaBot added lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels May 9, 2026
- partition-by-shard.sh: fix off-by-one. `NR % t == r` placed item 1
  in shard `total`, item 2 in shard 1 — shifted by one. Switch to
  `(NR-1) % t == r` so shard 1 owns items 1, 1+t, 1+2t, ... as
  documented.
- test-integration-shard.sh: the `[A-Z]` after `Test` rejected
  `Test_Foo`-style names (28 such tests in tests/integration). Drop
  the [A-Z] and broaden the comment to match the actual regex.
- test-integration-shard.sh: drop `*testing.TB` alternative — there
  are zero matching tests in this codebase, so the simpler regex is
  exact.
- find-gogit-test-pkgs.sh: merge the `grep -vE` exclusion into the
  awk filter. Avoids a pipefail edge case where empty grep output
  would abort the pipeline before the friendly empty-list message
  fires; also one fewer process per invocation.
- Makefile test-integration: fail fast at Make level if TEST_SHARD
  is set without TEST_TOTAL_SHARDS, instead of letting the script's
  ${X:?...} surface a less obvious error.

All locals in the new bash scripts use UPPER_CASE to match the
existing tools/test-e2e.sh convention.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
@silverwind

silverwind commented May 9, 2026

Copy link
Copy Markdown
Member Author

Sharding is standard practice for big repos, long list is inconsequential.

What exactly is "horrible" here?

@silverwind silverwind requested a review from wxiaoguang May 9, 2026 04:54
@silverwind

Copy link
Copy Markdown
Member Author

As you said, the list is too long.

I never said that, I just said useless skip info is noisy.

@GiteaBot GiteaBot added the lgtm/need 1 This PR needs approval from one additional maintainer to be merged. label May 18, 2026
silverwind and others added 2 commits May 18, 2026 10:04
Revert the cosmetic rename "unit tests (bindata)" / "unit tests (bindata
gogit)" back to "unit-tests" / "unit-tests-gogit" to match main. The
structural change — separate steps with a trailing make test-check —
stays.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
@silverwind silverwind force-pushed the ci-cache-improvements branch from ce29d29 to 1c1de6a Compare May 18, 2026 08:05
@silverwind

Copy link
Copy Markdown
Member Author

lint fix extracted to #37761.

* origin/main:
  chore(deps): update action dependencies (go-gitea#37751)

# Conflicts:
#	.github/workflows/pull-e2e-tests.yml
@bircni

bircni commented May 20, 2026

Copy link
Copy Markdown
Member

this needs then a new required status check set?

@silverwind

silverwind commented May 21, 2026

Copy link
Copy Markdown
Member Author

this needs then a new required status check set?

Yes, I will update required checks right before merge so they go in effect immediately after this is merged.

@bircni

bircni commented May 21, 2026

Copy link
Copy Markdown
Member

this needs then a new required status check set?

Yes, I will update required checks right before merge so they go in effect immediately after this is merged.

You have the perms?

@silverwind

silverwind commented May 21, 2026

Copy link
Copy Markdown
Member Author

Yes, recently got them, specifically for this reason.

@bircni

bircni commented May 21, 2026

Copy link
Copy Markdown
Member

Yes, recently got them, specifically for this reason.

Congrats 🎉

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels May 21, 2026
@silverwind silverwind enabled auto-merge (squash) May 21, 2026 04:33
@silverwind silverwind added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label May 21, 2026
@silverwind silverwind merged commit 93b8fdc into go-gitea:main May 21, 2026
24 checks passed
@silverwind silverwind deleted the ci-cache-improvements branch May 21, 2026 04:58
@GiteaBot GiteaBot added this to the 1.27.0 milestone May 21, 2026
@GiteaBot GiteaBot removed the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label May 21, 2026
@silverwind

silverwind commented May 21, 2026

Copy link
Copy Markdown
Member Author

There is one problem, which is arguably a GitHub bug: Skipped matrix runs do not report with their matrix names but with the job name and required checks can never match both variants.

I will temporarily drop the psql from branch protection and make a PR to fix it.

@bircni

bircni commented May 21, 2026

Copy link
Copy Markdown
Member

There is one problem, which is arguably a GitHub bug: Skipped matrix runs do not report with their matrix names but with the job name and required checks can never match both variants.

I will temporarily drop the psql from branch protection and make a PR to fix it.

Do we have the same bug in Gitea 🤔

@silverwind

silverwind commented May 21, 2026

Copy link
Copy Markdown
Member Author

Might be worth an investigation. I have not used matrix+required checks in gitea yet.

What is clearly lacking in GitHub at least is that one can not pattern-match on required checks, this would be very nice to have here to be able to do test-psql* for example.

@silverwind

Copy link
Copy Markdown
Member Author

followup: #37802

silverwind added a commit to silverwind/gitea that referenced this pull request May 21, 2026
* origin/main:
  style: misc UI fixes (go-gitea#37691)
  ci: shard tests and reduce redundant work (go-gitea#37618)
  chore: simplify issue and pull request templates (go-gitea#37799)

# Conflicts:
#	web_src/js/components/ActionRunJobView.vue
zjjhot added a commit to zjjhot/gitea that referenced this pull request May 22, 2026
* main: (70 commits)
  fix(api): handle partial failures in push mirror synchronization gracefully (go-gitea#37782)
  fix(deps): update module golang.org/x/crypto to v0.52.0 [security] (go-gitea#37806)
  fix(build): swagger css import (go-gitea#37801)
  feat: add copy button to action step header, improve other copy buttons (go-gitea#37744)
  style: misc UI fixes (go-gitea#37691)
  ci: shard tests and reduce redundant work (go-gitea#37618)
  chore: simplify issue and pull request templates (go-gitea#37799)
  chore: Update 1.26.2 changelog in main (go-gitea#37796)
  fix(actions): make artifact signature payloads unambiguous (go-gitea#37707)
  chore: Update giteabot to fix failure when backport (go-gitea#37789)
  fix(deps): update module github.com/go-git/go-git/v5 to v5.19.1 [security] (go-gitea#37786)
  fix(pull): handle empty pull request files view to allow reviews (go-gitea#37783)
  fix(markup): make RenderString never fail (go-gitea#37779)
  fix(markup): wrap indented code blocks for the code-copy button (go-gitea#37748)
  fix(permissions): Fix reading permission (go-gitea#37769)
  fix: add natural sort to sortTreeViewNodes (go-gitea#37772)
  fix: package creation unique conflict (go-gitea#37774)
  fix(deps): update npm dependencies (go-gitea#37768)
  fix(deps): update module gitlab.com/gitlab-org/api/client-go/v2 to v2.26.0 (go-gitea#37771)
  ci: split giteabot workflow (go-gitea#37770)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants