feat(test runner): improve sharding algorithm to better spread similar tests among shards #30962

Open · wants to merge 60 commits into main

Conversation

@muhqu (Contributor) commented on May 22, 2024

Adds alternative algorithms to assign test groups to shards to better distribute tests.

Problem

Currently the way sharding works is something like this…

         [  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]
Shard 1:  ^---------^                                      : [  1, 2, 3 ]
Shard 2:              ^---------^                          : [  4, 5, 6 ]
Shard 3:                          ^---------^              : [  7, 8, 9 ]
Shard 4:                                      ^---------^  : [ 10,11,12 ]

Tests are ordered in the way they are discovered, which is mostly alphabetically. This has the effect that similar test cases end up next to each other… For example, you have 6 tests which test the logged-in state first, followed by 6 tests which test the logged-out state. The first 6 tests require more setup time as they are testing logged-in behaviour… With the current sharding algorithm, shards 1 & 2 get those slow logged-in tests while shards 3 & 4 get the quicker tests…

Solution

This PR adds a new shardingMode configuration option which allows you to specify the sharding algorithm to be used…
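
For example, selecting an algorithm would look roughly like this (a sketch of the proposed option; only shardingMode and its three values come from this PR, the rest is an ordinary config):

```ts
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,
  // Proposed in this PR: how test groups are assigned to shards.
  // 'partition' (default) | 'round-robin' | 'duration-round-robin'
  shardingMode: 'duration-round-robin',
});
```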

shardingMode: 'partition'

That's the current behaviour, which is the default. Let me know if you have a better name to describe the current algorithm...

shardingMode: 'round-robin'

Distributes the test groups more evenly. It…

  1. sorts test groups by the number of tests in descending order
  2. then loops through the test groups and assigns each one to the shard with the fewest tests so far.

Here is a simple example where every test group represents a single test (e.g. --fully-parallel) ...

         [  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]
Shard 1:    ^               ^               ^              : [  1, 5, 9 ]
Shard 2:        ^               ^               ^          : [  2, 6,10 ]
Shard 3:            ^               ^               ^      : [  3, 7,11 ]
Shard 4:                ^               ^               ^  : [  4, 8,12 ]

…or a more complex scenario where test groups have different numbers of tests…

Original Order: [ [1], [2, 3], [4, 5, 6], [7], [8], [9, 10], [11], [12] ]
Sorted Order:   [ [4, 5, 6], [2, 3], [9, 10], [1], [7], [8], [11], [12] ]
Shard 1:           ^-----^                                                : [ [ 4,   5,   6] ]
Shard 2:                      ^--^                       ^                : [ [ 2,  3],  [8] ]
Shard 3:                              ^---^                    ^          : [ [ 9, 10], [11] ]
Shard 4:                                       ^    ^                ^    : [ [1], [7], [12] ]
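
For illustration, here is a minimal TypeScript sketch of that greedy assignment (simplified types and names, not the actual implementation in this PR):

```ts
type TestGroup = { tests: string[] };

// Greedy assignment: largest groups first, each group goes to the
// shard that currently holds the fewest tests.
function roundRobinShards(groups: TestGroup[], shardCount: number): TestGroup[][] {
  const shards: TestGroup[][] = Array.from({ length: shardCount }, () => []);
  const sizes: number[] = new Array(shardCount).fill(0);
  const sorted = [...groups].sort((a, b) => b.tests.length - a.tests.length);
  for (const group of sorted) {
    const target = sizes.indexOf(Math.min(...sizes)); // least-loaded shard
    shards[target].push(group);
    sizes[target] += group.tests.length;
  }
  return shards;
}
```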

shardingMode: 'duration-round-robin'

It's very similar to round-robin, but it uses the duration of a test's previous run as the cost factor. The duration is read from .last-run.json when available. When a test cannot be found in .last-run.json, the average duration of the available tests is used instead. When no last-run info is available, the behaviour is identical to round-robin.
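
Roughly, the cost of a group could be computed like this (a sketch; testDurations matches the field listed under "Other changes" below, the lookup and fallback details are illustrative):

```ts
type LastRun = { testDurations?: { [testId: string]: number } };

// Cost of a test group: sum of known durations; tests without a recorded
// duration fall back to the average of the durations we do have.
function groupCost(testIds: string[], lastRun: LastRun): number {
  const durations = lastRun.testDurations ?? {};
  const known = Object.values(durations);
  const average = known.length ? known.reduce((a, b) => a + b, 0) / known.length : 1;
  return testIds.reduce((sum, id) => sum + (durations[id] ?? average), 0);
}
```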

Other changes

  • Add testDurations?: { [testId: string]: number } to .last-run.json
  • Add a builtin lastrun reporter, which allows merge-reports to generate a .last-run.json
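
For reference, the extended file might look something like this (fields other than testDurations follow the existing .last-run.json shape as far as I know; ids and durations are made up):

```json
{
  "status": "passed",
  "failedTests": [],
  "testDurations": {
    "6f8a2c1d-chromium": 4300,
    "9b3e7f05-chromium": 126500
  }
}
```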

Appendix

Below are some runtime stats from a project I've been working on, which show the potential benefit of this change.

Each test run had to complete 161 tests. Single test durations range from a few seconds to over 2 minutes.

[Chart: per-shard durations for the partition run]

The partition run gives the baseline performance and illustrates the problem quite well. One shard takes almost 16 minutes while another completes in under 5 minutes.


[Chart: per-shard durations for the round-robin run]

The round-robin algorithm performs a bit better, but there is still a shard that takes twice as long as another.


[Chart: per-shard durations for the duration-round-robin run]

The duration-round-robin run used the duration info from a previous run and achieves the best result by far. All shards complete in 10-11 minutes. 🏆 🎉

@muhqu (Contributor, Author) commented on May 22, 2024

Maybe it's better to make this an option to allow restoring the old behaviour. ¯\_(ツ)_/¯

And… there should be unit tests, no? Found them…

@pavelfeldman (Member) commented:

Do you think you can achieve the same better behavior with your sharding seed? Or are you looking for additional bias against subsequent tests being put into the same group?

@muhqu (Contributor, Author) commented on May 23, 2024

Do you think you can achieve the same better behavior with your sharding seed?

Not sure yet. But I will test this new sharding logic in our test setup to gather some results.

Or are you looking for additional bias against subsequent tests being put into the same group?

The seeded shuffle is basically just a quick and easy way to influence the test-group-to-shard assignment… it's random and so its results may vary.

However, this change aims to improve the sharding logic to generally yield better results, which still needs to be proven. 😅

Currently this sharding algorithm uses the number of tests per test group as a cost metric. It would be great if we could use the test duration of a previous run (when available) to even better distribute the tests among the shards. But the algorithm would be quite similar.

@pavelfeldman (Member) commented:

I think your seed change allows users to experiment with the seeds and arrive at a better state than they have today. Any other changes without the timing feedback are going to yield similar results; no need to experiment with biases.

It would be great if we could use the test duration of a previous run (when available) to even better distribute the tests among the shards.

This requires a feedback loop with the test time stats, which we don't have today. We recently started storing the last-run stats in .last-run.json; I think it is OK to store the test times there and to use them on subsequent runs for better sharding, when available. Would you be interested in working on it?

@muhqu (Contributor, Author) commented on May 24, 2024

Yes, I would like to work on that.

I was not yet aware of the .last-run.json. Is that something that is also written by the merge-reports command? Because we need the stats combined from all shard runs.

I was thinking about adding a separate reporter for that purpose, but if those last-run stats are already there…, then there might be no need to create a separate reporter.

@pavelfeldman (Member) commented:

I was thinking about adding a separate reporter for that purpose, but if those last-run stats are already there…, then there might be no need to create a separate reporter.

Shaping this code as a reporter sounds good, but Playwright core would need to consume the output of that reporter, so it needs to be baked in. Merging those should not be a hard problem, reporter or not. Unfortunately merging mangles test ids today, so we'd need to figure that out. Maybe not use the ids at all and fall back to the file names and test titles. There are also some tricky edge cases, such as tests that are fast on Chromium but slow on Firefox...

@muhqu (Contributor, Author) commented on May 27, 2024

I've added a lastrun reporter that can be used with merge-reports to generate .last-run.json.
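
Presumably the usage would then be something like this (the lastrun reporter name comes from this PR; the blob directory is just an example path):

```sh
npx playwright merge-reports --reporter lastrun ./all-blob-reports
```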

Surprisingly, when merging the reports, the test ids just had a one-character suffix that I was able to strip off… but it doesn't feel like the right way to do this.

What's the reason to modify test ids when merging blobs? Couldn't this be done in a way that only modifies a test id when there is a collision?

@muhqu (Contributor, Author) commented on May 28, 2024

@pavelfeldman .last-run.json is a little tricky to work with at the moment… it constantly gets overwritten even if you just list tests, and VS Code seems to do that from time to time to refresh the tests panel... I think it should only get written when tests are actually run.

@muhqu (Contributor, Author) commented on Sep 11, 2024

Hm… okay, now there are a bunch of conflicts due to @dgozman's #32540 (chore(test runner): extract LastRunReporter).

I'll see if I can adapt to that…

@muhqu (Contributor, Author) commented on Sep 12, 2024

Btw, what's the reason for the playwright merge-reports command being hidden? I mean, why is it not listed when running playwright --help?
\cc @yury-s @aslushnikov

@muhqu (Contributor, Author) commented on Sep 13, 2024

Here is a separate PR to silence the babel.spec.ts Node.js issue: #32604
Closed as @mxschmitt noted:

It's tracked by #32311 and seems fixed upstream - once they issue a new 22.x release it should pass again, if I see it correctly.

@liviucmg commented:

@muhqu We have about 400 tests split into 32 shards. Here's a sample before-and-after using duration-round-robin, in which the longest shard is reduced from ~23 minutes to ~16 minutes. It still fluctuates due to other factors, of course, like flaky tests, but overall amazing job! 👏

[Chart: before-and-after per-shard durations across 32 shards]

@muhqu (Contributor, Author) commented on Sep 18, 2024

@dgozman @pavelfeldman how is it going? Do you have an idea of when you'll have time to review this PR again? Would love to get this finished…
