Summary
TestLeaderBalancedNodeAdded in tests/balancer intermittently hangs and exceeds the 10-minute test timeout, causing the whole tests/balancer package to fail. This is a recurrence of #936, which was previously closed.
Failing CI run
Failure excerpt
panic: test timed out after 10m0s
running tests:
TestLeaderBalancedNodeAdded (9m14s)
goroutine 1 [chan receive, 9 minutes]:
testing.(*T).Run(0xc0003be248, ...)
The goroutine dump shows multiple streamReader[...].handleServerMessageOnce goroutines parked in gRPC RecvMsg for ~9 minutes (at oxiad/dataserver/assignment/stream_reader.go:70). This suggests the test setup reached a state where the shard-assignment streams stopped progressing while the test kept waiting.
Repro
Not reproducible locally so far; appears only under CI load. Retrying the job typically succeeds.
Suspected area
tests/balancer/leader_balancer_test.go: the test's assert.Eventually/wait loop for balanced leader distribution appears to wait indefinitely when an assignment stream is stuck, so the run ends at the package-level timeout instead of failing at the assertion.
Related