Skip to content

fix(rest): resume paused JSON-shred parser when its channel drains empty (back-pressure deadlock)#1005

Merged
JohannesLichtenberger merged 1 commit into
mainfrom
fix/streaming-shredder-deadlock
Jun 5, 2026
Merged

fix(rest): resume paused JSON-shred parser when its channel drains empty (back-pressure deadlock)#1005
JohannesLichtenberger merged 1 commit into
mainfrom
fix/streaming-shredder-deadlock

Conversation

@JohannesLichtenberger

Copy link
Copy Markdown
Member

Summary

Fixes a back-pressure deadlock in the REST API's KotlinJsonStreamingShredder.

The channel-based shredder runs a producer (Vert.x event loop → bounded channel) and a consumer (worker thread → blocking DB inserts), using parser.pause()/resume() for back-pressure. The consumer resumed the paused parser only after draining resumeThreshold (50,000) events since the last resume. If the consumer drained the channel empty while the parser was still paused and fewer than 50k events had been processed since that resume, it never resumed the parser — so no further events (and never the end-of-stream) arrived. The consumer then blocked forever on the empty channel while the producer stayed paused: a permanent CPU-idle deadlock.

This is reachable whenever an input's post-pause tail is smaller than the resume threshold — directly and deterministically with a small channelCapacity.

Fix

Replace the for (event in channel) iterator (which silently suspends on an empty channel) with an explicit tryReceive()receiveCatching() loop. Before suspending on an empty channel, if the producer is paused it is resumed (low-water fallback). The existing 50k batched resume is retained purely as a throughput optimisation — it lets the producer refill while the consumer drains the remainder.

Tests

  • testBackpressureTailDoesNotDeadlock — deterministic regression: channelCapacity=16 forces a pause whose drain never reaches the 50k threshold; deadlocks (TimeoutException) before the fix, passes after.
  • testProductionScaleChunkedAsync — production-scale (~5000 nested objects / ~150k events) at the default 100k capacity, exercising real back-pressure end-to-end.

All existing JsonStreamingShredderTest sync + async tests continue to pass.

…nnel empty

The streaming JSON shredder's consumer resumed the back-pressured parser
only after draining `resumeThreshold` (50k) events since the last resume.
If the consumer drained the bounded channel empty while still paused and
below that threshold, it never resumed the parser, so no further events
(and never the end-of-stream) arrived: the consumer blocked forever on an
empty channel while the producer stayed paused — a permanent CPU-idle
deadlock for any input whose post-pause tail is smaller than the threshold
(reachable directly with a small channelCapacity).

Replace the `for (event in channel)` iterator with an explicit
tryReceive()/receiveCatching() loop that, before suspending on an empty
channel, resumes the producer when it is paused (low-water fallback). The
50k batched resume is kept purely as a throughput optimisation.

Add a deterministic regression test (channelCapacity=16 → deadlocks before
the fix, passes after) and a production-scale large-payload async test.

Bumps version to 1.0.0-alpha15.
@JohannesLichtenberger JohannesLichtenberger merged commit 18a0ea1 into main Jun 5, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant