fix(inkless:systest): fix sigstop and slow consumer giving false negatives #659
Open
giuseppelillo wants to merge 2 commits into
…ss tail reads in switch tests
The test passed KafkaService.java_class_name() (regex kafka\.Kafka) to Trogdor's ProcessStopFaultSpec, but Trogdor's worker matches the target JVM by literal substring against jcmd -l. The escaped form never matched the real kafka.Kafka line, so SIGSTOP/SIGCONT were sent to zero pids and the leader was never actually frozen — the scenario passed without testing anything. Fix by passing the literal main-class name (kafka.Kafka) so the signal reaches the broker, and verify the fault actually took effect: assert the broker JVM reaches ps state T (stopped) during the pause and returns to running after SIGCONT, so any future no-op fails loudly instead of silently exercising nothing.
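The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not Trogdor's actual code: the `jcmd -l` output is a made-up sample and `pids_by_substring` is a hypothetical helper that only imitates the literal substring match the commit message describes.

```python
import re

# Hypothetical sample of `jcmd -l` output: "<pid> <main class> <args>" per line.
JCMD_OUTPUT = """\
4242 kafka.Kafka /mnt/kafka/kafka.properties
4310 jdk.jcmd/sun.tools.jcmd.JCmd -l
"""


def pids_by_substring(jcmd_output, needle):
    """Return pids whose jcmd line contains `needle` as a literal substring."""
    pids = []
    for line in jcmd_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


escaped = r"kafka\.Kafka"  # regex-style name, as the test used to pass
literal = "kafka.Kafka"    # literal main-class name, as the fix passes

# The escaped form is a valid regex for the line, but it is not a substring of it.
regex_matches = re.search(escaped, JCMD_OUTPUT) is not None
pids_escaped = pids_by_substring(JCMD_OUTPUT, escaped)
pids_literal = pids_by_substring(JCMD_OUTPUT, literal)
```

With the escaped name the substring match selects no pid, so a signal would be sent to nothing; with the literal name it selects the broker line.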
d270c07 to efc2498
Pull request overview
This PR hardens the inkless classic→diskless topic switch system test to reduce false negatives by (1) making the “consume exact count” path resilient to temporarily slow diskless fetch tails and (2) ensuring the SIGSTOP-based leader fault injection actually stops (and resumes) the broker JVM.
Changes:
- Increase/parameterize console-consumer idle timeout for exact-count reads and adjust completion logic in `_consume_all_from_beginning`.
- Fix Trogdor SIGSTOP targeting by using a literal `jcmd -l` match string for the broker process.
- Add verification helpers to assert the broker actually enters stopped (`ps` state `T`) and later resumes after SIGCONT in the `sigstop` scenario.
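The stopped-state verification in the last item could look roughly like the sketch below. The helper names `ps_state` and `is_stopped` are hypothetical and are not the helpers added by this PR; the sketch assumes a `ps` that accepts `-o state=` (as procps on Linux does) and runs it locally rather than on a remote test node.

```python
import subprocess


def ps_state(pid):
    """Return the ps state field for `pid`, or '' if the process is gone."""
    result = subprocess.run(
        ["ps", "-o", "state=", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=False,  # ps exits non-zero when the pid does not exist
    )
    return result.stdout.strip()


def is_stopped(state):
    """True when the state field starts with 'T' (stopped by a job-control signal)."""
    return state.startswith("T")
```

A test would poll `is_stopped(ps_state(pid))` until it becomes true during the pause, then poll until it becomes false again after SIGCONT, failing if either transition never happens.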
Comment on lines +435 to +438

    # For an exact-count read, keep the consumer alive across a slow diskless
    # tail; a short idle timeout is fine when the caller only wants a minimum.
    consumer_idle_ms = (self.CONSUME_COMPLETION_IDLE_SEC * 1000
                        if wait_for_completion else 30000)
Comment on lines +460 to 466

    # Done as soon as every expected record has been delivered.
    if len(consumer.messages_consumed[1]) >= expected_count:
        return True
    # The consumer drained and exited on its own short of the expected
    # count: stop waiting so the caller sees the shortfall (genuine data
    # loss) instead of blocking until timeout_sec.
    return consumer_seen_alive[0] and not is_alive
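The completion condition quoted above can be read as a small pure predicate. This is a sketch under assumptions: `read_finished` and its parameters are hypothetical names that stand in for the values the test reads from the consumer, not code from the PR.

```python
def read_finished(consumed, expected_count, seen_alive, is_alive):
    """Decide whether an exact-count read should stop waiting.

    consumed:       number of records delivered so far
    expected_count: number of records the caller expects
    seen_alive:     whether the consumer was ever observed running
    is_alive:       whether the consumer is running right now
    """
    # Every expected record arrived: done, regardless of consumer state.
    if consumed >= expected_count:
        return True
    # The consumer started and then exited short of the target: stop waiting
    # so the caller can report the shortfall instead of hitting the timeout.
    return seen_alive and not is_alive
```

The `seen_alive` guard matters at startup: before the consumer has ever been observed running, "not alive" means "not started yet" rather than "exited", so the predicate keeps waiting.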
Commit 1: avoid false negatives due to slow consumers
Commit 2: actually send SIGSTOP to the broker and verify that it really stops