MINOR Add a thread dump on build timeout #17181
@mjsax I think this can help us debug StreamThreadTest
From the raw logs of https://github.com/apache/kafka/actions/runs/10835121701?pr=17181 we see the thread dump script start and Gradle start:
5 minutes later, we see the threads dumped
and 5 more minutes later, we see the job killed by the |
This reverts commit ae31b00.
@mumrah thanks for this patch!
.github/scripts/thread-dump.sh
Outdated
for GRADLE_WORKER_PID in `jps | grep GradleWorkerMain | awk -F" " '{print $1}'`;
do
  echo "Thread Dump for GradleWorkerMain pid $GRADLE_WORKER_PID";
  kill -3 $GRADLE_WORKER_PID;
Should we dump the processes only if they have the keyword "kafka"? That would avoid unnecessary dumps, I'd say.
To my knowledge, the JUnit tests are all run on the GradleWorkerMain Java processes. Here's the jps output during a local test run:
66818 GradleWorkerMain
66884 GradleWorkerMain
18725 GradleDaemon
66631 GradleWorkerMain
66886 FindBugs2
66889 FindBugs2
65804 GradleWrapperMain
66924 Jps
66862 GradleWorkerMain
66798 GradleWorkerMain
66864 GradleWorkerMain
66899 Main
66841 GradleWorkerMain
13976 Main
66811 GradleWorkerMain
65754 GradleDaemon
65917 GradleWorkerMain
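A minimal sketch of how the worker PIDs could be picked out of a listing like the one above, matching on the process name column exactly rather than with a substring grep. The function name `gradle_worker_pids` is hypothetical and is not taken from the patch; it reads `jps`-style lines from stdin so it can be tried without a running build.

```shell
# Print the PID of every line whose process name is exactly GradleWorkerMain.
# Input is "PID NAME" lines as printed by jps, read from stdin.
gradle_worker_pids() {
  awk '$2 == "GradleWorkerMain" { print $1 }'
}

# Example using four lines from the listing above.
printf '%s\n' \
  '66818 GradleWorkerMain' \
  '18725 GradleDaemon' \
  '66886 FindBugs2' \
  '66631 GradleWorkerMain' | gradle_worker_pids
# prints 66818 and 66631, one per line
```

In a real run the input would come from `jps | gradle_worker_pids`. An exact match on the second column avoids picking up processes whose name only contains the string.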
Sorry for the unclear comment :(
My point was that we should print a thread dump only if it contains a Kafka-related stack. The thread dump in this PR has two unrelated processes (GradleWorkerMain).
How can we do that? grep for "kafka" in the thread dump after the fact? I'm not sure we could figure this out prior to generating the dump
"grep" seems to be an acceptable way. We could call jstack to check it before archiving it.
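A rough sketch of that idea, assuming each worker's `jstack` output is first written to its own file and then checked for "kafka" before being archived. The helper name `keep_if_kafka`, the `thread-dumps/` directory and the file naming are all hypothetical; the patch itself may do this differently.

```shell
# Keep a dump file only if it mentions "kafka" (case-insensitive);
# otherwise delete it. Returns 0 when kept, 1 when discarded.
keep_if_kafka() {
  dump_file="$1"
  if grep -qi 'kafka' "$dump_file"; then
    return 0
  fi
  rm -f "$dump_file"
  return 1
}

# Hypothetical usage inside the dump loop (paths are assumptions):
#   mkdir -p thread-dumps
#   for pid in $(jps | awk '$2 == "GradleWorkerMain" { print $1 }'); do
#     jstack "$pid" > "thread-dumps/$pid.txt" && keep_if_kafka "thread-dumps/$pid.txt"
#   done
```

Filtering after the fact keeps the decision simple: the dump has to exist before anything can be known about its stacks, so the check only decides what gets archived.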
@mumrah thanks for this patch
BTW, could you please add compression-level: 9 to the "Archive JUnit reports" task? The archives are plaintext, so we should compress them to save cost.
.github/workflows/build.yml
Outdated
@@ -141,3 +144,11 @@ jobs:
        with:
          name: build-scan-test-${{ matrix.java }}
          path: ~/.gradle/build-scan-data
      - name: Archive Thread Dumps
Maybe the artifact URL generated by this task should be passed to "Parse JUnit tests". That would enable "Parse JUnit tests" to add the URL to the summary (like the JUnit reports).
@chia7712 here's a run with the latest changes: https://github.com/apache/kafka/actions/runs/10850348751?pr=17181 The "Parse JUnit Tests" step produces the following stdout:
The thread dump archive is also in the job summary: https://github.com/apache/kafka/actions/runs/10850348751?pr=17181#summary-30111636589
@mumrah thanks for the updated patch. Overall LGTM
To help with debugging build timeouts, this patch adds a thread dump script that runs in parallel with the JUnit tests. 5 minutes prior to the build timeout of 3 hours, this script will iteratively run jstack on the Gradle worker processes in order to obtain thread dumps.
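The timing described above can be sketched roughly as follows. This is not the script from the patch: the function names `seconds_until_dump` and `dump_gradle_workers` are hypothetical, and the 180 and 5 minute values are simply the 3 hour timeout and 5 minute lead from the description.

```shell
# Seconds to wait before dumping: (timeout - lead) minutes, never negative.
seconds_until_dump() {
  timeout_min="$1"
  lead_min="$2"
  wait_min=$((timeout_min - lead_min))
  if [ "$wait_min" -lt 0 ]; then
    wait_min=0
  fi
  echo $((wait_min * 60))
}

# Run jstack against every GradleWorkerMain process reported by jps.
dump_gradle_workers() {
  for pid in $(jps | awk '$2 == "GradleWorkerMain" { print $1 }'); do
    echo "Thread dump for GradleWorkerMain pid $pid"
    jstack "$pid" || echo "jstack failed for pid $pid" >&2
  done
}

# Hypothetical invocation, started in the background next to the Gradle build:
#   sleep "$(seconds_until_dump 180 5)" && dump_gradle_workers
seconds_until_dump 180 5
# prints 10500
```

Keeping the wait computation in its own function makes it easy to try a short timeout in a test run, as was done in the first workflow run linked in this PR.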