Skip to content

Limit cleanup loop in test_tbb_fork to 5 iterations.#2041

Open
petterreinholdtsen wants to merge 1 commit into
uxlfoundation:masterfrom
petterreinholdtsen:test-fork-no-endless-loop
Open

Limit cleanup loop in test_tbb_fork to 5 iterations.#2041
petterreinholdtsen wants to merge 1 commit into
uxlfoundation:masterfrom
petterreinholdtsen:test-fork-no-endless-loop

Conversation

@petterreinholdtsen

@petterreinholdtsen petterreinholdtsen commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

Description

There is no reason why the loop to clean up any pending signals should need more than two iterations to complete any SIGALRM and SIGCHLD remaining. Break the loop after 5 iterations and abort if it ever take this long, and sleep for one second every iteration to give even slow machines time to process the child.

On GNU Hurd, the loop some times got stuck (around 1/3 of runs during my testing) in a endless stream of SIGCHLD signals. This code change ensure the test exits with an error instead of getting stuck forever.

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

@petterreinholdtsen petterreinholdtsen force-pushed the test-fork-no-endless-loop branch 3 times, most recently from 09026ab to f1a2853 Compare April 3, 2026 07:02
ASSERT(0, "sigwait failed");
} else
break;
sleep(1);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, with limited number of iterations the test is more reliable.

But why do we need sleep() call?

@petterreinholdtsen

petterreinholdtsen commented Apr 8, 2026 via email

Copy link
Copy Markdown
Contributor Author

@Alexandr-Konovalov

Copy link
Copy Markdown
Contributor

I added the sleep call to make sure the loop was not a tight loop blocking other processes on single threaded and slow machines to run and wrap up their stuff before the five iterations were done.

According to my understanding of the test, a child process is already terminated at this point, and we are just reading pending signals from the kernel. Is it not true?

@petterreinholdtsen

petterreinholdtsen commented Apr 11, 2026 via email

Copy link
Copy Markdown
Contributor Author

@Alexandr-Konovalov

Copy link
Copy Markdown
Contributor

[Alexandr-Konovalov]
According to my understanding of the test, a child process is already terminated at this point, and we are just reading pending signals from the kernel. Is it not true?
It sure can be. Or an alarm might have triggered. Hard to know how different systems fail, and I decided for what I believe is a safer approach. If you believe it is better to drop the sleep(), I am not going to argue. I do not see the advantage of taking the chance, though. -- Happy hacking Petter Reinholdtsen

Without any doubt, busy waiting is evil, especially in 1-CPU configuration. But is there a busy waiting that we may want to improve by sleep() adding? Wrt alarm, the alarm can be triggered already and here we want only collect it status, not wait for something.

Did you see improvements in passrate after adding sleep(1) or is this just in case?

@petterreinholdtsen

petterreinholdtsen commented Apr 14, 2026 via email

Copy link
Copy Markdown
Contributor Author

@Alexandr-Konovalov

Copy link
Copy Markdown
Contributor

[Alexandr-Konovalov]
Did you see improvements in passrate after adding sleep(1) or is this just in case?
I did not measure the effect with or without sleep(1). I started with a success rate of around 2/3 using the sleep(), which I added because I knew Hurd do not handle multiple cores or CPUs, so I suspected a busy wait loop would complete before yielding the CPU to any other process in time for the loop to get any useful updates while the loop is running.

So, are you for checking exact effect of sleep(0)? Or is it better to just remove it?

The most important part of the change is to get rid of the endless loop, which tended to be the 1/3 failing cases. :)

Sure, the infinite loop is not reliable, let's get rid of it.

@petterreinholdtsen

petterreinholdtsen commented Apr 15, 2026 via email

Copy link
Copy Markdown
Contributor Author

There is no reason why the loop to clean up any pending signals should
need more than two iterations to complete any SIGALRM and SIGCHLD
remaining.  Break the loop after 5 iterations and abort if it ever
take this long, and sleep for one second every iteration to give even
slow machines time to process the child.

On GNU Hurd, the loop some times got stuck (around 1/3 of runs during
my testing) in a endless stream of SIGCHLD signals.  This code change
ensure the test exits with an error instead of getting stuck forever.
@petterreinholdtsen petterreinholdtsen force-pushed the test-fork-no-endless-loop branch from f1a2853 to cccb28f Compare April 24, 2026 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants