Skip to content

Fix sporadic hangs in FG buffering node tests#2102

Open
kboyarinov wants to merge 4 commits into
masterfrom
dev/kboyarinov/fix-test-buffer-node-hang
Open

Fix sporadic hangs in FG buffering node tests#2102
kboyarinov wants to merge 4 commits into
masterfrom
dev/kboyarinov/fix-test-buffer-node-hang

Conversation

@kboyarinov

Copy link
Copy Markdown
Contributor

Description

Adopt workaround from test_queue_node for other buffering node types.

The existing tests are a bit incorrect to assume TBB worker will come and pick up some internal tasks that are spawned, which is not guaranteed by TBB scheduler.
After a huge amount of checks in spin_try_get, enqueue a task that is guaranteed to execute

Fixes #1467

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses sporadic hangs in several flow-graph “buffering node” tests by centralizing and strengthening the spin_try_get helper so tests reliably trigger execution of pending graph tasks instead of assuming a worker will pick them up.

Changes:

  • Move/centralize spin_try_get into test/common/graph_utils.h, adding a wakeup workaround via oneapi::tbb::this_task_arena::enqueue after prolonged spinning.
  • Remove per-test local spin_try_get implementations from buffering-node test sources and rely on the shared helper.
  • Update test file copyright headers.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/common/graph_utils.h Adds shared spin_try_get with a task-enqueue workaround to avoid missed wakeups/hangs.
test/tbb/test_buffer_node.cpp Removes local spin_try_get and uses the shared helper.
test/tbb/test_queue_node.cpp Removes local spin_try_get (including the old workaround) and uses the shared helper.
test/tbb/test_priority_queue_node.cpp Removes local spin_try_get and uses the shared helper.
test/tbb/test_sequencer_node.cpp Removes local spin_try_get; needs an include fix to use the shared helper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/tbb/test_sequencer_node.cpp

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comment thread test/common/graph_utils.h
Comment on lines +1029 to +1030
// This workaround assumes the calling thread is in the graph arena and all
// task elements are submitted to the graph before calling spin_try_get.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an assert or something that will protect us at least from being outside of any arena. For example, consider:

Suggested change
// This workaround assumes the calling thread is in the graph arena and all
// task elements are submitted to the graph before calling spin_try_get.
// This workaround assumes the calling thread is in the graph arena and all
// task elements are submitted to the graph before calling spin_try_get.
CHECK(oneapi::tbb::this_task_arena::current_thread_index() !=
oneapi::tbb::this_task_arena::not_initialized, "Expected to be within an arena");

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test test_buffer_node sporadically hangs on x86_64

3 participants