Conversation

@sleepybishop
Contributor

@sleepybishop sleepybishop commented Jan 30, 2025

This PR adds busy-polling support to h2o. It is Linux-only.

For notes on busy polling and its internals, please see the following:

The current implementation makes several assumptions that may need further thought:

  • It is assumed that num-threads will be equal to the number of CPUs configured in the busy-poll map.
  • All interfaces will be declared in the busy-poll map, using mode to specify busy-poll behavior. E.g., non-busy-polling interfaces will still be included, but marked with mode: OFF in the options subsection.
  • The loopback interface will be assigned to a single CPU.
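
Under those assumptions, a busy-poll map could look like the following hypothetical sketch. The keys (busy-poll-map, ifindex, cpus, options, mode, suspend-timeout) are the ones exercised by t/90busypoll.t; the interface indexes and values are purely illustrative:

```yaml
busy-poll-map:
  - ifindex: 2              # a busy-polled NIC (illustrative ifindex)
    cpus: [1, 2, 3]         # one CPU per worker thread serving this NIC
    options:
      suspend-timeout: 100  # illustrative value
      mode: $mode           # as in the test's heredoc; concrete mode values omitted here
  - ifindex: 1              # loopback: declared, but not busy-polled
    cpus: [0]               # loopback is assigned a single CPU
    options:
      mode: OFF
```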

…e being passed into open_inet_listener and adjust the logic in create_per_thread_nic_map_listeners to call finish_open_listener for all listeners
@sleepybishop sleepybishop force-pushed the jcalderon/busypoller branch 2 times, most recently from 3feaff9 to 4a6a7ab on January 31, 2025 19:46
…ate iface being passed into open_inet_listener and adjust the logic in create_per_thread_nic_map_listeners to call finish_open_listener for all listeners"

This reverts commit daf55c7.
@sleepybishop sleepybishop force-pushed the jcalderon/busypoller branch 2 times, most recently from febb4fe to e55be3c on February 5, 2025 00:25
@sleepybishop sleepybishop force-pushed the jcalderon/busypoller branch 2 times, most recently from 70c5bd5 to 426b9d8 on February 13, 2025 00:54
Member

@kazuho kazuho left a comment


Thank you for the changes! I like how the changes are becoming isolated.

Member

@kazuho kazuho left a comment


Thank you for all the changes. I really like what we have now, all my comments below are nitpicks.

I might try myself to see if we can further reduce the changes to src/main.c, but I am starting to suspect I'd fail, as it already looks quite polished.

@jdamato-fsly
Contributor

jdamato-fsly commented May 1, 2025

There's a bug in the loopback setup on this branch: issuing two requests to loopback causes it to fail:

thread 0 NAPI id: 0, iface: lo, listener fd: 4
found matching iface in conf map: lo
thread idx 0, has taken cpu: 0, for NIC: lo, via napi: 0 via listener 4

thread 0 NAPI id: 0, iface: lo, listener fd: 4
found matching iface in conf map: lo
no CPU found for thread with NAPI from NIC: lo:0
busy-poll-cpu-map configuration or queue setup is buggy -- exiting h2o now.

which shouldn't happen -- the right listener fd is being awoken, so either:

  • multiple threads have the FD for that listener in their epoll loop (they shouldn't)
  • or the code doesn't claim a CPU / thread after the first accept (seems unlikely, since the second request can find no available CPU, meaning the first request triggered the listener to take a CPU and pin itself to that CPU)

possibly worth adding a test which makes multiple requests to both loopback and non-loopback interfaces?

@sleepybishop
Contributor Author

possibly worth adding a test which makes multiple requests to both loopback and non-loopback interfaces?

In t/90busypoll.t, right before sending large requests to the netdevsim interfaces, there's a block that sends requests to loopback:

    for (1..5) {
        my $resp = `ip netns exec nssv curl -ksi http://127.0.0.1:$port`;
    }

The same logic runs after the sends to the netdevsim interfaces as well; I'm not sure why it's not triggering the behavior you are seeing.

@jdamato-fsly
Contributor

the same logic runs after the sends to the netdevsim interfaces as well, im not sure why it's not triggering the behavior you are seeing.

Because the test is incorrect.

If you change the test configuration according to the following diff:

diff --git a/t/90busypoll.t b/t/90busypoll.t
index 87bad8346..3388450ef 100644
--- a/t/90busypoll.t
+++ b/t/90busypoll.t
@@ -185,7 +185,8 @@ busy-poll-map:
         suspend-timeout: $suspend_timeout
         mode: $mode
     - ifindex: 1
-      cpus: [$total_threads]
+      cpus:
+        - 0
       options:
         mode: OFF
 EOT

the problem reproduces:

  3.544553   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
  3.544583                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0thread 4 NAPI id: 0, iface: lo, listener fd: 11
  3.544789 found matching iface in conf map: lo
  3.544855 thread idx 4, has taken cpu: 0, for NIC: lo, via napi: 0 via listener 11
100   177  100   177    0     0   331k      0 --:--:-- --:--:-- --:--:--  172k
  3.551733   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
  3.551763                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0thread 4 NAPI id: 0, iface: lo, listener fd: 11
  3.551931 found matching iface in conf map: lo
  3.552273 no CPU found for thread with NAPI from NIC: lo:0
  3.552277 busy-poll-cpu-map configuration or queue setup is buggy -- exiting h2o now.
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  3.553185 curl: (56) Recv failure: Connection reset by peer
  3.559614   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
  3.559631                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  3.559809 curl: (7) Failed to connect to localhost port 32773 after 0 ms: Couldn't connect to server
  3.565928   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
  3.565981                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  3.566234 curl: (7) Failed to connect to localhost port 32773 after 0 ms: Couldn't connect to server
  3.572766   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
  3.572789                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  3.572957 curl: (7) Failed to connect to localhost port 32773 after 0 ms: Couldn't connect to server

@sleepybishop
Contributor Author

If you change the test configuration according to the following diff:

This originated with the change that added the serving CPU index as a logging directive: the cpu_claimed thread-local went from being a boolean to holding the index of the pinned CPU, which introduced an invalid assumption that CPU indexes start at one. This has been addressed in 3f3a09b.
