Skip to content

Bail early in listenUDPInPortRange on interface-level errors#896

Merged
asayyah merged 4 commits into
pion:masterfrom
asayyah:fix-listenudp-bail-on-interface-error
Mar 16, 2026
Merged

Bail early in listenUDPInPortRange on interface-level errors#896
asayyah merged 4 commits into
pion:masterfrom
asayyah:fix-listenudp-bail-on-interface-error

Conversation

@asayyah

@asayyah asayyah commented Mar 13, 2026

Copy link
Copy Markdown
Contributor

Summary

  • When a network interface disappears (e.g. Kubernetes pods with host_network: true), listenUDPInPortRange would try every port in the range before giving up, causing high CPU usage
  • Added isInterfaceLevelError() to detect EADDRNOTAVAIL — the error the kernel returns when binding to an IP that is no longer assigned to any local interface
  • The loop now bails immediately on EADDRNOTAVAIL instead of exhausting the entire port range
  • Cross-platform support: handles both POSIX EADDRNOTAVAIL and Windows WSAEADDRNOTAVAIL (10049) via platform-specific files

Fixes #779

Test plan

  • TestListenUDPInPortRange_BailsOnEADDRNOTAVAIL — mock test verifying early bail on EADDRNOTAVAIL wrapped in real kernel error chain (OpError → SyscallError → Errno)
  • TestListenUDPInPortRange_ContinuesOnPortBusyError — regression guard: EADDRINUSE still tries all ports
  • TestListenUDPInPortRange_RealEADDRNOTAVAIL — integration test using actual kernel bind() against TEST-NET-1 (192.0.2.1), skipped on WASM
  • Existing DefaultsPortMinTo1024 and DefaultsPortMaxToFFFF still pass
  • Builds on all platforms: Linux, macOS, Windows, WASM
  • golangci-lint passes with no new issues

@asayyah asayyah marked this pull request as draft March 13, 2026 23:34
@asayyah asayyah force-pushed the fix-listenudp-bail-on-interface-error branch from 671d794 to a79c67d Compare March 13, 2026 23:39
@codecov

codecov Bot commented Mar 13, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.49%. Comparing base (8f6a882) to head (9e5af95).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #896      +/-   ##
==========================================
+ Coverage   88.41%   88.49%   +0.08%     
==========================================
  Files          44       45       +1     
  Lines        5626     5633       +7     
==========================================
+ Hits         4974     4985      +11     
+ Misses        449      445       -4     
  Partials      203      203              
Flag Coverage Δ
go 88.49% <100.00%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

When a network interface disappears (e.g. in Kubernetes with
host_network where pod interfaces come and go), ListenUDP fails
with EADDRNOTAVAIL for every port in the range. Previously the
loop tried all ports before giving up, causing high CPU usage
with large port ranges.

Now listenUDPInPortRange detects EADDRNOTAVAIL and returns
immediately instead of exhausting the entire port range.

Uses platform-specific files to handle the errno difference
between POSIX (syscall.EADDRNOTAVAIL) and Windows
(WSAEADDRNOTAVAIL = 10049).

Fixes pion#779

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@asayyah asayyah force-pushed the fix-listenudp-bail-on-interface-error branch from 71312bf to 268e9ba Compare March 14, 2026 01:28
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@asayyah asayyah marked this pull request as ready for review March 14, 2026 01:43
@asayyah asayyah requested a review from Sean-Der March 14, 2026 01:44

@JoTurk JoTurk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you so much, i think we can merge this once the internal comment is addressed.

Comment thread net_errno_unix.go Outdated
}

return false
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please move this to internal?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread net_test.go Outdated
// to verify that binding to a non-existent IP bails immediately instead of
// exhausting the port range. This reproduces the exact scenario from #779.
func TestListenUDPInPortRange_RealEADDRNOTAVAIL(t *testing.T) {
// This test requires real kernel bind() semantics.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not against AI generated PRs or Claude, but i would rather not having comments of the kind of // does something for something().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good point, fixed

Comment thread internal/netutil/errno_windows.go
Move isInterfaceLevelError to internal/netutil and
remove verbose test comments.
@asayyah asayyah requested a review from JoTurk March 14, 2026 22:25

@JoTurk JoTurk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@asayyah asayyah merged commit 3cabadd into pion:master Mar 16, 2026
18 checks passed
@asayyah asayyah deleted the fix-listenudp-bail-on-interface-error branch March 16, 2026 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

High CPU usage when connecting

2 participants