Watch dhcpd_leases changes#23072
Draft
nirs wants to merge 6 commits into
Draft
Conversation
Contributor
|
Skipping CI for Draft Pull Request. |
Contributor
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: nirs The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
11d9595 to
7d5d4ff
Compare
Move DHCP lease types and functions from pkg/drivers/common into a dedicated dhcp subpackage, preparing for deduplication of the setupIP polling loop across vfkit, krunkit, qemu, and hyperkit drivers (kubernetes#21093) Pure code movement with renames to fit the new package context: - DHCPEntry → dhcp.Entry - GetIPAddressByMACAddress → dhcp.IPAddressForMAC - parseDHCPdLeasesFile → parseLeases - getIPAddressFromFile → ipAddressFromFile
Replace the duplicated DHCP lease polling loops in vfkit, krunkit, and qemu with a single dhcp.WaitForLease call. Changes from the original polling loops: - Switch from retry count to a timeout parameter, so the waiting strategy can be changed later (e.g. to fsnotify) without affecting callers. - Include the NestedVM timeout multiplier inside WaitForLease so callers don't need to duplicate it. - Keep firewall.UnblockBootpd in each driver's setupIP since this code is wrong and we don't want to move it into the new package. How drivers are affected: - vfkit, qemu: no behavior change — same polling logic moved to dhcp.WaitForLease (2s interval, 2min timeout, 3x for nested VMs). - krunkit: now uses the same NestedVM timeout multiplier that vfkit and qemu had. Previously it was missing. - hyperkit: no change — driver is deprecated and untested.
- Remove per-entry debug logging from ipAddressFromFile (200+ lines per
poll on a typical Mac)
- Remove redundant "Found N entries" and "Found match" logs from
ipAddressFromFile
- Move all logging to WaitForLease: one debug line per attempt with
the MAC, path, and attempt number
- Add Infof at start ("Waiting for DHCP lease...") and on success
("Found DHCP lease... in N seconds"), matching WaitForSSHAccess style
Before (truncated):
I0601 18:39:17.078574 63890 main.go:144] libmachine: Attempt 0
I0601 18:39:17.078618 63890 main.go:144] libmachine: Searching for ca:69:e8:b4:7e:92 in /var/db/dhcpd_leases ...
I0601 18:39:17.080875 63890 main.go:144] libmachine: unable to parse hw_address in dhcp leases file: ",f1:f5:dd:7f:0:2:0:0:ab:11:ad:75:7f:d2:b5:8d:7a:74": invalid MAC address: ",f1:f5:dd:7f:0:2:0:0:ab:11:ad:75:7f:d2:b5:8d:7a:74"
I0601 18:39:17.082822 63890 main.go:144] libmachine: Found 212 entries in /var/db/dhcpd_leases!
I0601 18:39:17.082832 63890 main.go:144] libmachine: dhcp entry: {Name:podman-krunkit IPAddress:192.168.64.2 HWAddress:7a:65:90:0f:bb:1c ID:1,7a:65:90:f:bb:1c Lease:0x6a1da917}
I0601 18:39:17.082835 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.31 HWAddress:52:b7:40:f6:a0:76 ID:1,52:b7:40:f6:a0:76 Lease:0x6a1d93cd}
I0601 18:39:17.082839 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.30 HWAddress:b2:77:60:58:94:56 ID:1,b2:77:60:58:94:56 Lease:0x6a1cd015}
I0601 18:39:17.082845 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.28 HWAddress:de:00:f0:01:c5:b9 ID:1,de:0:f0:1:c5:b9 Lease:0x6a1ccf3e}
I0601 18:39:17.082848 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.27 HWAddress:5a:6b:43:02:61:96 ID:1,5a:6b:43:2:61:96 Lease:0x6a1ca8e4}
I0601 18:39:17.082851 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.26 HWAddress:a2:af:e7:af:d3:cd ID:1,a2:af:e7:af:d3:cd Lease:0x6a1ca2e8}
I0601 18:39:17.082854 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.25 HWAddress:3e:0c:9c:58:67:a5 ID:1,3e:c:9c:58:67:a5 Lease:0x6a1ca0aa}
I0601 18:39:17.082857 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.24 HWAddress:1e:07:57:0d:cf:60 ID:1,1e:7:57:d:cf:60 Lease:0x6a1a5f61}
I0601 18:39:17.082860 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.23 HWAddress:66:b5:45:72:d3:5c ID:1,66:b5:45:72:d3:5c Lease:0x6a1a59f1}
I0601 18:39:17.082863 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.22 HWAddress:26:ff:06:8b:a7:b2 ID:1,26:ff:6:8b:a7:b2 Lease:0x6a19a8ab}
I0601 18:39:17.082868 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.21 HWAddress:c6:ec:f2:7e:b6:b4 ID:1,c6:ec:f2:7e:b6:b4 Lease:0x6a19a881}
[snipped unhelpful 202 dhcp entires ...]
I0601 18:39:19.084157 63890 main.go:144] libmachine: Attempt 1
I0601 18:39:19.084210 63890 main.go:144] libmachine: Searching for ca:69:e8:b4:7e:92 in /var/db/dhcpd_leases ...
I0601 18:39:19.084901 63890 main.go:144] libmachine: unable to parse hw_address in dhcp leases file: ",f1:f5:dd:7f:0:2:0:0:ab:11:ad:75:7f:d2:b5:8d:7a:74": invalid MAC address: ",f1:f5:dd:7f:0:2:0:0:ab:11:ad:75:7f:d2:b5:8d:7a:74"
I0601 18:39:19.085992 63890 main.go:144] libmachine: Found 213 entries in /var/db/dhcpd_leases!
I0601 18:39:19.086026 63890 main.go:144] libmachine: dhcp entry: {Name:minikube IPAddress:192.168.65.32 HWAddress:ca:69:e8:b4:7e:92 ID:1,ca:69:e8:b4:7e:92 Lease:0x6a1da9fe}
I0601 18:39:19.086034 63890 main.go:144] libmachine: Found match: ca:69:e8:b4:7e:92
I0601 18:39:19.086064 63890 main.go:144] libmachine: IP: 192.168.65.32
After:
I0601 18:47:33.625633 64305 main.go:144] libmachine: Waiting for DHCP lease for ea:d0:8c:4d:f7:b7 (timeout 2m0s)
I0601 18:47:33.625646 64305 main.go:144] libmachine: Searching for ea:d0:8c:4d:f7:b7 in /var/db/dhcpd_leases (attempt 0) ...
I0601 18:47:33.625819 64305 main.go:144] libmachine: unable to parse hw_address in dhcp leases file: ",f1:f5:dd:7f:0:2:0:0:ab:11:ad:75:7f:d2:b5:8d:7a:74": invalid MAC address: ",f1:f5:dd:7f:0:2:0:0:ab:11:ad:75:7f:d2:b5:8d:7a:74"
I0601 18:47:35.627333 64305 main.go:144] libmachine: Searching for ea:d0:8c:4d:f7:b7 in /var/db/dhcpd_leases (attempt 1) ...
I0601 18:47:35.628264 64305 main.go:144] libmachine: unable to parse hw_address in dhcp leases file: ",f1:f5:dd:7f:0:2:0:0:ab:11:ad:75:7f:d2:b5:8d:7a:74": invalid MAC address: ",f1:f5:dd:7f:0:2:0:0:ab:11:ad:75:7f:d2:b5:8d:7a:74"
I0601 18:47:35.629414 64305 main.go:144] libmachine: Found DHCP lease for ea:d0:8c:4d:f7:b7: 192.168.65.33 in 2.004 seconds
I0601 18:47:35.629452 64305 main.go:144] libmachine: Waiting until SSH server "192.168.65.33:22" is accessible
The macOS /var/db/dhcpd_leases file uses a type prefix in hw_address:
- "1,..." is an Ethernet MAC address (ARP hardware type 1)
- "ff,..." is a DUID (DHCP Unique Identifier)
The old code blindly skipped the first 2 characters (assuming "1,"),
which failed to parse DUID entries and logged a warning on every poll:
unable to parse hw_address in dhcp leases file:
",f1:f5:dd:7f:0:2:0:0:ab:11:ad:75:7f:d2:b5:8d:7a:74":
invalid MAC address: ...
Fix by splitting on "," and handling each type:
- "1": parse as Ethernet MAC address
- "ff": skip silently — these are created by VMs using systemd-networkd
which sends a DUID instead of a MAC. Minikube VMs always use Ethernet
so they never match these entries.
- other: warn about unknown type so we learn about new formats
Add a DUID entry to the test data (between two Ethernet entries) and
a test case verifying that Ethernet entries after a DUID are found.
Logs after this change (no more DUID warnings):
Waiting for DHCP lease for ce:1b:6b:eb:28:a3 (timeout 2m0s)
Searching for ce:1b:6b:eb:28:a3 in /var/db/dhcpd_leases (attempt 0) ...
Searching for ce:1b:6b:eb:28:a3 in /var/db/dhcpd_leases (attempt 1) ...
Found DHCP lease for ce:1b:6b:eb:28:a3: 192.168.65.34 in 2.004 seconds
Previously, parseMAC was called inside ipAddressFromFile on every poll attempt. If the MAC was invalid, the parse error was returned, the loop continued, and WaitForLease would keep retrying for 2 minutes (6 minutes on nested VMs) before finally returning the parse error. Fix by parsing the MAC once upfront in both WaitForLease and IPAddressForMAC. An invalid MAC now fails immediately with a clear error instead of silently retrying for minutes.
Replace DHCP lease polling with fsnotify file watching
Replace the 2-second polling loop in WaitForLease with event-driven
file watching using fsnotify. The leases file is now parsed only when
it actually changes, and the watcher includes a 10-second periodic
fallback in case fsnotify events are missed.
On CI first boot the file may not exist for minutes — the old code
logged every 2 seconds even though os.Open failed immediately.
Watcher
-------
The watcher monitors /var/db/dhcpd_leases via fsnotify. It has no
logging and no polling loop — it just returns events to the caller.
We watch the file directly rather than its parent directory (/var/db)
because macOS SIP protects some entries under /var/db, causing
fsnotify to fail with "operation not permitted".
Atomic writes (write tmp + rename) cause kqueue to lose the watch on
the original inode. We handle this by re-watching on Remove/Rename.
Events:
- fileMissing — file does not exist (returned immediately)
- fileCreated — file appeared, watching started
- fileChanged — fsnotify detected a change
- periodicCheck — 10s fallback in case events are missed
- deadlineExceeded — timeout
WaitForLease
------------
Drives the loop, handles logging, searches the leases file only when
the watcher reports a change.
On fileMissing: logs "waiting until file is created" once, then
"still waiting" every minute. Sleeps 1s before retrying.
On fileCreated/fileChanged/periodicCheck: searches for the MAC.
Flows
-----
First boot, file exists (local):
1. fileCreated → search → no match
2. fileChanged → search → found (dhcpd wrote the lease)
First boot, file missing (CI):
1. fileMissing → log "waiting until file is created"
2. fileMissing × N → silent ("still waiting" every minute)
3. fileCreated → search → found
Second boot:
1. fileCreated → search → found immediately
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replace polling when waiting for a DHCP lease with fsnotify-based file
watching. Parse the leases file only when it changes, log only on events.
Motivation
The old code polled
/var/db/dhcpd_leasesevery 2 seconds while waiting for alease:
How it works
watcher
The watcher monitors
/var/db/dhcpd_leasesvia fsnotify. It has no logging andno polling loop — it just returns events to the caller.
Each call to
watch()either returns immediately or blocks:fileMissingimmediately. The caller decides how long to wait before retrying.or the deadline timer.
Events:
fileMissing— file does not exist, or was removed during watching (returned immediately, non-blocking)fileCreated— file appeared, watching startedfileChanged— fsnotify detected a change (including atomic writes via rename/remove)periodicCheck— 10s fallback in case fsnotify events are misseddeadlineExceeded— timeoutWaitForLease
WaitForLeasedrives the loop, handles all logging, and searches the leasesfile only when the watcher reports a change.
On
fileMissing: logs "waiting until file is created" once, then a "stillwaiting" heartbeat every minute. Sleeps 1s before retrying.
On
fileCreated/fileChanged/periodicCheck: searches the leases filefor the MAC address.
Flows
First boot, file exists (local)
The leases file exists with entries from other VMs, but our lease isn't there yet.
fileCreated→ search leases file → no matchfileChanged→ search leases file → found (dhcpd wrote the lease)First boot, file missing (CI)
The leases file doesn't exist yet — common on CI runners.
fileMissing→ log "Leases file removed, waiting until … is created"fileMissing× N → silent (log "Still waiting …" every minute)fileCreated→ search leases file → foundSecond boot
The leases file exists and our lease is already in it.
fileCreated→ search leases file → found immediatelyExample logs
First boot (CI) — file missing ~100s
File doesn't exist for 117 seconds — only 2 status lines during the wait
instead of "Searching for …" every 2 seconds.
I0601 22:01:45.849152 38317 main.go:144] libmachine: Nested VM detected, increasing timeout from 2m0s to 6m0s
I0601 22:01:45.849168 38317 main.go:144] libmachine: Waiting for DHCP lease for e6:f6:65:cb:17:82 (timeout 6m0s)
I0601 22:01:45.849344 38317 main.go:144] libmachine: Leases file created, searching for e6:f6:65:cb:17:82 in /var/db/dhcpd_leases
I0601 22:01:45.849425 38317 main.go:144] libmachine: Found DHCP lease for e6:f6:65:cb:17:82: 192.168.65.2 in 0.000 seconds```
Example logs - first start (local)
File exists with ~200 entries but our lease isn't there yet — we search once,
start watching, and find it ~1.3s later on the first
fileChangedinstead ofpolling every 2 seconds.
Example logs - second start (local)
Lease already in the file — found on the initial check, no watcher needed.
Based on #23084