I run a VyOS 2026.05.26-1327-rolling router as the WAN edge for my home network. It is virtualized on a Proxmox cluster, and its WAN interface is configured with set interfaces ethernet eth2 address dhcp and set interfaces ethernet eth2 address dhcpv6. After a routine KVM live migration of the router VM, the house lost internet access for approximately 2 hours. I traced the outage to vyos-netlinkd restarting dhclient@eth2.service and dhcp6c@eth2.service 10 times in 15 seconds.
Live migration does not flap the guest link. However, the kernel's post-migration NOTIFY PEERS gratuitous-ARP announcement produces an RTM_NEWLINK message, and like every RTM_NEWLINK it carries the full link state, including IFLA_OPERSTATE=UP. vyos-netlinkd sees operstate=UP and restarts DHCP, even though the interface was already UP. I tested single-queue and multiqueue VMs on the same cluster, and all of them generate the same NOTIFY PEERS events. Only VyOS reacts to them, because only VyOS runs vyos-netlinkd.
_handle_dhcp_events() does not track the previous operstate of each interface. Every UP event triggers systemctl restart dhclient@eth2.service, which sends a DHCPRELEASE, flushes the address, deletes the default route via vtysh, and starts a fresh DHCPDISCOVER. This also creates a feedback loop: dhclient-script-vyos runs ip link set dev eth2 up during PREINIT, which emits another RTM_NEWLINK(UP), which triggers another restart. One seed event becomes 10+ restarts in 15 seconds.
Root cause
In src/services/vyos-netlinkd, _handle_dhcp_events() handles operstate == 'UP' unconditionally:
    elif operstate == 'UP':
        v6_restart = False
        interface_path = Section.get_config_path(ifname, delimiter='.')
        config_dict = op_mode_config_dict(
            ['interfaces'], key_mangling=('-', '_'), get_first_key=True
        )
        if tmp := dict_search(f'{interface_path}.address', config_dict):
            if 'dhcp' in tmp:
                cmd(f'systemctl restart {systemdV4_service}')
There is no check for whether the interface was previously DOWN. The daemon treats every RTM_NEWLINK with operstate='UP' on a DHCP-configured interface as a reason to restart, including UP-to-UP re-notifications caused by normal kernel events.
The feedback loop is between two VyOS components:
- vyos-netlinkd restarts dhclient@eth2.service
- dhclient-script-vyos runs ip link set dev eth2 up during PREINIT (standard ISC dhclient behavior)
- The ip link set up emits a new RTM_NEWLINK(UP) to the netlink socket
- vyos-netlinkd receives it and restarts dhclient again
- Loop repeats until timing breaks the cycle
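The loop can be illustrated with a toy model. This is a hypothetical simulation, not VyOS code: simulate(), the event queue and max_events are invented for illustration, with max_events standing in for whatever timing eventually breaks the cycle. It assumes the link is already UP before the seed event and that each restart's PREINIT emits exactly one new RTM_NEWLINK(UP).

```python
from collections import deque


def simulate(track_state: bool, max_events: int = 12) -> int:
    """Count dhclient restarts caused by one seed RTM_NEWLINK(UP).

    Toy model only: every restart's PREINIT 'ip link set dev eth2 up'
    queues one more UP event; max_events approximates the timing that
    eventually breaks the cycle.
    """
    events = deque(['UP'])   # the NOTIFY PEERS seed event
    prev_state = 'UP'        # the link was already UP before the seed
    restarts = 0
    processed = 0
    while events and processed < max_events:
        state = events.popleft()
        processed += 1
        suppressed = track_state and state == 'UP' and prev_state == 'UP'
        prev_state = state
        if state == 'UP' and not suppressed:
            restarts += 1        # systemctl restart dhclient@eth2.service
            events.append('UP')  # PREINIT re-emits RTM_NEWLINK(UP)
    return restarts


print(simulate(track_state=False))  # 12
print(simulate(track_state=True))   # 0
```

Without state tracking every UP re-seeds the queue, so the restart count is bounded only by max_events; with the per-interface tracker the seed event is suppressed and the loop never starts.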
Reproduction
Any event that generates an RTM_NEWLINK with IFLA_OPERSTATE=UP on a DHCP-configured interface will trigger this. The easiest trigger to reproduce:
KVM live migration (tested on Proxmox VE 8.4, QEMU 11.0.0): live-migrate a VyOS VM that has a DHCP-configured interface. The standard post-migration NOTIFY PEERS gratuitous-ARP event carries IFLA_OPERSTATE=UP in the netlink message. vyos-netlinkd restarts dhclient, the restart's PREINIT emits another UP, and the loop runs 10+ cycles.
I confirmed that the trigger is not specific to multiqueue: a single-queue virtio-net VM also generates NOTIFY PEERS with operstate=UP during live migration. The only difference is that VyOS runs vyos-netlinkd, which reacts to these events.
I also observed a second trigger class: a transient promiscuous-mode toggle on the WAN interface (caused by the tc ingress qdisc with a mirred redirect to ifb0). The promisc toggle generated a single RTM_NEWLINK(UP) that seeded the same feedback loop. What initially triggers the toggle is still under investigation.
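To watch these events directly on the router during a migration or a promisc toggle, a minimal stdlib-only sketch like the following subscribes to the rtnetlink link multicast group and prints the interface name, operstate and IFLA_EVENT attribute of each RTM_NEWLINK. It is not part of VyOS, and the --watch flag is handled by this script alone. The numeric constants are copied from the Linux UAPI headers (linux/rtnetlink.h, linux/if_link.h) as I understand them and should be checked against your kernel. If the NOTIFY PEERS explanation holds, a live migration should show lines like eth2 operstate=UP event=NOTIFY_PEERS.

```python
import socket
import struct
import sys

# Values from linux/netlink.h, linux/rtnetlink.h and linux/if_link.h
NETLINK_ROUTE = 0
RTMGRP_LINK = 1
RTM_NEWLINK = 16
IFLA_IFNAME = 3
IFLA_OPERSTATE = 16
IFLA_EVENT = 44
OPERSTATES = {0: 'UNKNOWN', 1: 'NOTPRESENT', 2: 'DOWN', 3: 'LOWERLAYERDOWN',
              4: 'TESTING', 5: 'DORMANT', 6: 'UP'}
IFLA_EVENTS = {0: 'NONE', 1: 'REBOOT', 2: 'FEATURES', 3: 'BONDING_FAILOVER',
               4: 'NOTIFY_PEERS', 5: 'IGMP_RESEND', 6: 'BONDING_OPTIONS'}

NLMSGHDR = struct.Struct('=IHHII')    # len, type, flags, seq, pid
IFINFOMSG = struct.Struct('=BxHiII')  # family, pad, type, index, flags, change
RTATTR = struct.Struct('=HH')         # len, type


def parse_newlink(buf: bytes) -> list[dict]:
    """Return ifname/operstate/event for each RTM_NEWLINK in one datagram."""
    out = []
    offset = 0
    while offset + NLMSGHDR.size <= len(buf):
        msg_len, msg_type, _, _, _ = NLMSGHDR.unpack_from(buf, offset)
        if msg_len < NLMSGHDR.size:
            break
        if msg_type == RTM_NEWLINK:
            info = {'ifname': None, 'operstate': None, 'event': 'NONE'}
            attr = offset + NLMSGHDR.size + IFINFOMSG.size
            end = offset + msg_len
            while attr + RTATTR.size <= end:
                rta_len, rta_type = RTATTR.unpack_from(buf, attr)
                if rta_len < RTATTR.size:
                    break
                payload = buf[attr + RTATTR.size:attr + rta_len]
                if rta_type == IFLA_IFNAME:
                    info['ifname'] = payload.rstrip(b'\0').decode()
                elif rta_type == IFLA_OPERSTATE:
                    info['operstate'] = OPERSTATES.get(payload[0], str(payload[0]))
                elif rta_type == IFLA_EVENT:
                    code = struct.unpack('=I', payload[:4])[0]
                    info['event'] = IFLA_EVENTS.get(code, str(code))
                attr += (rta_len + 3) & ~3    # attributes are 4-byte aligned
            out.append(info)
        offset += (msg_len + 3) & ~3          # messages are 4-byte aligned
    return out


def watch() -> None:
    """Print every RTM_NEWLINK until interrupted (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))  # (pid, groups); pid 0 lets the kernel assign
    while True:
        for info in parse_newlink(sock.recv(65536)):
            print(f"RTM_NEWLINK {info['ifname']} "
                  f"operstate={info['operstate']} event={info['event']}")


if __name__ == '__main__' and '--watch' in sys.argv[1:]:
    watch()
```

The output is not filtered, so expect events for every interface; Ctrl-C stops it.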
Confirmed from journald
08:15 event (ProxLB live migration, 10 restarts in 15 seconds):
    08:15:21 vyos-netlinkd: RTM_NEWLINK -> eth2, state=UP (migration seed)
    08:15:21 vyos-netlinkd: Restarting dhclient@eth2.service...
    08:15:24 vyos-netlinkd: RTM_NEWLINK -> eth2, state=UP (from dhclient-script PREINIT)
    08:15:24 vyos-netlinkd: Restarting dhclient@eth2.service...
    ... repeats 10x until 08:15:36 ...
    08:16:03 dhclient: bound to <WAN IP>
10:08 event (spontaneous promisc toggle, 4 restarts):
    10:08:12 kernel: virtio_net virtio3 eth2: entered promiscuous mode
    10:08:12 vyos-netlinkd: RTM_NEWLINK -> eth2, state=UP
    10:08:12 vyos-netlinkd: Restarting dhclient@eth2.service...
    10:08:16 vyos-netlinkd: RTM_NEWLINK -> eth2, state=UP (PREINIT feedback)
    10:08:16 vyos-netlinkd: Restarting dhclient@eth2.service...
    10:09:57 vyos-netlinkd: RTM_NEWLINK -> eth2, state=UP
    10:09:57 vyos-netlinkd: Restarting dhclient@eth2.service...
    10:10:00 vyos-netlinkd: RTM_NEWLINK -> eth2, state=UP
    10:10:00 vyos-netlinkd: Restarting dhclient@eth2.service...
After hotpatching vyos-netlinkd with the state tracker described below, a controlled live migration produced zero DHCP restarts:
    12:31:30 vyos-netlinkd: DHCP event: eth2 operstate=UP prev=UP
    12:31:30 vyos-netlinkd: Suppressing DHCP restart for eth2: already UP
    ... 20+ suppressed events across eth0/eth1/eth2, zero restarts ...
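To quantify bursts like these from a saved journal excerpt, a small sketch such as the following counts restart and suppression lines. summarize() is a hypothetical helper, and its regular expressions assume the abbreviated message text quoted above ("Restarting dhclient@eth2.service...", "Suppressing DHCP restart for eth2: already UP"). Real journalctl output adds date, hostname and PID fields, which re.search skips over, but the exact wording may differ between builds and the patterns may need adjusting.

```python
import re
from collections import Counter
from datetime import datetime

RESTART = re.compile(
    r'(\d{2}:\d{2}:\d{2}).*Restarting ((?:dhclient|dhcp6c)@\S+?)\.service')
SUPPRESS = re.compile(
    r'(\d{2}:\d{2}:\d{2}).*Suppressing DHCP restart for (\S+):')


def summarize(lines):
    """Return (restarts per unit, suppressions per interface, burst span in s)."""
    restarts, suppressed = Counter(), Counter()
    times = []
    for line in lines:
        if m := RESTART.search(line):
            restarts[m.group(2)] += 1
            times.append(datetime.strptime(m.group(1), '%H:%M:%S'))
        elif m := SUPPRESS.search(line):
            suppressed[m.group(2)] += 1
    # Time of day only: a burst that crosses midnight yields a wrong span
    span = (max(times) - min(times)).total_seconds() if times else 0.0
    return restarts, suppressed, span
```

For example, summarize(open('netlinkd.log')) on an excerpt saved with journalctl returns the per-unit restart counts and the time between the first and last restart.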
Fix
Track the previous operstate of each interface. Only restart DHCP on DOWN-to-UP transitions (or on first boot, when the previous state is unknown), not on UP-to-UP re-notifications:
    import syslog
    from typing import Optional

    # Last operstate seen for each interface; empty whenever vyos-netlinkd starts
    _iface_prev_state: dict[str, str] = {}

    def _handle_dhcp_events(operstate: Optional[str], ifname: str) -> None:
        systemdV4_service = f'dhclient@{ifname}.service'
        systemdV6_service = f'dhcp6c@{ifname}.service'
        if operstate not in ['UP', 'DOWN']:
            return None

        prev = _iface_prev_state.get(ifname)
        _iface_prev_state[ifname] = operstate

        # UP-to-UP re-notification (NOTIFY PEERS, promisc toggle, PREINIT
        # "ip link set up"): the link did not change, so leave DHCP alone
        if operstate == 'UP' and prev == 'UP':
            syslog.syslog(syslog.LOG_NOTICE,
                          f'Suppressing DHCP restart for {ifname}: already UP')
            return None

        if operstate == 'DOWN':
            # ... existing DOWN handler unchanged ...
            ...
        elif operstate == 'UP':
            # First UP after DOWN (or boot where prev=None) -- restart as before
            # ... existing UP handler unchanged ...
            ...
Edge cases:
- First boot (prev=None, operstate=UP): restarts DHCP. Correct, this is the initial UP.
- DOWN-to-UP (prev='DOWN', operstate=UP): restarts DHCP. Correct, this is a real link recovery.
- UP-to-UP (prev='UP', operstate=UP): suppressed. This is the fix.
- vyos-netlinkd restart: the dict resets to empty, so the next UP triggers a DHCP restart. Safe.
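The edge cases above can be checked in isolation with a stand-alone version of the tracker. DhcpRestartGate and its restart callback are hypothetical names introduced only for testing: the callback stands in for systemctl restart, and the DOWN handler is omitted. The decision logic mirrors the patch: restart only when the new state is UP and the previous recorded state was not UP (None on first boot, or DOWN).

```python
from typing import Callable, Optional


class DhcpRestartGate:
    """Per-interface operstate tracker (hypothetical stand-alone class; the
    patch keeps the same logic in the module-level _iface_prev_state dict)."""

    def __init__(self, restart: Callable[[str], None]) -> None:
        self._restart = restart            # stands in for systemctl restart
        self._prev: dict[str, str] = {}    # empty on every daemon start

    def on_newlink(self, ifname: str, operstate: Optional[str]) -> bool:
        """Handle one RTM_NEWLINK; return True if DHCP was restarted."""
        if operstate not in ('UP', 'DOWN'):
            return False                   # other states are not recorded
        prev = self._prev.get(ifname)
        self._prev[ifname] = operstate
        if operstate == 'UP' and prev != 'UP':
            self._restart(ifname)          # first boot (None) or DOWN-to-UP
            return True
        return False                       # UP-to-UP suppressed; DOWN omitted here


if __name__ == '__main__':
    restarted: list[str] = []
    gate = DhcpRestartGate(restarted.append)
    # First boot, two NOTIFY PEERS re-notifications, then a real link flap
    for state in ['UP', 'UP', 'UP', 'DOWN', 'UP']:
        gate.on_newlink('eth2', state)
    print(restarted)                       # ['eth2', 'eth2']
```

Because the state is keyed by interface name, an UP on eth0 never suppresses or triggers anything for eth2.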
I have been running this hotpatch on my production WAN router since 2026-05-31. A controlled live migration immediately after deployment produced zero DHCP restarts (all UP-to-UP events suppressed) with no impact on normal DHCP operation.
Related
- T3852: duplicate dhclient processes on link replug (same root cause area, closed as resolved, pre-dates vyos-netlinkd)
- T5686: loss of connectivity on DHCP interfaces after link flap (same symptom class)
- T8486: vyos-netlinkd high CPU (different issue, same daemon, recently fixed)
- T8781: vyos-netlinkd high CPU with route updates (different issue, same daemon)
- T3876/T5476: design and implementation of vyos-netlinkd replacing netplug
Environment
- VyOS 2026.05.26-1327-rolling
- Running as KVM guest on Proxmox VE 8.4 (QEMU 11.0.0, kernel 6.8)
- WAN interface configured with address dhcp and address dhcpv6 with prefix delegation
- ISC dhclient 4.4.3-P1 (the version shipped with this rolling build)
- Python 3.12 (pyroute2 for netlink)