
ipsec: expose per-peer unique setting for site-to-site connections
Open, Normal, Public, BUG

Description

I run a VyOS 2026.05.26-1327-rolling router as the WAN edge for my home network, with 12 IPsec/VTI site-to-site tunnels to remote POPs. After a DHCP restart cascade on the WAN interface (separate bug, T8950), all 12 tunnels accumulated duplicate IKE SAs and lost data plane connectivity: 45 IKE SAs were in state ESTABLISHED across the 12 tunnels (expected: 12), zero bytes flowed inbound on any CHILD_SA, and BFD/BGP were down across the entire mesh.

The root cause is that VyOS does not expose the strongSwan connections.<conn>.unique parameter for site-to-site peers. The swanctl default is unique = no, which only replaces existing SAs if the new one carries INITIAL_CONTACT. With dpd_action = restart, charon initiates new IKE_SAs without INITIAL_CONTACT (it considers the old SA still alive), so duplicates accumulate on every DPD timeout. A single trigger event produced 14 parallel IKE SAs for one tunnel in 18 seconds.
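The four unique policies can be illustrated with a toy model. This is a minimal sketch of the documented policy semantics only, not charon's implementation; the Peer class and simulate helper are hypothetical names introduced for illustration, and the 14 attempts mirror the incident described above.

```python
from dataclasses import dataclass, field

POLICIES = ("no", "never", "keep", "replace")


@dataclass
class Peer:
    """Toy model of one identity pair; NOT how charon stores IKE SAs."""
    policy: str
    sas: list = field(default_factory=list)  # IDs of established IKE SAs

    def establish(self, sa_id: int, initial_contact: bool) -> bool:
        """Try to add a new IKE SA; return False if the policy rejects it."""
        if self.policy not in POLICIES:
            raise ValueError(f"unknown unique policy: {self.policy}")
        if self.policy == "keep" and self.sas:
            return False  # keep: reject the newcomer, keep the existing SA
        if self.policy == "replace" or (self.policy == "no" and initial_contact):
            self.sas.clear()  # the new SA replaces all existing ones
        # never: INITIAL_CONTACT is ignored, so nothing is ever replaced
        self.sas.append(sa_id)
        return True


def simulate(policy: str, attempts: int, initial_contact: bool = False) -> int:
    """Return how many IKE SAs remain after `attempts` initiations."""
    peer = Peer(policy)
    for sa_id in range(attempts):
        peer.establish(sa_id, initial_contact)
    return len(peer.sas)


if __name__ == "__main__":
    # 14 initiations without INITIAL_CONTACT, as in the DPD restart case
    for policy in POLICIES:
        print(policy, simulate(policy, 14))
```

In this model, no and never both end with 14 SAs when INITIAL_CONTACT is absent, while keep and replace end with one.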

How duplicate SAs break the data plane

With multiple CHILD_SAs sharing the same if_id and reqid, the kernel's xfrm subsystem has multiple eligible outbound SPIs for the same selector. Both ends independently pick an outbound SPI that does not match the other's expected inbound state, so ESP packets are silently dropped in both directions. This is not a transient condition. It persists until orphaned SAs are manually flushed.

Evidence from production

14 IKE SAs for vti201 initiated between 10:08:42 and 10:09:00, all by the local initiator:

May 31 10:08:42 vyos-test1 charon[55205]: 01[IKE] <vti201|2549> initiating IKE_SA vti201[2604] to 149.28.79.157
May 31 10:08:43 vyos-test1 charon[55205]: 04[IKE] <vti201|2553> initiating IKE_SA vti201[2609] to 149.28.79.157
May 31 10:08:43 vyos-test1 charon[55205]: 12[IKE] <vti201|2555> initiating IKE_SA vti201[2610] to 149.28.79.157
... (11 more in same 18-second window)
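Counting such initiations per connection is easy to script. The following is a rough sketch that assumes the charon log line layout shown in the excerpt above; the function name count_initiations is hypothetical, and the sample input contains only the three lines quoted in this report.

```python
import re
from collections import Counter

# Matches e.g. "<vti201|2549> initiating IKE_SA vti201[2604] to 149.28.79.157"
INITIATING = re.compile(
    r"<(?P<conn>[^|>]+)\|\d+> initiating IKE_SA "
    r"(?P=conn)\[(?P<sa>\d+)\] to (?P<peer>\S+)"
)

LOG = """\
May 31 10:08:42 vyos-test1 charon[55205]: 01[IKE] <vti201|2549> initiating IKE_SA vti201[2604] to 149.28.79.157
May 31 10:08:43 vyos-test1 charon[55205]: 04[IKE] <vti201|2553> initiating IKE_SA vti201[2609] to 149.28.79.157
May 31 10:08:43 vyos-test1 charon[55205]: 12[IKE] <vti201|2555> initiating IKE_SA vti201[2610] to 149.28.79.157
"""


def count_initiations(log_text: str) -> Counter:
    """Count 'initiating IKE_SA' events per (connection, peer address)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = INITIATING.search(line)
        if match:
            counts[(match["conn"], match["peer"])] += 1
    return counts


if __name__ == "__main__":
    for (conn, peer), n in count_initiations(LOG).items():
        print(f"{conn} -> {peer}: {n} initiations")
```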

xfrm state for vti201 alone: 28 entries (14 outbound + 14 inbound, all sharing if_id 0xCA and reqid 8; should be 2). Total across 12 tunnels: 90 xfrm SAs. swanctl -l shows 0 bytes inbound on every CHILD_SA. tcpdump -i vti201 shows BFD packets going out with nothing coming in.
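A check like the one behind these counts can be sketched as follows. It assumes the usual ip xfrm state text layout (an unindented "src ... dst ..." line per state, with reqid and if_id on indented lines). The sample text is illustrative, uses documentation addresses, and is not output captured from the affected router; all function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative input only: three states, two of them duplicating one direction.
SAMPLE = """\
src 192.0.2.1 dst 198.51.100.1
    proto esp spi 0xc0000001 reqid 8 mode tunnel
    if_id 0xca
src 192.0.2.1 dst 198.51.100.1
    proto esp spi 0xc0000003 reqid 8 mode tunnel
    if_id 0xca
src 198.51.100.1 dst 192.0.2.1
    proto esp spi 0xc0000002 reqid 8 mode tunnel
    if_id 0xca
"""


def parse_states(text: str) -> list:
    """Parse xfrm state text into dicts with src, dst, reqid and if_id."""
    states = []
    for line in text.splitlines():
        if line.startswith("src "):  # unindented line starts a new state
            fields = line.split()
            states.append({"src": fields[1], "dst": fields[3],
                           "reqid": None, "if_id": None})
        elif states:
            if m := re.search(r"\breqid (\d+)", line):
                states[-1]["reqid"] = int(m.group(1))
            if m := re.search(r"\bif_id (0x[0-9a-fA-F]+)", line):
                states[-1]["if_id"] = int(m.group(1), 16)
    return states


def duplicate_states(text: str) -> dict:
    """Return (src, dst, if_id, reqid) keys that have more than one SA."""
    counts = Counter(
        (s["src"], s["dst"], s["if_id"], s["reqid"]) for s in parse_states(text)
    )
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    for key, n in duplicate_states(SAMPLE).items():
        print(key, n)
```

A healthy tunnel should yield an empty result here, since each direction has exactly one SA per if_id and reqid.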

The unique gap between remote-access and site-to-site

T7562 (fixed in 1.4.4, PR #4637) added support for disable-uniqreqids by passing unique = never into the swanctl templates. The template plumbing is already there: peer.j2 accepts a uniqreqids variable and emits unique = {{ uniqreqids }}. But this only fires when disable-uniqreqids is set, and it sets the value to never, which disables even INITIAL_CONTACT-based replacement. Site-to-site peers need replace (the opposite direction).

Meanwhile, remote-access connections already have a per-connection unique knob: set vpn ipsec remote-access connection <rw> unique {no,never,keep,replace}. Site-to-site peers have no equivalent.

VyOS Config Path | unique Support | Values
vpn ipsec remote-access connection <rw> unique | Yes | no, never, keep, replace
vpn ipsec disable-uniqreqids (T7562, 1.4.4) | Yes | Only sets never globally
vpn ipsec site-to-site peer <peer> unique | Missing | N/A (proposed in this task)

Reproduction

Any VyOS instance with a site-to-site IPsec/VTI peer and dpd_action = restart (the value generated for dead-peer-detection action restart):

  1. Trigger a DPD timeout (e.g., block outbound UDP ports 500 and 4500 for 30 seconds, then unblock).
  2. Observe swanctl -l showing multiple IKE SAs for the same connection.
  3. After multiple DPD cycles, data plane traffic stops on the affected VTI.

The promisc-toggle reproducer from T8950 also triggers this: sudo ip link set dev eth0 promisc on && sudo ip link set dev eth0 promisc off on a DHCP-configured WAN interface causes a dhclient restart, which triggers DPD timeouts on all tunnels, which triggers the SA accumulation.
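Step 2 of the reproduction can be automated by scanning the swanctl -l output for connections that appear more than once. This is a hedged sketch: it assumes that each IKE SA is printed as an unindented line of the form "<name>: #<id>, ESTABLISHED, ..." and that CHILD_SA lines are indented. The sample text is illustrative, not captured output, and duplicate_ike_sas is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed layout of an IKE SA line in `swanctl -l` output.
IKE_SA_LINE = re.compile(
    r"^(?P<conn>[^\s:]+): #(?P<id>\d+), ESTABLISHED", re.MULTILINE
)

# Illustrative input only; SPIs and the CHILD_SA name are made up.
SAMPLE = """\
vti201: #2604, ESTABLISHED, IKEv2, 1111111111111111_i* 2222222222222222_r
  vti201-vti: #5001, reqid 8, INSTALLED, TUNNEL, ESP:AES_GCM_16-256
vti201: #2609, ESTABLISHED, IKEv2, 3333333333333333_i* 4444444444444444_r
vti202: #2611, ESTABLISHED, IKEv2, 5555555555555555_i* 6666666666666666_r
"""


def duplicate_ike_sas(listing: str) -> dict:
    """Map each connection with more than one ESTABLISHED IKE SA to its SA IDs."""
    by_conn = defaultdict(list)
    for match in IKE_SA_LINE.finditer(listing):
        by_conn[match["conn"]].append(int(match["id"]))
    return {conn: ids for conn, ids in by_conn.items() if len(ids) > 1}


if __name__ == "__main__":
    for conn, ids in duplicate_ike_sas(SAMPLE).items():
        print(f"{conn}: {len(ids)} IKE SAs {ids}")
```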

Proposed fix

Add set vpn ipsec site-to-site peer <peer> unique {no,never,keep,replace}, mirroring the existing remote-access connection <rw> unique knob. The new node has no default value, so swanctl continues to default to no when it is unset, and users who need replace opt in explicitly. No migration scripts are needed and existing configs are unaffected.

The template plumbing already exists in peer.j2 (from T7562/PR #4637): it accepts a uniqreqids variable and emits unique = {{ uniqreqids }}. It needs a per-peer config node wired to it instead of inheriting only from the global disable-uniqreqids flag.
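The intended selection logic can be sketched in a few lines. This is not the VyOS config script or the peer.j2 template; unique_line and its parameters are hypothetical names, and letting the per-peer value take precedence over the global disable-uniqreqids flag is an assumption of this sketch rather than something decided in this task.

```python
from typing import Optional

ALLOWED = ("no", "never", "keep", "replace")


def unique_line(peer_unique: Optional[str] = None,
                disable_uniqreqids: bool = False) -> str:
    """Return the swanctl 'unique' line for one peer, or '' to emit nothing.

    Assumed precedence (hypothetical): per-peer value first, then the
    global disable-uniqreqids flag, else nothing so swanctl defaults to 'no'.
    """
    if peer_unique is not None:
        if peer_unique not in ALLOWED:
            raise ValueError(f"unique must be one of {ALLOWED}, got {peer_unique!r}")
        return f"unique = {peer_unique}"
    if disable_uniqreqids:
        return "unique = never"  # existing behavior from T7562
    return ""  # unset: existing configs render exactly as before


if __name__ == "__main__":
    print(repr(unique_line()))
    print(repr(unique_line(disable_uniqreqids=True)))
    print(repr(unique_line(peer_unique="replace", disable_uniqreqids=True)))
```

Returning an empty string when nothing is configured is what keeps the change backward compatible: a config without the new node renders the same output as today.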

Background: how unique = no became the site-to-site default

In ipsec.conf, the default was uniqueids = yes (behaves like unique = replace). When strongSwan introduced swanctl.conf, it changed the default to unique = no. VyOS inherited this new default without adding a knob to let site-to-site users set it back. The remote-access CLI got a unique knob, but site-to-site did not.

unique = no is the correct default for remote-access/roadwarrior connections, where multiple clients legitimately share the same EAP identity (e.g., several laptops connecting with the same username). Enforcing uniqueness there would kick existing clients when a new one connects. But site-to-site peers have fixed, distinct IKE identities per tunnel. Duplicate SAs for the same identity pair are always unintentional, and replace is the correct policy. The two connection types need different defaults, which is why the per-peer knob matters.

T2647 (2020) documented the original disable-uniqreqids bug in ipsec.conf generation. T7562 (2025) fixed disable-uniqreqids for swanctl.conf but only supports the never value. Neither task exposed per-peer unique for site-to-site.

I am running a local hotpatch on my production fleet that changes the template default to replace, which resolved the SA accumulation and restored the full tunnel mesh. But a default change is not appropriate for upstream. Exposing the knob so users can opt in is the right approach.

Related tasks

  • T7562: Command 'set vpn ipsec disable-uniqreqids' does nothing (fixed 1.4.4, PR #4637). Added unique = never template support but not per-peer unique = replace.
  • T2647: ipsec disableuniqreqids generate a wrong ipsec.conf (closed, resolved). Original ipsec.conf era bug.
  • T1079: Duplicated IPSec Tunnel (closed). Display/reporting issue with duplicate tunnel output.
  • T4593: Upgrade strongswan to 5.9.8. Changelog notes "Actively initiating duplicate CHILD_SAs within the same IKE_SA is now largely prevented" for dpd_action=restart. This addressed duplicate CHILD_SAs within a single IKE_SA, not duplicate IKE_SAs (the problem reported here).
  • T4551: IPsec rekeying collisions bug. Related SA management issue during rekey.
  • T8950: vyos-netlinkd DHCP restart on UP-to-UP re-notifications. The trigger event for this SA accumulation incident.

I searched vyos.dev open tasks for "unique", "uniqreq", "duplicate ipsec", and "site-to-site" before filing. No existing open task covers per-peer unique for site-to-site.

Environment

  • VyOS 2026.05.26-1327-rolling
  • strongSwan 5.9.11-2+vyos0
  • 12 IPsec/VTI site-to-site tunnels with PSK authentication, dpd_action = restart
  • Running as KVM guest on Proxmox VE 8.4

Details

Version: 2026.05.26-1327-rolling
Is it a breaking change? Perfectly compatible
Issue type: Bug (incorrect behavior)