-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Description
What happened?
Summary:
Edge brokers lose topic subscriptions after remote cluster node failures; rules stop working until manual restart or connector disable/enable.
Context:
We have multiple edge EMQX single-node brokers (v6.0.1) connected to a remote continental EMQX 3-node cluster deployed in AKS using the EMQX Operator (v5.8.2).
The edge brokers have a rule that uses an MQTT bridge source as data input (emqx_continental_lc_in).
The source have the following configuration:
{
"error": "",
"name": "emqx_continental_lc_in",
"status": "connected",
"type": "mqtt",
"rules": [
"from_emqx_continental_lc_in"
],
"node_status": [
{
"node": "emqx@55653",
"status": "connected",
"status_reason": ""
}
],
"status_reason": "",
"connector": "continental_mqtt_connector",
"created_at": 1764695446767,
"description": "Connector to gather centralized messages from eMQX",
"enable": true,
"last_modified_at": 1764758241257,
"parameters": {
"no_local": false,
"qos": 1,
"topic": "$queue/lc/55653/in/#"
},
"resource_opts": {
"health_check_interval": "15s",
"health_check_interval_jitter": "0ms",
"health_check_timeout": "60s"
}
}The connector have the following configuration
{
"name": "continental_mqtt_connector",
"status": "connected",
"type": "mqtt",
"sources": [
"emqx_continental_lc_in",
"source_continental_emqx_location"
],
"node_status": [
{
"node": "emqx@55653",
"status": "connected"
}
],
"actions": [
"republish_to_continental"
],
"bridge_mode": false,
"clean_start": true,
"clientid_prefix": "edge_55653_",
"connect_timeout": "10s",
"description": "Connector to external EMQX",
"enable": true,
"keepalive": "60s",
"max_inflight": 32,
"password": "******",
"pool_size": 4,
"proto_ver": "v5",
"resource_opts": {
"health_check_interval": "15s",
"health_check_timeout": "10s",
"start_after_created": true,
"start_timeout": "6s"
},
"retry_interval": "15s",
"server": "100.64.239.251:1883",
"ssl": {
"ciphers": [],
"depth": 10,
"enable": false,
"hibernate_after": "5s",
"log_level": "notice",
"middlebox_comp_mode": true,
"reuse_sessions": true,
"secure_renegotiate": true,
"verify": "verify_none",
"versions": [
"tlsv1.3",
"tlsv1.2"
]
},
"static_clientids": [
{
"ids": [
{
"clientid": "edge_55653_1",
"password": "******",
"username": "<user>"
},
{
"clientid": "edge_55653_2",
"password": "******",
"username": "<user>"
},
{
"clientid": "edge_55653_3",
"password": "******",
"username": "<user>"
},
{
"clientid": "edge_55653_4",
"password": "******",
"username": "<user>"
}
],
"node": "emqx@55653"
}
],
"username": "<user>"
}We also preformed tests with the following connector config variations, getting the same outcome:
- clean_start = false
- Not using static clients
- Configuring the 2 connectors, one dedicated for rule sources (which requires subscriptions) and a different one for rule actions (publish)
Issue:
Sometimes, randomly, edge nodes stop receiving data from the rule source.
Client logs show no errors.
Connector appears connected and data still flows from edge → cluster, but subscriptions are lost.
Issue is mitigated by manual restart of the edge broker or disable/enable connector.
We updated edge brokers to v6.0.1 as per #15538
What did you expect to happen?
Rules should continue working after reconnections.
Edge brokers must keep receiving data from topics specified in the rule’s Data Input configured to use a bridge with the remote cluster.
How can we reproduce it (as minimally and precisely as possible)?
- Edge broker rule works after disable/enable (logs TS 2025-12-03T17:45:03.708536+00:00).
Initial connections:
edge_55653_1→ emqx-0edge_55653_2→ emqx-1edge_55653_3→ emqx-0edge_55653_4→ emqx-0
- Drop remote node emqx-0 (logs TS 2025-12-03T17:48:22.816088+00:00).
- Static clients dropped (socket_closed_when_connected).
- Edge logs show 8 drops (due to other connectors).
- After reconnection:
edge_55653_1→ emqx-2 (subscriptions LOST)edge_55653_2→ emqx-1 (OK)edge_55653_3→ emqx-1 (subscriptions LOST)edge_55653_4→ emqx-1 (subscriptions LOST)
4.Continue deleting nodes emqx-1 and emqx-2 → all connections lose subscriptions → rule stops working.
Anything else we need to know?
- Logs confirm pattern: after reconnections, subscriptions are not recreated for some clients.
- Connector remains connected and operational for publishing, but rule input fails.
EMQX version
Edge Broker: EMQX v6.0.1 (running on Kubernetes single node)
Remote Broker: EMQX v5.8.2 (running on Kubernetes 3-nodes cluster community version)
OS version
Details
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here