Subscriptions lost after connector reconnection in EMQX bridge rules #16386

@oscarsnav-ext-inditex

What happened?

Summary:

Edge brokers lose topic subscriptions after remote cluster node failures; rules stop working until manual restart or connector disable/enable.

Context:

We have multiple single-node edge EMQX brokers (v6.0.1) connected to a remote continental 3-node EMQX cluster (v5.8.2), deployed in AKS using the EMQX Operator.
The edge brokers have a rule that uses an MQTT bridge source as data input (emqx_continental_lc_in).

The source has the following configuration:

{
    "error": "",
    "name": "emqx_continental_lc_in",
    "status": "connected",
    "type": "mqtt",
    "rules": [
        "from_emqx_continental_lc_in"
    ],
    "node_status": [
        {
            "node": "emqx@55653",
            "status": "connected",
            "status_reason": ""
        }
    ],
    "status_reason": "",
    "connector": "continental_mqtt_connector",
    "created_at": 1764695446767,
    "description": "Connector to gather centralized messages from eMQX",
    "enable": true,
    "last_modified_at": 1764758241257,
    "parameters": {
        "no_local": false,
        "qos": 1,
        "topic": "$queue/lc/55653/in/#"
    },
    "resource_opts": {
        "health_check_interval": "15s",
        "health_check_interval_jitter": "0ms",
        "health_check_timeout": "60s"
    }
}

The connector has the following configuration:

{
    "name": "continental_mqtt_connector",
    "status": "connected",
    "type": "mqtt",
    "sources": [
        "emqx_continental_lc_in",
        "source_continental_emqx_location"
    ],
    "node_status": [
        {
            "node": "emqx@55653",
            "status": "connected"
        }
    ],
    "actions": [
        "republish_to_continental"
    ],
    "bridge_mode": false,
    "clean_start": true,
    "clientid_prefix": "edge_55653_",
    "connect_timeout": "10s",
    "description": "Connector to external EMQX",
    "enable": true,
    "keepalive": "60s",
    "max_inflight": 32,
    "password": "******",
    "pool_size": 4,
    "proto_ver": "v5",
    "resource_opts": {
        "health_check_interval": "15s",
        "health_check_timeout": "10s",
        "start_after_created": true,
        "start_timeout": "6s"
    },
    "retry_interval": "15s",
    "server": "100.64.239.251:1883",
    "ssl": {
        "ciphers": [],
        "depth": 10,
        "enable": false,
        "hibernate_after": "5s",
        "log_level": "notice",
        "middlebox_comp_mode": true,
        "reuse_sessions": true,
        "secure_renegotiate": true,
        "verify": "verify_none",
        "versions": [
            "tlsv1.3",
            "tlsv1.2"
        ]
    },
    "static_clientids": [
        {
            "ids": [
                {
                    "clientid": "edge_55653_1",
                    "password": "******",
                    "username": "<user>"
                },
                {
                    "clientid": "edge_55653_2",
                    "password": "******",
                    "username": "<user>"
                },
                {
                    "clientid": "edge_55653_3",
                    "password": "******",
                    "username": "<user>"
                },
                {
                    "clientid": "edge_55653_4",
                    "password": "******",
                    "username": "<user>"
                }
            ],
            "node": "emqx@55653"
        }
    ],
    "username": "<user>"
}

We also performed tests with the following connector config variations, with the same outcome:

  1. clean_start = false
  2. Not using static client IDs
  3. Configuring two connectors: one dedicated to rule sources (which require subscriptions) and a separate one for rule actions (publish)

Issue:

Intermittently, edge nodes stop receiving data from the rule source.
Client logs show no errors.
The connector appears connected and data still flows from edge → cluster, but the subscriptions are lost.
The issue is mitigated by manually restarting the edge broker or by disabling and re-enabling the connector.
We updated the edge brokers to v6.0.1 as per #15538.

What did you expect to happen?

Rules should continue working after reconnections.
Edge brokers must keep receiving data from topics specified in the rule’s Data Input configured to use a bridge with the remote cluster.

How can we reproduce it (as minimally and precisely as possible)?

  1. Edge broker rule works after disable/enable (logs TS 2025-12-03T17:45:03.708536+00:00).
     Initial connections:
     • edge_55653_1 → emqx-0
     • edge_55653_2 → emqx-1
     • edge_55653_3 → emqx-0
     • edge_55653_4 → emqx-0
  2. Drop remote node emqx-0 (logs TS 2025-12-03T17:48:22.816088+00:00).
     • Static clients dropped (socket_closed_when_connected).
     • Edge logs show 8 drops (due to other connectors).
  3. After reconnection:
     • edge_55653_1 → emqx-2 (subscriptions LOST)
     • edge_55653_2 → emqx-1 (OK)
     • edge_55653_3 → emqx-1 (subscriptions LOST)
     • edge_55653_4 → emqx-1 (subscriptions LOST)
  4. Continue deleting nodes emqx-1 and emqx-2 → all connections lose subscriptions → rule stops working.
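
To make the pattern in steps 1 to 3 easier to see, here is a minimal Python sketch that compares the two connection snapshots. The data is transcribed from the steps above; the variable and function names are only illustrative and are not part of EMQX. With this data, the clients that were connected to the dropped node (emqx-0) are exactly the ones that came back without subscriptions, while edge_55653_2, which stayed on emqx-1, is unaffected.

```python
# Snapshots transcribed from the reproduction steps (names are illustrative).
DROPPED_NODE = "emqx-0"

# Step 1: client ID -> remote node it was connected to before the drop.
before = {
    "edge_55653_1": "emqx-0",
    "edge_55653_2": "emqx-1",
    "edge_55653_3": "emqx-0",
    "edge_55653_4": "emqx-0",
}

# Step 3: client ID -> (remote node after reconnection, subscription present?).
after = {
    "edge_55653_1": ("emqx-2", False),
    "edge_55653_2": ("emqx-1", True),
    "edge_55653_3": ("emqx-1", False),
    "edge_55653_4": ("emqx-1", False),
}


def clients_on_node(snapshot, node):
    """Return the client IDs that were connected to the given node."""
    return {cid for cid, current in snapshot.items() if current == node}


def clients_without_subscription(snapshot):
    """Return the client IDs whose subscription is missing after reconnection."""
    return {cid for cid, (_node, subscribed) in snapshot.items() if not subscribed}


forced_to_reconnect = clients_on_node(before, DROPPED_NODE)
lost = clients_without_subscription(after)

print(sorted(lost))                 # ['edge_55653_1', 'edge_55653_3', 'edge_55653_4']
print(forced_to_reconnect == lost)  # True
```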

Anything else we need to know?

  • Logs confirm pattern: after reconnections, subscriptions are not recreated for some clients.
  • Connector remains connected and operational for publishing, but rule input fails.
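
As a rough aid for spotting this state without waiting for the rule to go quiet, the sketch below checks a per-client subscription snapshot against the topic from the source configuration. It is only a sketch under assumptions: the snapshot shape (client ID mapped to a list of topic filters) is hypothetical, and how it is collected from the remote cluster is left out. The sample values mirror the state after step 3 of the reproduction. The helper also strips the `$queue/` or `$share/<group>/` prefix, on the assumption that the topic may be reported with or without it.

```python
EXPECTED_TOPIC = "$queue/lc/55653/in/#"  # "topic" from the source configuration


def strip_shared_prefix(topic):
    """Drop a leading '$queue/' or '$share/<group>/' so both spellings compare equal."""
    if topic.startswith("$queue/"):
        return topic[len("$queue/"):]
    if topic.startswith("$share/"):
        parts = topic.split("/", 2)  # ['$share', '<group>', '<real topic>']
        if len(parts) == 3:
            return parts[2]
    return topic


def missing_subscription(snapshot, expected=EXPECTED_TOPIC):
    """Return the sorted client IDs that do not hold the expected subscription."""
    want = strip_shared_prefix(expected)
    return sorted(
        cid
        for cid, topics in snapshot.items()
        if want not in {strip_shared_prefix(t) for t in topics}
    )


# Hypothetical snapshot shape; values mirror the state after reconnection.
snapshot = {
    "edge_55653_1": [],
    "edge_55653_2": ["$queue/lc/55653/in/#"],
    "edge_55653_3": [],
    "edge_55653_4": [],
}

print(missing_subscription(snapshot))
# ['edge_55653_1', 'edge_55653_3', 'edge_55653_4']
```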

EMQX version

Edge Broker: EMQX v6.0.1 (single node running on Kubernetes)
Remote Broker: EMQX v5.8.2 (3-node cluster running on Kubernetes, community edition)

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Log files

edge_55653_ClientLogs.txt
edge_55653_4_REMOVE_NODES2.zip
edge_55653_3_REMOVE_NODES2.zip
edge_55653_2_REMOVE_NODES2.zip
edge_55653_1_REMOVE_NODES2.zip
edge_continental_mqtt_connector.txt
