Problem: HWM in ZMQ_DGRAM socket does not respect multipart message #3268

@gabm

Issue description

When sending a large number of messages over a ZMQ_DGRAM socket, I hit the following assertion: Assertion failed: check () (/home/gabm/zeromq/src/msg.cpp:347).

Environment

  • libzmq version (commit hash if unreleased):
  • OS:

Minimal test code / Steps to reproduce the issue

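    // ZMQ_DGRAM is a draft socket type; libzmq must be built with draft APIs enabled.
    #define ZMQ_BUILD_DRAFT_API
    #include <zmq.h>
    #include <string>

    int main() {
        auto ctx = zmq_ctx_new();
        auto sender = zmq_socket(ctx, ZMQ_DGRAM);

        // A tiny send HWM makes the failure show up almost immediately.
        int hwm = 3;
        zmq_setsockopt(sender, ZMQ_SNDHWM, &hwm, sizeof(hwm));
        zmq_bind(sender, "udp://127.0.0.1:5556");

        std::string remote = "127.0.0.1:5555";
        for (int i = 0; i < 100000; i++) {
            // Part 1: destination address. Part 2: 8000-byte body.
            zmq_send(sender, remote.c_str(), remote.length(), ZMQ_SNDMORE);

            zmq_msg_t msg;
            zmq_msg_init_size(&msg, 8000);
            zmq_msg_send(&msg, sender, 0);
            zmq_msg_close(&msg);
        }

        zmq_close(sender);
        zmq_ctx_destroy(ctx);
        return 0;
    }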

What's the actual result? (include assertion message & call stack if applicable)

The callstack is:

1  raise                                                         0x7ffff7705d7f 
2  abort                                                         0x7ffff76f0672 
3  zmq::zmq_abort                            err.cpp         88  0x7ffff7d2170e 
4  zmq::msg_t::size                          msg.cpp         347 0x7ffff7d2ce6e 
5  zmq::udp_engine_t::out_event              udp_engine.cpp  380 0x7ffff7d80db6 
6  zmq::epoll_t::loop                        epoll.cpp       202 0x7ffff7d2085c 
7  zmq::worker_poller_base_t::worker_routine poller_base.cpp 139 0x7ffff7d3d50d 
8  thread_routine                            thread.cpp      182 0x7ffff7d6736e 
9  start_thread                                                  0x7ffff7bd3a9d 
10 clone                                                         0x7ffff77c9a43 

So it happens in udp_engine.cpp at line 380. The rc value of the second pull_msg call is -1.

What's the expected result?

that it works 😄

Cause

Digging a bit deeper, I found that it happens when the HWM is hit. I started my program with a very small HWM and it happens immediately. The DGRAM socket interface requires two message parts to be sent per datagram: the destination address and the body. I think that one of these two gets discarded when hitting the HWM while the other survives. Then, when processing the messages in the udp_engine, we cannot assume that we get both parts, because one might have been dropped due to the HWM.
