Skip to content

Re-implement broker/bin/v1 with a direct Val-based event serializer #5404

@awelzel

Description

@awelzel

Creating a ticket to track the work and mark it for 9.0 or 9.1.

The BROKER_BIN_V1 serializer in cluster/serializer/broker today delegates the encoding work to the broker library.
This means it first converts the zeek::cluster::Event to a broker::zeek::Event type which also requires converting all Val instances to broker::data via the val_to_data() helper in broker/Data.h.

This presents a dependency on Broker and blocks #2476. We'd prefer to have the serialization code within Zeek directly, which would also allow to apply optimizations on just the Val or ZVal layer.

I've started to implement a "sans" broker version of the serializer that'll ideally be binary compatible. It works directly on the zeek::Val representation stored in zeek::cluster::Event and renders into a byte buffer, bypassing the intermediate broker::data representation.

WIP branch:
https://github.com/zeek/zeek/tree/topic/awelzel/reimplement-broker-bin-v1

It's still in progress, but some initial benchmark indicates this is promising also from a performance/efficiency angle. Basic serialization and deserialization of a Cluster::node_up event with its two string arguments look as if they get ~3x speedup.

$ taskset -c 1,2,3 ./build/src/benchmarks/cluster/event-serializer  
[...]
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Serialize_NodeUp_SansBroker          403 ns          403 ns      1768801
BM_Serialize_NodeUp_Broker             1250 ns         1250 ns       568050
BM_Unserialize_NodeUp_SansBroker        226 ns          226 ns      3083203
BM_Unserialize_NodeUp_Broker            857 ns          857 ns       818552

Some more staring indicates it'd be useful to work on record and vector elements via the embedded ZVal instances directly, but skipping this for now as val_to_data() doesn't do that either.

TODO / Discussions

  • Figure out how to move away from SerializeData() of OpaqueVal. Current idea is to make the contract such that an OpaqueVal that is serializable for cluster transport should render into an intermediate ListVal of atomic types rather than requiring users to learn about BrokerData and broker::data types. I don't think perf is necessarily an argument here and worst case the ListVal can be of size 1 and hold a StringVal which then may contain a custom and more efficient encoding.
  • Figure out any deserialization: The old implementation wraps into Broker::Data, but this isn't easily workable downstream today, so maybe just ignore, or try to free-style decode into vector of any, table[any] of any, and set[any]. I just don't see how that's useful to script writers.
  • Add fuzzing

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions