Creating a ticket to track the work and mark it for 9.0 or 9.1.
The BROKER_BIN_V1 serializer in cluster/serializer/broker today delegates the encoding work to the broker library.
This means it first converts the zeek::cluster::Event to a broker::zeek::Event type which also requires converting all Val instances to broker::data via the val_to_data() helper in broker/Data.h.
This presents a dependency on Broker and blocks #2476. We'd prefer to have the serialization code within Zeek directly, which would also allow to apply optimizations on just the Val or ZVal layer.
I've started to implement a "sans" broker version of the serializer that'll ideally be binary compatible. It works directly on the zeek::Val representation stored in zeek::cluster::Event and renders into a byte buffer, bypassing the intermediate broker::data representation.
WIP branch:
https://github.com/zeek/zeek/tree/topic/awelzel/reimplement-broker-bin-v1
It's still in progress, but some initial benchmark indicates this is promising also from a performance/efficiency angle. Basic serialization and deserialization of a Cluster::node_up event with its two string arguments look as if they get ~3x speedup.
$ taskset -c 1,2,3 ./build/src/benchmarks/cluster/event-serializer
[...]
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
BM_Serialize_NodeUp_SansBroker 403 ns 403 ns 1768801
BM_Serialize_NodeUp_Broker 1250 ns 1250 ns 568050
BM_Unserialize_NodeUp_SansBroker 226 ns 226 ns 3083203
BM_Unserialize_NodeUp_Broker 857 ns 857 ns 818552
Some more staring indicates it'd be useful to work on record and vector elements via the embedded ZVal instances directly, but skipping this for now as val_to_data() doesn't do that either.
TODO / Discussions
Creating a ticket to track the work and mark it for 9.0 or 9.1.
The
BROKER_BIN_V1serializer in cluster/serializer/broker today delegates the encoding work to the broker library.This means it first converts the
zeek::cluster::Eventto abroker::zeek::Eventtype which also requires converting allValinstances tobroker::datavia theval_to_data()helper inbroker/Data.h.This presents a dependency on Broker and blocks #2476. We'd prefer to have the serialization code within Zeek directly, which would also allow to apply optimizations on just the Val or ZVal layer.
I've started to implement a "sans" broker version of the serializer that'll ideally be binary compatible. It works directly on the
zeek::Valrepresentation stored inzeek::cluster::Eventand renders into a byte buffer, bypassing the intermediatebroker::datarepresentation.WIP branch:
https://github.com/zeek/zeek/tree/topic/awelzel/reimplement-broker-bin-v1
It's still in progress, but some initial benchmark indicates this is promising also from a performance/efficiency angle. Basic serialization and deserialization of a
Cluster::node_upevent with its two string arguments look as if they get ~3x speedup.Some more staring indicates it'd be useful to work on record and vector elements via the embedded
ZValinstances directly, but skipping this for now asval_to_data()doesn't do that either.TODO / Discussions
SerializeData()of OpaqueVal. Current idea is to make the contract such that anOpaqueValthat is serializable for cluster transport should render into an intermediateListValof atomic types rather than requiring users to learn about BrokerData andbroker::datatypes. I don't think perf is necessarily an argument here and worst case theListValcan be of size 1 and hold aStringValwhich then may contain a custom and more efficient encoding.anydeserialization: The old implementation wraps intoBroker::Data, but this isn't easily workable downstream today, so maybe just ignore, or try to free-style decode intovector of any,table[any] of any, andset[any]. I just don't see how that's useful to script writers.