[NetKAT] Intern decision nodes via per-field unique tables keyed by node index by smolkaj · Pull Request #100 · google/netkat

smolkaj · 2026-06-10T10:31:22Z

What

Cuts node-related memory roughly in half and reorganizes the unique tables (hash-cons tables) of PacketSetManager and PacketTransformerManager by packet field, as a first step toward level-based node storage (the standard organization in BDD packages, where it improves locality and enables variable reordering).

Before: packet_by_node_ / transformer_by_node_ were flat_hash_map<DecisionNode, Handle> — every interned node was stored twice: once in nodes_, once as the map key. For PacketTransformerManager the key copy was especially costly, since each node owns heap-allocated btree maps.

After: one flat_hash_set<uint32_t> per packet field, storing node indices; hashing and equality dereference into nodes_. Each node is stored exactly once. Per-field tables also keep probe sequences short and drop the field from hashing entirely (it's implied by the table).

Design notes

Interning probes before appending. NodeToPacket/NodeToTransformer look up the candidate node via heterogeneous lookup (is_transparent functors), so the common already-interned case does no copies and no allocations. Only genuinely new nodes touch nodes_.
Manager moves. The table functors must reference nodes_, which moves with the manager. They do so through a heap-allocated location slot owned by the manager; the (now manual) move operations repoint the slot. This keeps moves O(1) — the alternative of rebuilding tables on move would be O(#nodes).

Benchmarks

Flat overall: first-time compilation unchanged, recompile microbenchmarks +~50ns (~8%) from the added indirection in the unique tables — in exchange for the memory halving.

Benchmark	main	this PR
BM_FirstTimeCompileNonOverlappingPredicate	44.7µs	44.6µs
BM_ReCompileNonOverlappingPredicate	649ns	703ns
BM_FirstTimeCompileOverlappingPredicate	10.9µs	10.5µs
BM_ReCompileOverlappingPredicate	638ns	691ns
BM_FirstTimeCompileNonOverlappingPolicy	285.8µs	277.9µs
BM_ReCompileNonOverlappingPolicy	4.19µs	4.16µs
BM_FirstTimeCompileOverlappingPolicy	45.5µs	43.4µs
BM_ReCompileOverlappingPolicy	4.07µs	4.14µs

Testing

All 17 bazel test targets pass, including the golden diff tests (node numbering is unchanged by this PR) and the fuzz tests comparing against reference implementations.

Two related PRs: the arena page-size fix is split out as an independent one-line win, and the originally-stacked level-packed-handles work (#102) is deferred until operation memoization lands, per the scaling data on that PR.

🤖 Generated with Claude Code

…ode index. Previously, the unique tables (packet_by_node_, transformer_by_node_) stored a full copy of every decision node as the map key, duplicating all node storage. This was especially costly for PacketTransformerManager, whose nodes own heap-allocated btree maps. Now the tables store 4-byte node indices and hash/compare by dereferencing into nodes_, so each node is stored exactly once. Splitting the tables by packet field keeps each table - and thus its probe sequences - small, and lays the groundwork for storing the nodes themselves by field/level (standard practice in BDD packages, where it improves locality and enables variable reordering). A follow-up change builds on this. Implementation notes: * Interning probes the table with the candidate node via heterogeneous lookup (no node is added to nodes_ unless it is new), keeping the common already-interned case copy- and allocation-free. * The table functors reference nodes_ through a heap-allocated location slot so that they survive manager moves; move operations repoint the slot. Benchmarks are flat overall: first-time compilation is unchanged, recompile microbenchmarks regress by ~8% (~50ns) due to the indirection in the unique tables, in exchange for halving node-related memory. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

google-cla · 2026-06-10T10:31:39Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

smolkaj · 2026-06-10T11:05:00Z

Measured this PR stack on the new scale benchmark from #103 (realistically large inputs; medians of 3, -c opt, same machine, back-to-back):

Benchmark	main	#100	#102 (stack)
CompileForwardingTable/64	513µs	427µs	408µs (−20%)
CompileForwardingTable/512	34.4ms	27.9ms	27.7ms (−19%)
CompileForwardingTable/4096	2.98s	2.76s	2.72s (−9%)
CompileAclAsPacketSet/16	45.3µs	37.4µs	36.9µs (−19%)
CompileAclAsPacketSet/64	427µs	383µs	381µs (−11%)
CompileAclAsPacketSet/256	5.60ms	5.42ms	5.53ms (−1%)
CompileAclDifference/16	89.4µs	70.1µs	70.2µs (−21%)
CompileAclDifference/256	11.0ms	11.1ms	11.3ms (+2%)
CompileRingReachability/2	39.9µs	36.6µs	23.1µs (−42%)
CompileRingReachability/8	2.12ms	2.07ms	2.04ms (−4%)
CompileRingReachability/32	414ms	411ms	409ms (−1%)

Two takeaways:

The stack helps consistently at small/medium scale (−10–42%), and [NetKAT] Intern decision nodes via per-field unique tables keyed by node index #100's node-copy elimination shows up much more here than in the micro-benchmarks — interning wide forwarding-table nodes previously copied a heap-backed branch array per duplicate probe.
At the largest sizes the gains wash out (−1–9%): runtime is dominated by the known algorithmic issues (no operation memoization, b/382379263; map copies in the transformer combinators). That's the expected division of labor — this stack is the memory/locality/constant-factor layer; the asymptotic wins need the memoization work, which [NetKAT] Add scale benchmark with realistically large inputs #103 now makes measurable.

smolkaj · 2026-06-10T11:07:54Z

Folding this into #102 per offline discussion — the two changes are tightly coupled (the follow-up rewrites the regions this PR introduces), and GitHub can't express fork-based stacking cleanly, so #102 already showed the combined diff. #102 now contains both commits and the combined description.

smolkaj · 2026-06-10T15:56:02Z

Reopening: we measured the full stack (#102) on the scale benchmark (#103) and decided to land this layer separately after all — see the re-scope rationale on #102. This PR is unchanged and independently valuable (halves node-related memory, −7–20% on table-heavy compiles at scale).

This was referenced Jun 10, 2026

[NetKAT] Store decision nodes by field: per-field unique tables + level-packed handles #102

Closed

[NetKAT] Add scale benchmark with realistically large inputs #103

Draft

smolkaj closed this Jun 10, 2026

smolkaj reopened this Jun 10, 2026

smolkaj mentioned this pull request Jun 10, 2026

[NetKAT] Shrink node storage pages from 64 MiB to 16 KiB #105

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NetKAT] Intern decision nodes via per-field unique tables keyed by node index#100

[NetKAT] Intern decision nodes via per-field unique tables keyed by node index#100
smolkaj wants to merge 1 commit into
google:mainfrom
smolkaj:per-field-unique-tables

smolkaj commented Jun 10, 2026 •

edited

Loading

Uh oh!

google-cla Bot commented Jun 10, 2026

Uh oh!

smolkaj commented Jun 10, 2026

Uh oh!

smolkaj commented Jun 10, 2026

Uh oh!

smolkaj commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

smolkaj commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Design notes

Benchmarks

Testing

Uh oh!

google-cla Bot commented Jun 10, 2026

Uh oh!

smolkaj commented Jun 10, 2026

Uh oh!

smolkaj commented Jun 10, 2026

Uh oh!

smolkaj commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

smolkaj commented Jun 10, 2026 •

edited

Loading