[NetKAT] Eliminate node and map copies in packet transformer combinators by smolkaj · Pull Request #98 · google/netkat

smolkaj · 2026-06-10T10:14:48Z

What

Packet transformer compilation gets roughly 1.8× faster (BM_FirstTimeCompileNonOverlappingPolicy; ~1.4× on BM_FirstTimeCompileOverlappingPolicy) by eliminating the pervasive copying in the Union/Sequence/Difference combinators — the overhead already flagged by the TODO(dilo)s on GetMapAtValue and the node-level helpers. The win should grow on policies with wider match/modification maps, since the deleted copies scaled with map size.

The PR also makes the golden diff test robust to internal refactorings: golden output now depends only on transformer structure, not on node-interning order.

How

Commit 1 — copy elimination. The combinators previously copied entire DecisionNodes (nested btree maps and all) at every recursion step, re-interned expanded nodes through the unique table just to recover handles the caller already had, and copied the per-value modification map (GetMapAtValue) inside per-value loops. All of that is replaced with cheap, non-owning views:

DecisionNodeView views an operand "at" a field: either the node's own maps, or the trivial "fall through to this handle" expansion when the operand branches on a strictly larger field (or is Accept). The old code materialized that expansion as a real DecisionNode and recursed; now each op is a single handle-level function and the expansion costs two pointers.
ModifyBranchesView replaces GetMapAtValue's map copies with a view of the base map plus at most one extra entry (the unmodified fall-through), visited base-then-extra via a simple ForEach.
CombineModifyBranches is now a template over the combiner — no absl::AnyInvocable type-erasure, no contains+at double lookups, hinted insertion since keys arrive sorted.

Commit 2 — follow-ups.

Sequence accumulated its result maps by rebuilding them from scratch per left-branch entry (quadratic in branch count). It now unions each sequenced branch into the result map in place, which also deletes the remaining intermediate maps.
Union and Difference were near-identical after the refactor; their shared body lives in a PointwiseCombine template parameterized on the operation.
The golden diff test asserted raw interning indices, so any internal reordering (including this PR, and future work like operation memoization) churned the golden file without semantic change. The test runner now renumbers nodes and fields canonically before printing, by copying each compiled transformer into a fresh manager in deterministic traversal order. The golden file is regenerated accordingly.

Non-obvious choices

ViewAtField/SmallestField take the (private) DecisionNode as a deduced template parameter so they can live in the anonymous namespace without widening the class interface.
The per-value loops iterate values in sorted order (SortedUniqueKeys) instead of nondeterministic flat_hash_set order, making node-creation order deterministic.
The canonicalizing copy in the test runner pre-interns fields in their original relative order, since the node invariant "fields increase strictly along each path" depends on it.
NodeToTransformer canonicalization is intentionally untouched; it has its own inefficiencies, but mixing canonicalization changes into this refactor would have muddied verification.

Verification

packet_transformer_test (including fuzz-property tests against reference implementations), packet_set_test, analysis_engine_test, table_test, evaluator_test, switch_test, and the full //netkat:all suite: all pass.
After commit 1, the golden file was byte-identical, confirming the view refactor preserved behavior exactly (down to interning order). Commit 2's golden regeneration was verified to be a pure renumbering: every changed line is a handle number; structure, fields, and labels are identical.
Benchmarks: interleaved A/B runs, old and new binaries back to back; the new build won every round.

🤖 Generated with Claude Code

Commit 3 — review cleanups. A four-angle cleanup review (reuse, simplification, efficiency, altitude) replaced the view's hand-rolled merging iterator with a simple ForEach, deduplicated the operand-view prologue into ViewOperandsAtSmallestField, switched the empty-map singletons to absl::NoDestructor, named the Deny-handle contract in one place (IsDenyHandle), hinted the per-value btree insertions, and removed a stale TODO plus the unused any_invocable dependency. No behavior change; golden file unchanged; benchmarks neutral.

…ors. Union, Sequence, and Difference copied entire DecisionNodes — including their nested btree maps — at every recursion step, re-interned expanded nodes through the unique table just to recover handles the caller already had, and copied the per-value modification map (GetMapAtValue) inside per-value loops. Several TODOs flagged this as a likely performance problem. Restructure the combinators around cheap, non-owning views instead: * DecisionNodeView views an operand "at" a field: either the node's own maps, or the trivial "fall through to this handle" expansion when the node branches on a larger field (or is Accept). This removes both the by-value node copies and the materialize-and-re-intern expansion step. * ModifyBranchesView replaces GetMapAtValue's map copies with a view of the base map plus at most one extra entry, iterated in sorted order. * CombineModifyBranches becomes a template over the combiner, removing AnyInvocable type-erasure and double lookups, and inserts with an end hint since keys arrive sorted. * The value-collection loops now iterate in sorted order (previously nondeterministic flat_hash_set order), making combination order — and thus node numbering — deterministic. Compilation benchmarks improve by roughly 1.5-2.2x (FirstTimeCompile*), with larger wins expected on policies with wider modification maps. Behavior is unchanged: all tests pass, and the golden-file diff test output is byte-identical, including node numbering. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

google-cla · 2026-06-10T10:15:06Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

…, stable goldens. Three follow-ups to the copy-elimination refactor: * Sequence accumulated its result maps by rebuilding them from scratch for every left-branch entry, making accumulation quadratic in the number of branches. It now unions each sequenced branch into the result map in place, which also removes the remaining intermediate maps entirely. * Union and Difference were near-identical after the refactor; their shared body (pointwise combination of two operands viewed at a common field) moves into a PointwiseCombine template parameterized on the operation. * The golden diff test asserted raw node-interning indices, so any reordering inside the manager — including the change above, and planned work like operation memoization — churned the golden file without any semantic change. The test runner now renumbers nodes and fields canonically before printing, by copying each compiled transformer into a fresh manager in deterministic traversal order, so the golden output depends only on transformer structure. The golden file is regenerated accordingly (verified: the previous diff was pure renumbering, with structure, fields, and labels identical). Compilation benchmarks improve a further ~9% on top of the previous commit (~1.8x total vs the original code on FirstTimeCompile NonOverlappingPolicy). All tests pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Findings from a four-angle cleanup review (reuse, simplification, efficiency, altitude): * Replace ModifyBranchesView's hand-rolled merging iterator (~45 lines, the subtlest code in the refactor) with a simple ForEach that visits base entries then the extra entry. No caller needed globally sorted iteration: map contents are order-independent and handle-level results are canonical regardless of combination order. * Deduplicate the operand-view prologue shared by Sequence and PointwiseCombine into ViewOperandsAtSmallestField. * Use absl::NoDestructor for the empty-map singletons (the repo's established idiom) and name the default-handle-is-Deny contract in one place (IsDenyHandle) instead of re-deriving it inline. * Hint the per-value btree insertions in Sequence/PointwiseCombine with end(), since SortedUniqueKeys emits values in increasing order. * Test runner: collect fields into a btree_set instead of sort+unique, and translate field handles through a precomputed map instead of a per-node string round-trip. * Drop a stale TODO referencing the deleted GetMapAtValue, and the now-unused any_invocable dependency. No behavior change: all tests pass and the golden file is unchanged. Benchmarks are neutral. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… combinators. Reworks the flat-node representation change to stack on top of the view-based combinator refactor (google#98), composing the two designs into the intended end state: * Interned nodes are flat (two contiguous sorted arrays, 40 bytes, two allocations) and stored exactly once, in a unique table keyed by stable pointers into the node arena with transparent lookup-by-builder (and the same table design ported to PacketSetManager, where nodes are even more numerous). * The combinators keep google#98's structure but their operand views now read flat spans: DecisionNodeView wraps a flat node (or the trivial fall-through), ModifyBranchesView is a span plus at most one extra entry, and per-value lookups are binary searches. Only result accumulation still uses btree maps (DecisionNodeBuilder), which NodeToTransformer canonicalizes and flattens on interning. * The builder-expansion machinery from the previous iteration of this branch (ToBuilder on every combinator call, handle-level field alignment) is gone — operands are never expanded, completing the deletion that google#98's views started. Also carries over from the previous iteration: the canonical flat-sequence visitor (ForEachFlatEntry) shared by Flatten/NodeHash/ NodeEq, streaming hashing, the Matches() iteration API, strengthened internal invariants (flat-encoding structure, hash/eq consistency, pointer-table integrity), the Push/Pull read-path benchmark, and the ProtoHashKey.policy_case hash fix. Verified structure-preserving: the golden file of the canonicalizing diff test is byte-identical to google#98's. All 17 test targets pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… tables. The unique tables of both managers stored every node twice — once in the node arena (nodes_) and once as the hash map key — doubling node memory and copying each new node (nested btree maps and all, for transformer nodes) on insertion. The tables are now keyed by pointers into the arena, which PagedStableVector keeps stable across both growth and manager moves; transparent hash/equality functors allow lookup by node value before interning. Benchmarks improve ~2-9% across the board on top of google#98, mostly from no longer copying freshly-created nodes into the key slot. Also: * Adds BM_PushAndPullFullSetThroughPolicy, covering the read-path operations (Push/Pull) that the analysis engine is built on; the existing benchmarks only measured compilation. Sizing note: without memoization, Push/Pull cost grows exponentially in policy size (8 unioned sub-policies ran for 20+ minutes), so the benchmark deliberately stays small — and the planned operation-memoization work is where that changes. * Includes ProtoHashKey.policy_case in the proto-cache hash; it was part of equality but omitted from the hash, causing avoidable collisions across policy kinds. * CheckInternalInvariants verifies pointer-table integrity and exercises both lookup paths in both managers. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

smolkaj mentioned this pull request Jun 10, 2026

[NetKAT] Add scale benchmark with realistically large inputs #103

Draft

smolkaj mentioned this pull request Jun 10, 2026

[NetKAT] Store each interned decision node once: pointer-keyed unique tables #99

Draft

smolkaj mentioned this pull request Jun 10, 2026

[NetKAT] [PARKED] Flat array representation for interned PacketTransformer DecisionNodes #104

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NetKAT] Eliminate node and map copies in packet transformer combinators#98

[NetKAT] Eliminate node and map copies in packet transformer combinators#98
smolkaj wants to merge 3 commits into
google:mainfrom
smolkaj:transformer-copy-elimination

smolkaj commented Jun 10, 2026 •

edited

Loading

Uh oh!

google-cla Bot commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

smolkaj commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How

Non-obvious choices

Verification

Uh oh!

google-cla Bot commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

smolkaj commented Jun 10, 2026 •

edited

Loading