@dylandreimerink dylandreimerink commented Nov 21, 2025

This PR introduces a deduplication mechanism for BTF types.

Most of the time BTF types originate from a single spec where a compiler or other external tool has already ensured that types are unique. In such cases we can simply rely on pointer equality to determine if two types are the same.

When dealing with manually generated BTF or merging multiple BTF specs, duplicate types are common, meaning we end up with multiple distinct Go objects which represent the same underlying BTF type.

It is useful to be able to deduplicate these types, both to reduce the size of the resulting BTF and to allow name-based lookups after combining multiple specs with duplicate types.

The deduplication algorithm is loosely based on the one used in libbpf. For example, this version does not do FWD type resolution, since that is only needed when combining BTF from multiple compilation units, something typically not seen in eBPF use-cases (only pahole). In the libbpf implementation the first step is string deduplication; however, we do this step during marshaling, so we do not deduplicate strings in the Go representation.

When a type is deduplicated, we try to deduplicate not just that root type, but the full subtree of types reachable from that type. We start by traversing all types in post-order, and any time there is an edge we try to replace that child with an equal type we have already seen.

Comparing every type with those seen before would be very expensive. Instead, we compute a hash of each type. The hash is an approximation of all properties of the type, including recursively hashed child types. Using this hash as a key in a map gives us a set of candidate types which might be equal to the type we are currently deduplicating. We still need a full equality check to be sure two types are equal, both to rule out hash collisions and to compare properties which are not included in the hash (due to the recursion limit). In practically every case the hash narrows down to 0 or 1 candidate types.

Once we have narrowed down the candidate types, we do a full equality check in which we walk the two types together in a depth-first manner and bail out as soon as we find a difference. Since types can form cycles via pointers, we keep track of types already visited in the current equality check and assume they are equal.

This deduplication mechanism can be used via a standalone function, but it is also integrated into the BTF spec builder via a new method to add and deduplicate types.

The implementation as it stands performs reasonably for its intended application:

goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 13th Gen Intel(R) Core(TM) i7-13800H
                      │   sec/op    │
DeduplicateSKBuff-20    9.253m ± 2%
DeduplicateVMLinux-20   103.5m ± 9%

                      │     B/op      │
DeduplicateSKBuff-20    2.744Mi ±  1%
DeduplicateVMLinux-20   30.02Mi ± 16%

                      │  allocs/op   │
DeduplicateSKBuff-20     318.0 ±  1%
DeduplicateVMLinux-20   8.065k ± 18%

So ~9ms for struct sk_buff, which has ~5800 types, and ~103ms for vmlinux, which has ~93000 types. For smaller types, such as map key and value types, the cost is obviously much lower.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink dylandreimerink marked this pull request as ready for review November 24, 2025 13:28
@dylandreimerink dylandreimerink requested review from lmb and ti-mo November 24, 2025 13:28

@lmb lmb left a comment


Neat feature, looking forward to this!

"slices"
)

func Deduplicate(roots []Type) ([]Type, error) {
Collaborator

Please add a doc comment.

Collaborator

Is this even used at the moment? Maybe just remove it outright if the use case is via the builder.

Member Author

I think we can get rid of it to minimize the API surface. It was useful while testing, but we can just call the unexported function instead.


// maphash.Hash by default uses a random seed per Hash instance.
// To ensure consistent hashing across multiple calls, we use a fixed seed for the lifetime of the program.
var seed = maphash.MakeSeed()
Collaborator

I think we should store maphash.Hash and Seed in deduper instead. That avoids temporarily allocating Hash (which is big-ish) and gets rid of global state, which is always a bit of a smell.

depthBudget int
}

// typeHash computes a hash for `t`. The produced hash is the same for types which are similar.
Collaborator

What does "similar" mean here? equivalent?

Collaborator

P.S. It took me a long time to figure out what this really does: produce a hash of the first couple of levels of indirection, as an approximation of equivalence. This is a really neat idea, I wonder whether we can simplify it even more.

Right now depth is changed dynamically based on what kind of type we are looking at. Sometimes we use depth 1, sometimes 2. This trades off how much work we do. At the same time it avoids infinite recursion for self referential types. I think instead of having some dynamic level we could always use a fixed budget. Basically make this the hash of the first N levels of indirection. If a type is self referential we just end up hashing the same data up to N times, which isn't a problem as long as N is something reasonably small.

This has the nice side effect that hash(T) now doesn't change anymore, and the cache can become map[Type]uint64. We might also be able to move the cache into the caller. Plus the code is a bit easier to understand.

Member Author

@dylandreimerink dylandreimerink Nov 27, 2025

Yes, as you might imagine, this is where I started: a static budget. The issue is that in select cases, specifically double pointers and function prototypes, you often end up with hash collisions, since the first levels of these are often similar, causing a lot of work in equality checks. Increasing the budget overall fixes that, but it also makes the hash much more expensive, canceling out the benefit. Which is why I settled on the current state, adding depth where it reduces collisions, going from ~100 collisions (in double pointers and func protos) to <10 collisions across all types in vmlinux.

So we can simplify at the cost of some performance; I can run the numbers so we can quantify.

Member Author

Here are the numbers when we swap out the dynamic depth limit for a static limit of 1, 2 or 3:

goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 13th Gen Intel(R) Core(TM) i7-13800H
                      │ dynamic-depth.txt │             depth-1.txt              │             depth-2.txt              │             depth-3.txt              │
                      │      sec/op       │    sec/op     vs base                │    sec/op     vs base                │    sec/op     vs base                │
DeduplicateSKBuff-20          8.290m ± 4%   10.035m ± 5%  +21.04% (p=0.000 n=10)   9.624m ± 10%  +16.08% (p=0.000 n=10)   12.523m ± 5%  +51.05% (p=0.000 n=10)
DeduplicateVMLinux-20         95.49m ± 4%   123.51m ± 3%  +29.35% (p=0.000 n=10)   96.37m ± 10%        ~ (p=0.315 n=10)    98.56m ± 2%   +3.22% (p=0.004 n=10)
geomean                       28.14m         35.21m       +25.12%                  30.45m         +8.24%                   35.13m       +24.86%

                      │ dynamic-depth.txt │             depth-1.txt              │              depth-2.txt              │              depth-3.txt              │
                      │       B/op        │     B/op       vs base               │     B/op       vs base                │     B/op       vs base                │
DeduplicateSKBuff-20        2.505Mi ±  4%   2.559Mi ± 17%  +2.17% (p=0.014 n=10)   2.804Mi ± 19%  +11.92% (p=0.000 n=10)   3.770Mi ± 17%  +50.48% (p=0.000 n=10)
DeduplicateVMLinux-20       29.06Mi ± 10%   29.74Mi ±  0%       ~ (p=0.089 n=10)   30.99Mi ±  7%   +6.63% (p=0.014 n=10)   31.99Mi ±  3%  +10.09% (p=0.007 n=10)
geomean                     8.533Mi         8.724Mi        +2.24%                  9.321Mi         +9.24%                  10.98Mi        +28.71%

                      │ dynamic-depth.txt │             depth-1.txt              │             depth-2.txt              │             depth-3.txt             │
                      │     allocs/op     │  allocs/op    vs base                │  allocs/op    vs base                │  allocs/op   vs base                │
DeduplicateSKBuff-20          298.0 ±  3%     320.0 ± 3%   +7.38% (p=0.000 n=10)    325.0 ± 12%   +9.06% (p=0.000 n=10)    387.0 ± 8%  +29.87% (p=0.000 n=10)
DeduplicateVMLinux-20        7.081k ± 10%   12.128k ± 0%  +71.28% (p=0.000 n=10)   8.315k ±  7%  +17.43% (p=0.000 n=10)   7.968k ± 4%  +12.53% (p=0.001 n=10)
geomean                      1.453k          1.970k       +35.62%                  1.644k        +13.17%                  1.756k       +20.89%

It seems a static depth of 2 is the best alternative at ~10% slower.

return cached
}
defer func() {
cache[key] = sum
Collaborator

Nit: neat but brittle due to variable shadowing. I'd probably just do this explicitly at the bottom of the method, since there is only a single return statement.

h := &hash
h.SetSeed(seed)

switch t := t.(type) {
Collaborator

One problem with this approach is that it is very easy to forget to implement this for new types. Could we instead do something like the following?

type Type interface {
	// ...

	// hash writes all comparable fields of the type to h, excluding any [Type]s.
	hash(*maphash.Hash)
}

func hashType(h *maphash.Hash, t Type, budget int) {
	if budget < 0 {
		return
	}

	// Write all non-Type fields to h. Since we don't descend into Type here,
	// there is no unbounded recursion.
	t.hash(h)

	for _, c := range children(t) {
		hashType(h, *c, budget-1)
	}
}

func typeHash(h *maphash.Hash, s maphash.Seed, t Type) uint64 {
	h.SetSeed(s)
	hashType(h, t, N)
	return h.Sum64()
}

(Assumes that caching of typeHash happens in the caller.)

Another upside is that this could let us add a test in testType or whatever that is called which asserts some behaviour of hash.

}

// Check the cache to avoid recomputing the hash for this type and depth budget.
key := hashCacheKey{t, depthBudget}
Collaborator

Does it really pay off to put all types into the cache? Intuitively I'd expected we'd only cache composite types.

Member Author

Yes, I believe it does. I recall testing just composites and this was better. I will double check and post the numbers.

Member Author

Turns out just caching composites is better, so will do that instead:

goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 13th Gen Intel(R) Core(TM) i7-13800H
                      │ cache-all.txt │        cache-composites.txt         │
                      │    sec/op     │   sec/op     vs base                │
DeduplicateSKBuff-20      8.322m ± 1%   8.045m ± 0%   -3.33% (p=0.000 n=30)
DeduplicateVMLinux-20     98.95m ± 2%   85.39m ± 1%  -13.71% (p=0.000 n=30)
geomean                   28.70m        26.21m        -8.67%

                      │ cache-all.txt │         cache-composites.txt         │
                      │     B/op      │     B/op      vs base                │
DeduplicateSKBuff-20     2.673Mi ± 1%   1.725Mi ± 1%  -35.46% (p=0.000 n=30)
DeduplicateVMLinux-20    30.98Mi ± 6%   22.27Mi ± 3%  -28.12% (p=0.000 n=30)
geomean                  9.100Mi        6.198Mi       -31.89%

                      │ cache-all.txt │        cache-composites.txt         │
                      │   allocs/op   │  allocs/op   vs base                │
DeduplicateSKBuff-20       311.0 ± 0%    260.0 ± 1%  -16.40% (p=0.000 n=30)
DeduplicateVMLinux-20     8.314k ± 6%   8.097k ± 3%        ~ (p=0.059 n=30)
geomean                   1.608k        1.451k        -9.77%

return dm.deduplicateSingle(t)
}

func (dm *deduper) deduplicateSingle(t Type) Type {
Collaborator

Nit: I think you can merge Deduplicate and deduplicateSingle?

return true
}

switch a := aTyp.(type) {
Collaborator

Similar feelings about this switch as the one above: it's very easy to forget to implement things. Not sure how to fix this though. The core code looks veeery similar; I'll look at that and maybe I can find a common abstraction.


// Detect cycles by tracking visited types. Assume types are equal if we have already
// visited this type in the current equality check.
if slices.Contains(visited, aTyp) {
Collaborator

This is going to get quite expensive pretty fast, no?

Member Author

It really depends. It turns out CPUs are quite fast at checking this when it's in contiguous memory, thanks to the predictable access pattern, cache lines, etc. So for small-ish slices it's faster than a map, both for lookups and for writes. Doing this only for pointers and not all types really helps. I can also double check and re-measure.

return a.Value == b.Value &&
typesEqual(a.Type, b.Type, visited, cache)
default:
panic(fmt.Sprintf("unsupported type for equality: %T", a))
Collaborator

Not sure how I feel about this panic as well.
