SlateDB supports prefix bloom filters, and scan_prefix can use them to skip SSTs that do not contain the requested prefix. However, normal bounded range scans do not currently leverage prefix bloom filters, even when the range clearly targets a prefix-shaped keyspace.
For example, a scan like:
db.scan(b"user:123:"..b"user:124:")
may be semantically similar to a prefix scan over user:123:, but today it goes through the normal range-scan path and does not pass a prefix query to SST filter evaluation. This can make range scans over prefix-partitioned keys scale with the number of overlapping SSTs even when configured prefix bloom filters could rule many of them out.
This is related to the broader scan latency problem discussed in #1302 and the prefix filtering work in #1334, but the remaining gap is specifically that scan / scan_with_options cannot benefit from prefix bloom filters.
Possible directions:
- Add an explicit scan option that lets callers provide a prefix-filter hint for a range scan, with validation that the requested range is contained within that prefix range.
- Explore safe inference for ranges whose bounds unambiguously describe a single prefix interval, while preserving the current non-inference behavior wherever the range is ambiguous.
The goal would be to let users express prefix-shaped bounded scans without losing the SST pruning benefits of prefix bloom filters.
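To make the two directions above concrete, here is a minimal sketch of the checks they would need. It is not SlateDB code: `prefix_upper_bound`, `range_within_prefix`, and `infer_prefix` are hypothetical helper names, and the sketch assumes keys are compared as raw bytes in lexicographic order. It also ignores how the configured prefix extractor chooses prefix lengths, which a real implementation would have to take into account.

```rust
use std::ops::Bound;

/// Smallest byte string strictly greater than every key that starts with
/// `prefix`, or `None` if no such bound exists (empty or all-0xFF prefix).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut upper = prefix.to_vec();
    // Drop trailing 0xFF bytes, then increment the last remaining byte.
    while let Some(last) = upper.pop() {
        if last < 0xFF {
            upper.push(last + 1);
            return Some(upper);
        }
    }
    None
}

/// Conservative validation for an explicit prefix hint: returns true only
/// if every key in the range is guaranteed to start with `prefix`.
fn range_within_prefix(start: Bound<&[u8]>, end: Bound<&[u8]>, prefix: &[u8]) -> bool {
    let start_ok = match start {
        Bound::Included(s) | Bound::Excluded(s) => s >= prefix,
        Bound::Unbounded => prefix.is_empty(),
    };
    let end_ok = match (end, prefix_upper_bound(prefix)) {
        // No finite upper bound for the prefix: any end bound is fine.
        (_, None) => true,
        (Bound::Excluded(e), Some(u)) => e <= u.as_slice(),
        (Bound::Included(e), Some(u)) => e < u.as_slice(),
        (Bound::Unbounded, Some(_)) => false,
    };
    start_ok && end_ok
}

/// Inference for the unambiguous case only: a range `start..end` whose end
/// is exactly the upper bound of `start` treated as a prefix.
fn infer_prefix<'a>(start: Bound<&'a [u8]>, end: Bound<&[u8]>) -> Option<&'a [u8]> {
    match (start, end) {
        (Bound::Included(s), Bound::Excluded(e))
            if !s.is_empty() && prefix_upper_bound(s).as_deref() == Some(e) =>
        {
            Some(s)
        }
        _ => None,
    }
}
```

Under this sketch, the range b"user:123:"..b"user:123;" validates against the hint user:123: and is also inferable, because b"user:123;" is the exact upper bound of that prefix. The range b"user:123:"..b"user:124:" would be rejected and not inferred, since it also admits keys such as user:1240 that do not start with user:123:. That is one concrete instance of the ambiguity the second direction needs to guard against.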