sstable/tieredmeta: add tiering histogram using t-digest #5650
Conversation
Force-pushed from 1a28649 to c2d5a00
Force-pushed from 5108683 to 9fce22e
I'll write the block in a separate PR
RaduBerinde left a comment
Nice!!
Reviewable status: 0 of 8 files reviewed, 14 unresolved discussions (waiting on @annrpom, @jbowens, and @sumeerbhola)
go.sum line 14 at r1 (raw file):
github.com/RaduBerinde/btreemap v0.0.0-20250419174037-3d62b7205d54 h1:bsU8Tzxr/PNz75ayvCnxKZWEYdLMPDkUgticP4a4Bvk=
github.com/RaduBerinde/btreemap v0.0.0-20250419174037-3d62b7205d54/go.mod h1:0tr7FllbE9gJkHq7CVeeDDFAFKQVy5RnCSSNBOvdqbc=
github.com/RaduBerinde/tdigest v0.0.0-20251022152254-90e030c3a314 h1:RcSNrxDZ1ZyEfpGwrpWqd3YWfMjQbKOtT+p3Wht6XN0=
We should apply the same due diligence as any external library since it's a forked library and my changes to it weren't reviewed. Maybe @sumeerbhola or @jbowens can take a brief look over it?
sstable/tieredmeta/histogram.go line 20 at r1 (raw file):
// 2*compression centroids, with ~1.3*compression in practice.
//
// A value of 100 is a common default for t-digest implementations.
, which should provide at least ±1% quantile accuracy (and much better at the tails).
sstable/tieredmeta/histogram.go line 21 at r1 (raw file):
//
// A value of 100 is a common default for t-digest implementations.
const CompressionLevel = 100
[nit] digestDelta
sstable/tieredmeta/histogram.go line 24 at r1 (raw file):
// StatsHistogram is a quantile sketch using t-digest to track the distribution
// of bytes by TieringAttribute. It can efficiently answer questions like:
[nit] for bytes we could say "a bytes value (with a caller-determined meaning)"
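For orientation, the type under discussion looks roughly like this, pieced together from the snippets quoted in this review. This is a sketch only: the PR uses the github.com/RaduBerinde/tdigest fork, whose exact API isn't shown in the thread, so github.com/influxdata/tdigest stands in for it, and the Record method name is an assumption.

```go
package tieredmeta

import "github.com/influxdata/tdigest"

// TieringAttribute mirrors the type in internal/base.
type TieringAttribute uint64

// StatsHistogram is a quantile sketch over bytes, keyed by attribute.
type StatsHistogram struct {
	// TotalBytes is the sum of all bytes recorded (including BytesNoAttr).
	TotalBytes uint64
	// BytesNoAttr tracks bytes with attribute=0 separately; a zero
	// attribute typically means an unset value that shouldn't be tiered.
	BytesNoAttr uint64
	// digest holds the distribution of bytes across non-zero attributes.
	digest *tdigest.TDigest
}

// Record adds bytes at the given attribute. Samples are weighted by byte
// count, so quantiles are quantiles of bytes, not of KV counts.
func (s *StatsHistogram) Record(attr TieringAttribute, bytes uint64) {
	s.TotalBytes += bytes
	if attr == 0 {
		s.BytesNoAttr += bytes
		return
	}
	if s.digest == nil {
		// 100 is the common t-digest compression default mentioned above.
		s.digest = tdigest.NewWithCompression(100)
	}
	s.digest.Add(float64(attr), float64(bytes))
}

// Quantile returns the attribute value at quantile ~q (q in [0, 1]) of
// the bytes that carry a non-zero attribute.
func (s *StatsHistogram) Quantile(q float64) TieringAttribute {
	if s.digest == nil {
		return 0
	}
	return TieringAttribute(s.digest.Quantile(q))
}
```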
sstable/tieredmeta/histogram.go line 38 at r1 (raw file):
// ZeroBytes tracks bytes with attribute=0 separately, as this typically
// indicates unset/special values that shouldn't be tiered.
ZeroBytes uint64
[nit] BytesNoAttr
sstable/tieredmeta/histogram.go line 54 at r1 (raw file):
}

// Quantile returns the attribute value at quantile q (where q is in [0, 1]).
[nit] at quantile ~q
sstable/tieredmeta/histogram.go line 55 at r1 (raw file):
// Quantile returns the attribute value at quantile q (where q is in [0, 1]).
// Example: Quantile(0.5) returns the median attribute value by bytes.
[nit] returns a value close to the median value
sstable/tieredmeta/histogram.go line 64 at r1 (raw file):
}

// NonZeroBytes returns the total bytes with non-zero attributes.
func (s *StatsHistogram) NonZeroBytes() uint64 {
BytesWithAttr
sstable/tieredmeta/histogram.go line 68 at r1 (raw file):
}

// BytesBelowThreshold estimates how many non-zero bytes have attribute <= threshold.
how many bytes (excluding NoAttrBytes)
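A minimal sketch of how BytesBelowThreshold could be answered from the digest, building on the hypothetical StatsHistogram above. CDF here is the influxdata-style method; the fork's API may differ.

```go
// BytesWithAttr returns the total bytes with a non-zero attribute.
func (s *StatsHistogram) BytesWithAttr() uint64 {
	return s.TotalBytes - s.BytesNoAttr
}

// BytesBelowThreshold estimates how many bytes (excluding BytesNoAttr)
// have attribute <= t, by scaling the CDF by the attributed byte total.
func (s *StatsHistogram) BytesBelowThreshold(t TieringAttribute) uint64 {
	if s.digest == nil {
		return 0
	}
	frac := s.digest.CDF(float64(t)) // fraction of bytes with attr <= t
	return uint64(frac * float64(s.BytesWithAttr()))
}
```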
sstable/tieredmeta/histogram_block.go line 49 at r1 (raw file):
// Below is the block structure:
//
// SSTable's TieringHistogramBlock:
IMO in sstables we should be writing only one tiering histogram, the one that matches the TieringID in the current policy. We can consider any KVs with other TierIDs as having "no attribute" when we populate this histogram. I don't think we would realistically see different TierIDs in the same keyspace anyway. WDYT @sumeerbhola?
sstable/tieredmeta/histogram_block.go line 58 at r1 (raw file):
// └── ...
//
// Blob file's TieringHistogramBlock:
@sumeerbhola remind me, when do we expect to use a blob's histogram block? I was thinking everything would be done from the sstable's perspective.
sstable/tieredmeta/histogram_block.go line 92 at r1 (raw file):
SSTableKeyBytes KindAndTier = iota

// SSTableValueBytes is the histogram for values stored inside the
// sstable. The tier is implicit based on whether the sstable is in hot or
[nit] I'd remove the last sentence since all sstables are in hot storage currently.
sstable/tieredmeta/histogram_block.go line 94 at r1 (raw file):
// sstable. The tier is implicit based on whether the sstable is in hot or
// cold storage.
SSTableValueBytes
SSTableInPlaceValueBytes
sstable/tieredmeta/histogram_block.go line 97 at r1 (raw file):
// SSTableBlobReferenceHotBytes is the histogram for blob references from the
// sstable to hot tier blob files. The size is the size of the value.
SSTableBlobReferenceHotBytes
I don't think we need SSTableBlobReferenceHotBytes and SSTableBlobReferenceColdBytes. We should just have SSTableBlobValueBytes which tracks the blob value length.
Separately (in the sstable properties and/or TableMetadata), we will store the cutoff that was used, so we'll be able to use BytesBelowThreshold and BytesAboveThreshold to figure out cold vs hot.
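Under that scheme, the hot/cold split falls out of a single histogram plus the stored cutoff. A sketch using the hypothetical methods above:

```go
// coldHotSplit recovers cold vs hot blob value bytes from one
// SSTableBlobValueBytes histogram and the per-file cutoff (which this
// comment proposes storing in the sstable properties or TableMetadata).
func coldHotSplit(h *StatsHistogram, cutoff TieringAttribute) (cold, hot uint64) {
	cold = h.BytesBelowThreshold(cutoff) // attr <= cutoff => cold
	hot = h.BytesWithAttr() - cold
	return cold, hot
}
```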
RaduBerinde left a comment
Reviewable status: 0 of 8 files reviewed, 15 unresolved discussions (waiting on @annrpom, @jbowens, and @sumeerbhola)
internal/base/internal.go line 643 at r1 (raw file):
type TieringAttribute uint64

// TieringSpanID TODO comment
TieringSpanID is an identifier that provides a context for interpreting a TieringAttribute value. At any point in time, the span policies determine a set of non-overlapping key regions with distinct TieringSpanIDs.
RaduBerinde left a comment
Reviewable status: 0 of 8 files reviewed, 16 unresolved discussions (waiting on @annrpom, @jbowens, and @sumeerbhola)
sstable/tieredmeta/histogram_block.go line 90 at r1 (raw file):
// SSTableKeyBytes is the histogram for sstable keys. The tier is implicit
// based on whether the sstable is in hot or cold storage.
SSTableKeyBytes KindAndTier = iota
It's unclear to me how we will use this. What we want is to estimate how much physical sstable data is there because of cold KVs. But the key size is just part of that, plus the keys compress against each other.
Maybe it would be easier/cleaner to just track the KV count? And we'd estimate the above as roughly <sstable physical size> * <KV count with attribute below threshold> / <total KV count>
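The count-based estimate proposed here is cheap to compute. A sketch with illustrative names:

```go
// estimateColdPhysicalBytes attributes an sstable's physical size to
// cold data in proportion to the KV count below the tiering threshold.
func estimateColdPhysicalBytes(physicalSize, kvsBelowThreshold, totalKVs uint64) uint64 {
	if totalKVs == 0 {
		return 0
	}
	return uint64(float64(physicalSize) * float64(kvsBelowThreshold) / float64(totalKVs))
}
```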
Force-pushed from 9fce22e to 1cd44c2
annrpom left a comment
Reviewable status: 0 of 8 files reviewed, 4 unresolved discussions (waiting on @jbowens, @RaduBerinde, and @sumeerbhola)
internal/base/internal.go line 643 at r1 (raw file):
Previously, RaduBerinde wrote…
TieringSpanID is an identifier that provides a context for interpreting a TieringAttribute value. At any point in time, the span policies determine a set of non-overlapping key regions with distinct TieringSpanIDs.
eek i forgot to update; ty!
sstable/tieredmeta/histogram.go line 20 at r1 (raw file):
Previously, RaduBerinde wrote…
, which should provide at least ±1% quantile accuracy (and much better at the tails).
done
sstable/tieredmeta/histogram.go line 21 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] digestDelta
done
sstable/tieredmeta/histogram.go line 24 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] for bytes we could say "a bytes value (with a caller-determined meaning)"
done
sstable/tieredmeta/histogram.go line 38 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] BytesNoAttr
done
sstable/tieredmeta/histogram.go line 54 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] at quantile ~q
done
sstable/tieredmeta/histogram.go line 55 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] returns a value close to the median value
done
sstable/tieredmeta/histogram.go line 64 at r1 (raw file):
Previously, RaduBerinde wrote…
BytesWithAttr
done
sstable/tieredmeta/histogram.go line 68 at r1 (raw file):
Previously, RaduBerinde wrote…
how many bytes (excluding NoAttrBytes)
done
sstable/tieredmeta/histogram_block.go line 58 at r1 (raw file):
Previously, RaduBerinde wrote…
@sumeerbhola remind me, when do we expect to use a blob's histogram block? I was thinking everything would be done from the sstable's perspective.
looking at the prototype, the blob file's histogram block isn't currently used. the tiered storage design doc mentions writing it, but i'll leave this block out for now and can add it back if necessary
from offline convo: the primary use case for a blob histogram block would be observability
sstable/tieredmeta/histogram_block.go line 90 at r1 (raw file):
Previously, RaduBerinde wrote…
It's unclear to me how we will use this. What we want is to estimate how much physical sstable data is there because of cold KVs. But the key size is just part of that, plus the keys compress against each other.
Maybe it would be easier/cleaner to just track the KV count? And we'd estimate the above as roughly
<sstable physical size> * <KV count with attribute below threshold> / <total KV count>
would we ever be way off if we were to track just the KV count? in practice, would we have some large variance of value bytes across an sstable that would cause us to severely overestimate or underestimate? maybe like: if we wrote some data, some time passes so that data turns cold, and then we write mvcc tombstones that have no value - won't we then underestimate cold data, since count-based tracking weights tombstones (no value, hot) equally with real KVs (large values, cold)?
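To put hypothetical numbers on that scenario: an sstable holding 1M cold KVs of ~1 KiB each plus 1M fresh MVCC tombstones with ~0 value bytes is almost entirely cold by bytes, yet a count-based estimate would call only 50% of its physical data cold.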
sstable/tieredmeta/histogram_block.go line 92 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] I'd remove the last sentence since all sstables are in hot storage currently.
done
sstable/tieredmeta/histogram_block.go line 94 at r1 (raw file):
Previously, RaduBerinde wrote…
SSTableInPlaceValueBytes
done
sstable/tieredmeta/histogram_block.go line 97 at r1 (raw file):
Previously, RaduBerinde wrote…
I don't think we need SSTableBlobReferenceHotBytes and SSTableBlobReferenceColdBytes. We should just have SSTableBlobValueBytes which tracks the blob value length.

Separately (in the sstable properties and/or TableMetadata), we will store the cutoff that was used, so we'll be able to use BytesBelowThreshold and BytesAboveThreshold to figure out cold vs hot.
ah yeah done
sumeerbhola left a comment
@sumeerbhola reviewed 8 files and all commit messages, and made 5 comments.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @annrpom, @jbowens, and @RaduBerinde).
sstable/tieredmeta/histogram_block.go line 90 at r1 (raw file):
Previously, annrpom (annie pompa) wrote…
would we ever be way off if we were to track just the KV count? in practice, would we have some large variance of value bytes across an sstable that would cause us to severely overestimate or underestimate? maybe like: if we wrote some data, some time passes so that data turns cold, and then we write mvcc tombstones that have no value - won't we then underestimate cold data, since count-based tracking weights tombstones (no value, hot) equally with real KVs (large values, cold)?
I'm also confused because the sstable physical size includes inline values.
Say we tracked the original non-prefix-compressed bytes in the histogram. And we also computed the sum of the key columns in all the data blocks in the sstable (say B). Then the estimate for cold key bytes with a cold threshold T could be (BytesBelowThreshold(T)/TotalBytes)*B.
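In code, that estimate might look like the following sketch; B and the method names come from the discussion above, not from the PR.

```go
// estimateColdKeyBytes scales the summed size b of the encoded key
// columns across the sstable's data blocks by the histogram's cold
// fraction, per the formula (BytesBelowThreshold(T)/TotalBytes)*B.
func estimateColdKeyBytes(h *StatsHistogram, t TieringAttribute, b uint64) uint64 {
	if h.TotalBytes == 0 {
		return 0
	}
	frac := float64(h.BytesBelowThreshold(t)) / float64(h.TotalBytes)
	return uint64(frac * float64(b))
}
```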
sstable/tieredmeta/histogram_block.go line 72 at r2 (raw file):
// SSTableBlobValueBytes is the histogram for values stored in blob files
// referenced by the sstable.
SSTableBlobValueBytes
Now that some files that have hot references will also have cold references, I was trying to think through what we need here and for what purpose. I was imagining the following categories:
- Value that has both cold reference and hot (in place in sstable, or reference): Useful for observability, for figuring out how much duplication we have. Not used for deciding whether to write cold blob files in a compaction.
- Value that has only cold reference: Again, only for observability
- Value that is only hot: Needed for deciding whether to write cold references before starting the compaction. The prototype further distinguished the ones that were a reference versus in place since the former definitely met the value size threshold to be a reference, while the latter could be very small. So for the latter, there was an estimate used in the condition
k.KindAndTier == base.SSTableValueBytes && h.MeanSize() < uint64(policy.MinimumSize)
The list in this PR is different than the above.
sstable/tieredmeta/histogram.go line 34 at r2 (raw file):
// multiple files without bucket alignment issues.
type StatsHistogram struct {
	// TotalBytes is the sum of all bytes recorded (including ZeroBytes).
should this say BytesNoAttr instead of ZeroBytes?
sstable/tieredmeta/histogram.go line 56 at r2 (raw file):
}

// Quantile returns the attribute value at quantile ~q (where q is in [0, 1]).
worth saying the quantile is of the non-zero bytes.
sstable/tieredmeta/histogram.go line 146 at r2 (raw file):
	digestSize, len(buf))
}
is it ok to have digestSize < len(buf) or should that also be an error?
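For concreteness, the stricter check suggested here could look like this hypothetical helper; the PR's actual decode path isn't shown in the thread.

```go
import "fmt"

// checkDigestSize treats any mismatch as corruption: trailing bytes
// after the encoded digest are an error too, not just a short buffer.
func checkDigestSize(digestSize int, buf []byte) error {
	if digestSize != len(buf) {
		return fmt.Errorf("tiering histogram: digest size %d, remaining buffer %d bytes",
			digestSize, len(buf))
	}
	return nil
}
```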
RaduBerinde left a comment
@RaduBerinde made 2 comments.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @annrpom, @jbowens, and @sumeerbhola).
sstable/tieredmeta/histogram_block.go line 90 at r1 (raw file):
I forgot about the in-place values. Indeed we would need a size that excludes that.
And we also computed the sum of the key columns in all the data blocks in the sstable (say B)
This would be the size of the encoded column as it appears in the block? That does make sense. Though I'm not sure if scaling according to the sum of key sizes is necessarily better than just by key count.
With the key count, we could also estimate the amount of other per-key data in the block (the tiering attribute, the blob reference).
I am suspecting that in many cases, the extra key storage itself might only be a few bytes per key (e.g. last few bytes of an ID), much less than the other per-key data in the sstable.
sstable/tieredmeta/histogram_block.go line 72 at r2 (raw file):
Previously, sumeerbhola wrote…
Now that some files will have hot references also have cold references, I was trying to think through what we need here and for what purpose. I was imagining the following categories:
- Value that has both cold reference and hot (in place in sstable, or reference): Useful for observability, for figuring out how much duplication we have. Not used for deciding whether to write cold blob files in a compaction.
- Value that has only cold reference: Again, only for observability
- Value that is only hot: Needed for deciding whether to write cold references before starting the compaction. The prototype further distinguished the ones that were a reference versus in place since the former definitely met the value size threshold to be a reference, while the latter could be very small. So for the latter, there was an estimate used in the condition
k.KindAndTier == base.SSTableValueBytes && h.MeanSize() < uint64(policy.MinimumSize)

The list in this PR is different than the above.
I think we should choose either a design where (for the levels where we actually do tiering) all hot values are both in hot and cold, or one where all hot values are only in hot. Mixing them will lead to a lot of complications.
With that in mind, these can all be deduced from BlobValueBytes and the threshold used for this file.
IMO upper level files that aren't tiered wouldn't have this data at all. The cost-benefit is not there - these are normally ~1% of total size so statistics are not very valuable, but they get written much more frequently. At the same time, a single upper-level file can have keys from many different tables and we'd have to look up each table's span config in every compaction.
sumeerbhola left a comment
@sumeerbhola made 1 comment.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @annrpom, @jbowens, and @RaduBerinde).
sstable/tieredmeta/histogram_block.go line 72 at r2 (raw file):
Previously, RaduBerinde wrote…
I think we should choose either a design where (for the levels where we actually do tiering) all hot values are both in hot and cold, or one where all hot values are only in hot. Mixing them will lead to a lot of complications.
Say we were only tiering in L6, which is the latest proposal. And we are compacting a 64MiB L5 file with some number of L6 files that sum to 640MiB. Say this compaction is not one that rewrites existing references. And some of those input L6 files were moved down from higher levels so don't have cold references.
Are we going to force writing of cold blob files? If yes, does this actually help with any invariant on L6 files given move compacted L6 files may only have hot references? Also, what if the L5 file happened to have only tiny values that don't get moved to a blob file so the cold blob file is also tiny -- if we don't write cold blob files yet, we could wait till we can write a bulkier cold blob file.
In my mind at least, the idea to have both cold and hot blob references for the same data was something to improve the average behavior of the system by bulking up the cold blob files and not an invariant that we needed to rely on. We already had fine grained KindAndTier in the prototype, and I don't think it complicated anything.
IMO upper level files that aren't tiered wouldn't have this data at all. The cost-benefit is not there - these are normally ~1% of total size so statistics are not very valuable, but they get written much more frequently. At the same time, a single upper-level file can have keys from many different tables and we'd have to look up each table's span config in every compaction.
I really don't have an opinion on this.
Just one thing to note: upper level files were not even split based on TieringSpanID (let alone shard/range boundaries) in the prototype, since that can lead to smaller files than necessary. So one could not estimate how much data would have cold references written purely based on L5 sstable stats, since there could be values in there with different tiering-span-ids (and some with no tiering policy). Just maintaining the same behavior across all files led to less special-casing of logic (which has its own simplicity). The sizes of these histograms don't really matter except for compactions, and I suspect we are a long way away from having 1000s of different tiered SQL tables in a cluster.
Anyway, I am fine with whatever y'all decide.
RaduBerinde left a comment
@RaduBerinde made 1 comment.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @annrpom, @jbowens, and @sumeerbhola).
sstable/tieredmeta/histogram_block.go line 72 at r2 (raw file):
Previously, sumeerbhola wrote…
I think we should choose either a design where (for the levels where we actually do tiering) all hot values are both in hot and cold, or one where all hot values are only in hot. Mixing them will lead to a lot of complications.

Say we were only tiering in L6, which is the latest proposal. And we are compacting a 64MiB L5 file with some number of L6 files that sum to 640MiB. Say this compaction is not one that rewrites existing references. And some of those input L6 files were moved down from higher levels so don't have cold references. Are we going to force writing of cold blob files? If yes, does this actually help with any invariant on L6 files given move compacted L6 files may only have hot references? Also, what if the L5 file happened to have only tiny values that don't get moved to a blob file so the cold blob file is also tiny -- if we don't write cold blob files yet, we could wait till we can write a bulkier cold blob file.

In my mind at least, the idea to have both cold and hot blob references for the same data was something to improve the average behavior of the system by bulking up the cold blob files and not an invariant that we needed to rely on. We already had fine grained KindAndTier in the prototype, and I don't think it complicated anything.

IMO upper level files that aren't tiered wouldn't have this data at all. The cost-benefit is not there - these are normally ~1% of total size so statistics are not very valuable, but they get written much more frequently. At the same time, a single upper-level file can have keys from many different tables and we'd have to look up each table's span config in every compaction.

I really don't have an opinion on this.

Just one thing to note: upper level files were not even split based on TieringSpanID (let alone shard/range boundaries) in the prototype, since that can lead to smaller files than necessary. So one could not estimate how much data would have cold references written purely based on L5 sstable stats, since there could be values in there with different tiering-span-ids (and some with no tiering policy). Just maintaining the same behavior across all files led to less special-casing of logic (which has its own simplicity). The sizes of these histograms don't really matter except for compactions, and I suspect we are a long way away from having 1000s of different tiered SQL tables in a cluster.

Anyway, I am fine with whatever y'all decide.
Thanks, good points. And in any case, it's better to make fewer assumptions here so we have more freedom to play with the higher-level policies. I retract my comment; we should keep both SSTableBlobReferenceHotBytes and SSTableBlobReferenceColdBytes, which count the total value size for hot and for cold references (sorry @annrpom :) )
We'll leave the other question of whether upper-level files have tiering information open for now.
RaduBerinde left a comment
@RaduBerinde made 7 comments.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @annrpom, @jbowens, and @sumeerbhola).
sstable/tieredmeta/histogram.go line 24 at r1 (raw file):
Previously, annrpom (annie pompa) wrote…
done
[nit] value
sstable/tieredmeta/histogram.go line 43 at r2 (raw file):
// digest is the t-digest for non-zero attributes (read-only).
// May be nil if no non-zero samples have been recorded.
digest *tdigest.TDigest
Are there any situations where we would want this to be nil? If not, we should make it a non-pointer
sstable/tieredmeta/histogram_block.go line 76 at r2 (raw file):
)

// TieringHistogramBlockWriter is instantiated for each sstable or blob file
[nit] let's just say is instantiated for each output file that requires tiering histograms.
sstable/tieredmeta/histogram_block.go line 111 at r2 (raw file):
}

func NewTieringHistogramBlockWriter() *TieringHistogramBlockWriter {
[nit] MakeTieringHistogramBlockWriter() TieringHistogramBlockWriter (too many pointers)
sstable/tieredmeta/histogram_block.go line 130 at r2 (raw file):
// Flush returns the encoded block contents.
func (w *TieringHistogramBlockWriter) Flush() []byte {
[nit] Finish(). Should mention if the writer can be reused after that.
sstable/tieredmeta/histogram_block.go line 150 at r2 (raw file):
// TieringHistogramBlockContents is the in-memory contents of the tiering
// histogram block. We may temporarily load it into TableMetadata for
[nit] the comment is likely to become stale, not sure if we'll ever load it into TableMetadata. I'd remove the second sentence. We could also just return map[Key]StatsHistogram since it's pretty self-explanatory.
Force-pushed from 1cd44c2 to 3cfbbe3
annrpom left a comment
@annrpom made 6 comments.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @jbowens, @RaduBerinde, and @sumeerbhola).
sstable/tieredmeta/histogram.go line 24 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] value
done
sstable/tieredmeta/histogram.go line 34 at r2 (raw file):
Previously, sumeerbhola wrote…
should this say BytesNoAttr instead of ZeroBytes?
yes - ty for the catch; done
sstable/tieredmeta/histogram.go line 56 at r2 (raw file):
Previously, sumeerbhola wrote…
worth saying the quantile is of the non-zero bytes.
done
sstable/tieredmeta/histogram.go line 146 at r2 (raw file):
Previously, sumeerbhola wrote…
is it ok to have digestSize < len(buf) or should that also be an error?
that should be an error; done
sstable/tieredmeta/histogram_block.go line 90 at r1 (raw file):
Previously, RaduBerinde wrote…
I forgot about the in-place values. Indeed we would need a size that excludes that.
And we also computed the sum of the key columns in all the data blocks in the sstable (say B)
This would be the size of the encoded column as it appears in the block? That does make sense. Though I'm not sure if scaling according to the sum of key sizes is necessarily better than just by key count.
With the key count, we could also estimate the amount of other per-key data in the block (the tiering attribute, the blob reference).
I am suspecting that in many cases, the extra key storage itself might only be a few bytes per key (e.g. last few bytes of an ID), much less than the other per-key data in the sstable.
let me make a separate PR for key count
sstable/tieredmeta/histogram_block.go line 72 at r2 (raw file):
Previously, RaduBerinde wrote…
Thanks, good points. And in any case, it's better to make less assumptions here so we have more freedom to play with the higher-level policies. I retract my comment and should keep both
SSTableBlobReferenceHotBytesandSSTableBlobReferenceColdByteswhich counts the total value size of hot and for cold references (sorry @annrpom :) )We'll leave the other question of whether upper-level files have tiering information open for now.
np! done
Force-pushed from 3cfbbe3 to e04d365
sumeerbhola left a comment
@sumeerbhola reviewed 3 files and made 2 comments.
Reviewable status: all files reviewed (commit messages unreviewed), 13 unresolved discussions (waiting on @annrpom, @jbowens, and @RaduBerinde).
sstable/tieredmeta/histogram_block.go line 72 at r2 (raw file):
Value that has both cold reference and hot (in place in sstable, or reference): Useful for observability
We don't have this category. And there is a code comment now that says:
// SSTableKeyBytes + SSTableInPlaceValueBytes + SSTableBlobReferenceHotBytes +
// SSTableBlobReferenceColdBytes
Won't this double count without having that as a category?
RaduBerinde left a comment
@RaduBerinde made 1 comment.
Reviewable status: all files reviewed (commit messages unreviewed), 13 unresolved discussions (waiting on @annrpom, @jbowens, and @sumeerbhola).
sstable/tieredmeta/histogram_block.go line 72 at r2 (raw file):
Previously, sumeerbhola wrote…
Value that has both cold reference and hot (in place in sstable, or reference): Useful for observability
We don't have this category. And there is a code comment now that says:
// SSTableKeyBytes + SSTableInPlaceValueBytes + SSTableBlobReferenceHotBytes +
// SSTableBlobReferenceColdBytes

Won't this double count without having that as a category?
Hm, yes. I'm worried about the overhead of updating too many histograms with every KV.
- We could remove SSTableKeyBytes and replace it with just a per-sstable percentage of data bytes that are due to keys that have a tiering attribute below the threshold at the time of compaction? I don't see what the full histogram is buying us.

- Do we need to distinguish in-place values vs hot blob references? If not, we could have SSTableHotOnlyValueBytes, SSTableColdOnlyValueBytes and SSTableHotAndColdValueBytes, where each KV updates only one of the three.
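A sketch of what that three-way split might look like, so each KV bumps exactly one histogram. This is the alternative floated above, not what the PR implements; the constant names come from the comment and KindAndTier stands in for the PR's type.

```go
// KindAndTier stands in for the PR's enum type.
type KindAndTier uint8

const (
	// Hypothetical three-way split: each KV's value bytes land in exactly
	// one category, so the categories sum without double counting.
	SSTableHotOnlyValueBytes KindAndTier = iota
	SSTableColdOnlyValueBytes
	SSTableHotAndColdValueBytes
)

// valueKind picks the single histogram a KV's value bytes belong to.
func valueKind(hasHot, hasCold bool) (KindAndTier, bool) {
	switch {
	case hasHot && hasCold:
		return SSTableHotAndColdValueBytes, true
	case hasCold:
		return SSTableColdOnlyValueBytes, true
	case hasHot:
		return SSTableHotOnlyValueBytes, true
	}
	return 0, false // no value bytes to record
}
```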
Add StatsHistogram, a quantile sketch backed by t-digest, to track byte distribution by TieringAttribute. The histogram supports CDF and quantile queries to estimate bytes above/below thresholds for tiering decisions.
Add TieringHistogramBlockWriter to encode multiple histograms per sstable or blob file, keyed by (KindAndTier, SpanID). The block will support histogram types for sstable keys, inline values, and blob references (hot/cold).
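A rough usage shape for the writer, assembled from the review thread: the (KindAndTier, SpanID) key comes from the commit message, while Record, the map-based layout, and SpanID's definition are assumptions, reusing the StatsHistogram sketch from earlier.

```go
// SpanID stands in for the tiering span identifier discussed above.
type SpanID uint64

type histKey struct {
	kind   KindAndTier
	spanID SpanID
}

// TieringHistogramBlockWriter accumulates one histogram per
// (KindAndTier, SpanID) pair for a single output file.
type TieringHistogramBlockWriter struct {
	hists map[histKey]*StatsHistogram
}

func MakeTieringHistogramBlockWriter() TieringHistogramBlockWriter {
	return TieringHistogramBlockWriter{hists: map[histKey]*StatsHistogram{}}
}

// Record adds bytes at attr to the histogram for (kind, spanID),
// creating the histogram on first use.
func (w *TieringHistogramBlockWriter) Record(
	kind KindAndTier, spanID SpanID, attr TieringAttribute, bytes uint64,
) {
	k := histKey{kind: kind, spanID: spanID}
	h := w.hists[k]
	if h == nil {
		h = &StatsHistogram{}
		w.hists[k] = h
	}
	h.Record(attr, bytes)
}
```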