Add DictionaryId that can be used to uniquely identify dictionaries, and use this in the aggregate HT to cache look-ups #15196

Merged: Mytherin merged 1 commit into duckdb:main from Mytherin:dictionaryid on Dec 9, 2024

Mytherin commented Dec 8, 2024

Follow-up from #15152

This PR enables caching of results computed for dictionaries across vectors by (optionally) assigning each dictionary a unique identifier. The identifier is a string that must be unique enough to tell whether a vector has the same dictionary as the previous vector or a different one. It does not have to be globally unique; it only needs to be unique within the same pipeline for the same thread.
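To illustrate the idea, here is a minimal, self-contained sketch of caching per-dictionary results across vectors. It is not the code from this PR: `DictionaryBatch`, `DictionaryResultCache`, and `HashRows` are hypothetical names, and an FNV-1a hash stands in for whatever per-entry result the aggregate HT actually caches. The only behavior taken from the description above is that cached results are reused when the incoming dictionary id matches the previous one, and that an unset id disables reuse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary vector: `dictionary` holds the
// distinct values, `selection` maps each row to an index into `dictionary`.
struct DictionaryBatch {
    std::string dictionary_id; // empty means "no id assigned" (the id is optional)
    std::vector<std::string> dictionary;
    std::vector<uint32_t> selection;
};

// Caches one result per dictionary entry (here: a hash) across batches.
class DictionaryResultCache {
public:
    // Returns one hash per row. Per-entry hashes are recomputed only when the
    // id is empty or differs from the id of the previous batch.
    std::vector<uint64_t> HashRows(const DictionaryBatch &batch) {
        const bool reusable = !batch.dictionary_id.empty() && batch.dictionary_id == cached_id_;
        if (!reusable) {
            cached_hashes_.clear();
            cached_hashes_.reserve(batch.dictionary.size());
            for (const auto &value : batch.dictionary) {
                cached_hashes_.push_back(Fnv1a(value));
            }
            cached_id_ = batch.dictionary_id;
            ++recompute_count_;
        }
        // Expand the per-entry results to per-row results via the selection.
        std::vector<uint64_t> result;
        result.reserve(batch.selection.size());
        for (uint32_t index : batch.selection) {
            assert(index < cached_hashes_.size());
            result.push_back(cached_hashes_[index]);
        }
        return result;
    }

    // Number of times the per-entry results had to be recomputed.
    std::size_t RecomputeCount() const { return recompute_count_; }

private:
    // 64-bit FNV-1a, used only as a placeholder for "expensive per-entry work".
    static uint64_t Fnv1a(const std::string &value) {
        uint64_t hash = 14695981039346656037ULL;
        for (unsigned char c : value) {
            hash ^= c;
            hash *= 1099511628211ULL;
        }
        return hash;
    }

    std::string cached_id_;
    std::vector<uint64_t> cached_hashes_;
    std::size_t recompute_count_ = 0;
};
```

Feeding two consecutive batches with the same id through `HashRows` leaves `RecomputeCount()` at 1; a batch with a different (or empty) id triggers a recompute. Because only the previous id is compared, the id needs to be unique only among the dictionaries one thread sees in one pipeline.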

  • For our own storage, the dictionary id is set to the ColumnSegment pointer address. This works because we never deallocate ColumnSegments while scanning.
  • For Parquet dictionaries, the dictionary id is set to [parquet_filename]_[column_name]_[byte_offset]
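The two id schemes above could be sketched roughly as follows. This is an illustration under assumptions rather than the code in this PR: the `ColumnSegment` struct is an empty stand-in for DuckDB's class, the helper names `SegmentDictionaryId` and `ParquetDictionaryId` are hypothetical, and rendering the pointer address as a decimal string is just one possible encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Empty stand-in for illustration only; not DuckDB's actual ColumnSegment.
struct ColumnSegment {};

// Derives an id from the segment's address. This is only safe while the
// segment is guaranteed to stay allocated, as is the case during a scan.
std::string SegmentDictionaryId(const ColumnSegment *segment) {
    assert(segment != nullptr);
    return std::to_string(reinterpret_cast<std::uintptr_t>(segment));
}

// Builds an id of the form [parquet_filename]_[column_name]_[byte_offset].
std::string ParquetDictionaryId(const std::string &filename, const std::string &column_name,
                                uint64_t byte_offset) {
    return filename + "_" + column_name + "_" + std::to_string(byte_offset);
}
```

For example, `ParquetDictionaryId("data.parquet", "city", 4096)` yields `data.parquet_city_4096`. Two live segments always have distinct addresses, so their ids differ for as long as neither is freed.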

…and use this in the aggregate HT to cache look-ups
@Mytherin Mytherin merged commit 2be6f64 into duckdb:main Dec 9, 2024
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 27, 2024
Add DictionaryId that can be used to uniquely identify dictionaries, and use this in the aggregate HT to cache look-ups (duckdb/duckdb#15196)
github-actions Bot pushed a commit to duckdb/duckdb-r that referenced this pull request Dec 28, 2024
Add DictionaryId that can be used to uniquely identify dictionaries, and use this in the aggregate HT to cache look-ups (duckdb/duckdb#15196)
github-actions Bot added a commit to duckdb/duckdb-r that referenced this pull request Dec 28, 2024
Add DictionaryId that can be used to uniquely identify dictionaries, and use this in the aggregate HT to cache look-ups (duckdb/duckdb#15196)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
@Mytherin Mytherin deleted the dictionaryid branch January 16, 2025 16:51