
Tishj (Contributor) commented Feb 7, 2025

This PR fixes #16094

Previously, CreateFilterMap used global_columns. That list contains the columns the reader is aware of (in this case, the Parquet reader), and it is influenced by the schema parameter.

global_column_ids, on the other hand, comes from the TableFunctionInitInput and also contains artificial/generated columns such as "filename".
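To illustrate why the distinction matters, here is a minimal, hypothetical C++ model. It is not DuckDB's actual CreateFilterMap: the names FilterMapEntry, FillFilterMap, BuildFilterMapBuggy and BuildFilterMapFixed are invented for this sketch, generated columns are assumed to have ids at or past the end of global_columns, and std::out_of_range stands in for the InternalError. It only shows how a map sized by the reader's columns has no slot for a filter on a generated column like "filename".

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified model of a filter map; NOT DuckDB's real code.
// global_columns:    columns the file reader (e.g. Parquet) knows about,
//                    shaped by the `schema` parameter.
// global_column_ids: projected column ids; in this sketch, ids at or past
//                    global_columns.size() stand for generated columns
//                    such as "filename" (an assumption).
// filter_indexes:    positions in global_column_ids that carry a filter.

struct FilterMapEntry {
    bool is_generated = false;  // true for virtual columns like "filename"
    std::size_t column_id = 0;  // the id taken from global_column_ids
};

using FilterMap = std::vector<std::optional<FilterMapEntry>>;

// Fills one slot per filtered projection position.
static void FillFilterMap(FilterMap &map,
                          const std::vector<std::string> &global_columns,
                          const std::vector<std::size_t> &global_column_ids,
                          const std::vector<std::size_t> &filter_indexes) {
    for (std::size_t idx : filter_indexes) {
        const std::size_t column_id = global_column_ids.at(idx);
        // .at() throws std::out_of_range if the map has no slot for idx.
        map.at(idx) = FilterMapEntry{column_id >= global_columns.size(), column_id};
    }
}

// Buggy variant: sized by the reader's columns only, so a filter on a
// generated column (e.g. "filename") can fall past the end of the map.
FilterMap BuildFilterMapBuggy(const std::vector<std::string> &global_columns,
                              const std::vector<std::size_t> &global_column_ids,
                              const std::vector<std::size_t> &filter_indexes) {
    FilterMap map(global_columns.size());
    FillFilterMap(map, global_columns, global_column_ids, filter_indexes);
    return map;
}

// Fixed variant: sized by global_column_ids, which includes generated columns.
FilterMap BuildFilterMapFixed(const std::vector<std::string> &global_columns,
                              const std::vector<std::size_t> &global_column_ids,
                              const std::vector<std::size_t> &filter_indexes) {
    FilterMap map(global_column_ids.size());
    FillFilterMap(map, global_columns, global_column_ids, filter_indexes);
    return map;
}
```

For example, with global_columns = {"id", "name"}, global_column_ids = {0, 1, 2} (id 2 standing in for "filename") and a filter on position 2, the buggy variant throws std::out_of_range, while the fixed variant returns a three-slot map whose last entry is marked as generated.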

Tishj requested a review from samansmink February 7, 2025 11:32
Mytherin changed the base branch from main to v1.2-histrionicus February 7, 2025 11:38
Mytherin (Collaborator) commented Feb 7, 2025

Thanks! Can you rebase this to v1.2-histrionicus so we can push it out for v1.2.1?

Mytherin changed the base branch from v1.2-histrionicus to main February 7, 2025 11:39
marcoslot (Contributor) commented:
Thanks, we found the same fix, though we still get the error when filtering a Delta table by filename:

git clone https://github.com/delta-io/delta-examples.git
select * from delta_scan('delta-examples/data/people_countries_delta_dask/', filename = True) where filename = 'delta-examples/data/people_countries_delta_dask/country=Argentina/part-00000-8d0390a3-f797-4265-b9c2-da1c941680a3.c000.snappy.parquet';

That might be a separate issue, though.

samansmink (Collaborator) commented:
@marcoslot I can confirm that this issue still persists after this fix. I will take a look.

samansmink (Collaborator) commented:
@marcoslot's issue is also present in DuckDB v1.1.3, meaning that it is a separate issue. I will open an issue in the duckdb_delta repo for it.

samansmink (Collaborator) left a review comment:

LGTM, apart from the branch to merge into.

Tishj changed the base branch from main to v1.2-histrionicus February 7, 2025 12:05
Tishj force-pushed the multi_file_reader_create_filter_map_fix branch from 23062e2 to 89daa84 February 7, 2025 12:08
duckdb-draftbot marked this pull request as draft February 7, 2025 12:09
Tishj marked this pull request as ready for review February 7, 2025 12:10
Mytherin merged commit f7637a9 into duckdb:v1.2-histrionicus Feb 7, 2025
50 checks passed
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Mar 7, 2025
[Dev] MultiFileReader fix InternalError in CreateFilterMap (duckdb/duckdb#16114)
Tishj deleted the multi_file_reader_create_filter_map_fix branch November 7, 2025 15:36

Development

Successfully merging this pull request may close these issues.

Internal error in 1.2.0 when combining schema, filename and filter in read_parquet

4 participants