@carlopi
Contributor

@carlopi carlopi commented Sep 30, 2025

Currently, if the httpfs extension has not been loaded yet:

COPY (SELECT 42 AS answer) TO 's3://my_bucket/my_file.parquet';

fails like:

IO Error:
Cannot open file "s3://my_bucket/my_file.parquet": No such file or directory

Here s3://my_bucket/my_file.parquet is treated as a path on the local file system, and the write fails because the base directory s3://my_bucket, which is needed to actually save the file, cannot be found.

Whereas, with httpfs loaded first:

LOAD httpfs;
COPY (SELECT 42 AS answer) TO 's3://my_bucket/my_file.parquet';

will rightly fail like:

HTTP Error:
Unable to connect to URL https://my_bucket.s3.us-east-1.amazonaws.com/my_file.parquet?uploads=: Forbidden (HTTP code 403)

which is appropriate, given that I chose a random bucket name that I have no permissions for.

The goal of this PR is to make autoloading (or the detection that loading an extension is needed) work in a broader context, in particular:

  • for targets of COPY TO
  • for globs (this worked already, but now with the same infrastructure)
  • for fetching single files in a non-globbing situation (example: SELECT name, id FROM ICEBERG_SCAN('https://raw.githubusercontent.com/duckdb/duckdb-iceberg/main/data/persistent/equality_deletes/warehouse/mydb/mytable', VERSION='1', allow_moved_paths = true); will now correctly trigger autoloading of httpfs)

This infrastructure is also easy to generalize: it is a matter of calling the VirtualFileSystem::FindFileSystem overload that takes, in addition to the path, either a ClientContext or a FileOpener: VirtualFileSystem::FindFileSystem(path, context_or_file_opener).

After this PR, the same example as above either autoloads httpfs and throws the HTTP Error:

HTTP Error:
Unable to connect to URL https://my_bucket.s3.us-east-1.amazonaws.com/my_file.parquet?uploads=: Forbidden (HTTP code 403)

or, when autoloading is not possible, throws a Missing Extension Error:

Missing Extension Error:
File s3://my_bucket/my_file.parquet requires the extension httpfs to be loaded

Please try installing and loading the httpfs extension by running:
INSTALL httpfs;
LOAD httpfs;

Alternatively, consider enabling auto-install and auto-load by running:
SET autoinstall_known_extensions=1;
SET autoload_known_extensions=1;
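
To illustrate how a path can be mapped to the extension it needs, here is a minimal, self-contained sketch. It is not DuckDB's code: `PrefixEntry`, `kPrefixTable`, `RequiredExtension` and `MissingExtensionMessage` are hypothetical names, and the prefix table is an assumed subset chosen for illustration. It only reproduces the first line of the error message shown above.

```cpp
#include <cassert>
#include <string>

// Hypothetical prefix table: maps a path prefix to the extension that handles it.
// Assumption: an illustrative subset, not DuckDB's actual internal list.
struct PrefixEntry {
    const char *prefix;
    const char *extension;
};

static const PrefixEntry kPrefixTable[] = {
    {"s3://", "httpfs"},
    {"https://", "httpfs"},
    {"http://", "httpfs"},
};

// Returns the extension claiming this path's prefix, or an empty string
// when no entry matches (i.e. the path is treated as a local path).
std::string RequiredExtension(const std::string &path) {
    for (const PrefixEntry &entry : kPrefixTable) {
        const std::string prefix(entry.prefix);
        if (path.compare(0, prefix.size(), prefix) == 0) {
            return entry.extension;
        }
    }
    return std::string();
}

// Builds the first line of a "Missing Extension Error"-style message.
std::string MissingExtensionMessage(const std::string &path, const std::string &extension) {
    assert(!extension.empty()); // only meaningful when an extension is actually required
    return "File " + path + " requires the extension " + extension + " to be loaded";
}
```

With this sketch, `RequiredExtension("s3://my_bucket/my_file.parquet")` yields `httpfs`, while a plain local path yields an empty string, so no error would be raised for it.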

Fixes #19180
Also fixes autoloading for FileExist, for example:

SELECT name, id FROM ICEBERG_SCAN('https://raw.githubusercontent.com/duckdb/duckdb-iceberg/main/data/persistent/equality_deletes/warehouse/mydb/mytable', VERSION='2', allow_moved_paths = true);

@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 30, 2025 09:38
@carlopi carlopi marked this pull request as ready for review September 30, 2025 09:40
@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 30, 2025 10:43
@carlopi carlopi marked this pull request as ready for review September 30, 2025 10:45
@carlopi carlopi requested a review from samansmink September 30, 2025 11:59
Collaborator

@samansmink samansmink left a comment

Looks good. I would say the expected behaviour should just be:

VirtualFileSystem::FindFileSystem(const string &path):

  • if path matches existing filesystem -> return that
  • try autoloading relevant extension and retry, if matches -> return that
  • else return localfilesystem
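
The three steps above can be sketched with a toy model. This is an assumption-laden illustration, not the PR's implementation: `ToyFileSystem`, `ToyVirtualFileSystem`, the `AutoloadHook` callback and `ToyAutoloadHttpfs` are all hypothetical stand-ins, and real file systems match paths in more elaborate ways than a single prefix.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a registered file system: a name plus the prefix it handles.
struct ToyFileSystem {
    std::string name;
    std::string prefix;
    bool CanHandle(const std::string &path) const {
        return path.compare(0, prefix.size(), prefix) == 0;
    }
};

class ToyVirtualFileSystem {
public:
    // Hypothetical hook: may register a file system for `path`; returns true if it did.
    using AutoloadHook = std::function<bool(ToyVirtualFileSystem &, const std::string &)>;

    explicit ToyVirtualFileSystem(AutoloadHook hook)
        : local_{"local", ""}, try_autoload_(std::move(hook)) {}

    void Register(ToyFileSystem fs) {
        assert(!fs.prefix.empty()); // an empty prefix would shadow the local fallback
        registered_.push_back(std::move(fs));
    }

    ToyFileSystem FindFileSystem(const std::string &path) {
        // 1. If the path matches an existing file system, return that.
        if (const ToyFileSystem *fs = FindRegistered(path)) {
            return *fs;
        }
        // 2. Try autoloading the relevant extension and retry.
        if (try_autoload_ && try_autoload_(*this, path)) {
            if (const ToyFileSystem *fs = FindRegistered(path)) {
                return *fs;
            }
        }
        // 3. Otherwise fall back to the local file system.
        return local_;
    }

private:
    const ToyFileSystem *FindRegistered(const std::string &path) const {
        for (const ToyFileSystem &fs : registered_) {
            if (fs.CanHandle(path)) {
                return &fs;
            }
        }
        return nullptr;
    }

    ToyFileSystem local_;
    std::vector<ToyFileSystem> registered_;
    AutoloadHook try_autoload_;
};

// Hypothetical autoload hook: pretends that loading httpfs registers an s3:// handler.
bool ToyAutoloadHttpfs(ToyVirtualFileSystem &vfs, const std::string &path) {
    if (path.compare(0, 5, "s3://") != 0) {
        return false;
    }
    vfs.Register(ToyFileSystem{"httpfs", "s3://"});
    return true;
}
```

In this model, a `ToyVirtualFileSystem` built with `ToyAutoloadHttpfs` resolves an `s3://` path to the `httpfs` entry on first use, while one built without a hook falls back to `local` for the same path.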

@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 30, 2025 12:35
@carlopi
Contributor Author

carlopi commented Sep 30, 2025

Note to self: given that a failure to load throws, this should guarantee that in each query either a LOAD is attempted and fails (and the failure is reported), or the attempt succeeds, which reduces the space of available prefixes.

One question I am not sure about is whether there are ever problems with running INSTALL / LOAD in parallel, which this change will stress-test more. Should there be a lock preventing multiple threads from autoloading side by side?
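
For illustration only, one shape such a lock could take is sketched below. This is a hypothetical design, not something the PR implements: `ToyExtensionLoader`, `EnsureLoaded` and the `load_fn` callback are invented names, and holding a single mutex for the whole load serializes all autoloads, which may or may not be acceptable.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical loader that lets only one thread at a time perform a load.
class ToyExtensionLoader {
public:
    // load_fn is an assumed callback standing in for the actual INSTALL / LOAD work.
    explicit ToyExtensionLoader(std::function<void(const std::string &)> load_fn)
        : load_fn_(std::move(load_fn)) {
        assert(load_fn_ != nullptr);
    }

    // Returns true if this call performed the load, false if it was already loaded.
    bool EnsureLoaded(const std::string &extension) {
        std::lock_guard<std::mutex> guard(lock_);
        if (loaded_.count(extension) > 0) {
            return false;
        }
        // If load_fn_ throws, the extension is not recorded as loaded,
        // so a later call will attempt the load again.
        load_fn_(extension);
        loaded_.insert(extension);
        return true;
    }

private:
    std::mutex lock_;
    std::set<std::string> loaded_;
    std::function<void(const std::string &)> load_fn_;
};
```

Threads racing on the same extension would then see exactly one call to the load callback; the others wait on the mutex and return once the extension has been recorded as loaded.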

@carlopi carlopi marked this pull request as ready for review September 30, 2025 14:36
@carlopi carlopi requested a review from samansmink September 30, 2025 14:48
@carlopi
Contributor Author

carlopi commented Sep 30, 2025

The regression failure is expected; it will be solved by changing the link in #19206.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 30, 2025 19:44
@carlopi carlopi marked this pull request as ready for review September 30, 2025 19:44
@carlopi carlopi marked this pull request as draft October 1, 2025 09:21
Collaborator

@Mytherin Mytherin left a comment

Thanks for the PR! Looks great - minor comments

@carlopi carlopi marked this pull request as ready for review October 1, 2025 11:32
@carlopi carlopi marked this pull request as draft October 1, 2025 11:43
@carlopi
Contributor Author

carlopi commented Oct 2, 2025

On my side this is good to go

@Mytherin Mytherin merged commit 40ceab2 into duckdb:v1.4-andium Oct 2, 2025
53 checks passed
@Mytherin
Collaborator

Mytherin commented Oct 2, 2025

Thanks!

@carlopi carlopi deleted the autoloading_file_system branch October 2, 2025 07:37
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Oct 2, 2025
Autoloading helper file system: allow either autoloading or proper errors in more file operations (duckdb/duckdb#19198)
Bump: delta (duckdb/duckdb#19220)
Fixup templated version of TryGetSecretKeyOrSetting (duckdb/duckdb#19218)
build spatial extension for mingw (duckdb/duckdb#19207)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Oct 2, 2025
Autoloading helper file system: allow either autoloading or proper errors in more file operations (duckdb/duckdb#19198)
Bump: delta (duckdb/duckdb#19220)
Fixup templated version of TryGetSecretKeyOrSetting (duckdb/duckdb#19218)
build spatial extension for mingw (duckdb/duckdb#19207)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>