Autoloading helper file system: allow either autoloading or proper errors in more file operations#19198
Conversation
samansmink
left a comment
There was a problem hiding this comment.
Looks good, I would say expected behaviour should just be:
VirtualFileSystem::FindFileSystem(const string &path):
- if path matches existing filesystem -> return that
- try autoloading relevant extension and retry, if matches -> return that
- else return localfilesystem
|
Note to self: this should guarantee (given on failure to load this throws) that each query either a LOAD is attempted and failed (but that is notified) or an attempt worked, that reduces the space of available prefixes. One question I am not sure is whether there are ever problems with INSTALL / LOAD in parallel, that this will stress test more. Should there be a lock preventing multiple threads to autoload side-by-side? |
|
Regression failure is expected, to be solved changing link in #19206 |
Mytherin
left a comment
There was a problem hiding this comment.
Thanks for the PR! Looks great - minor comments
effb100 to
e7978ec
Compare
|
On my side this is good to go |
|
Thanks! |
Autoloading helper file system: allow either autoloading or proper errors in more file operations (duckdb/duckdb#19198) Bump: delta (duckdb/duckdb#19220) Fixup templated version of TryGetSecretKeyOrSetting (duckdb/duckdb#19218) build spatial extension for mingw (duckdb/duckdb#19207)
Autoloading helper file system: allow either autoloading or proper errors in more file operations (duckdb/duckdb#19198) Bump: delta (duckdb/duckdb#19220) Fixup templated version of TryGetSecretKeyOrSetting (duckdb/duckdb#19218) build spatial extension for mingw (duckdb/duckdb#19207) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Currently, if
httpfsextension has not been loaded yet:either fails like:
where
s3://my_bucket/my_file.parquetis treated as a local file system and can't find the base directorys3://my_bucketthat is necessary to actually save the file.will rightly fail like:
that, given I choose a random bucket name that I have no permissions for is appropriate.
Goal of this PR is making autoloading (or detection that loading an extension is needed) work in more broader contest, in particular:
SELECT name, id FROM ICEBERG_SCAN('https://raw.githubusercontent.com/duckdb/duckdb-iceberg/main/data/persistent/equality_deletes/warehouse/mydb/mytable', VERSION='1', allow_moved_paths = true);will now correctly trigger autoloading of httpfs)This infrastructure is also easy to generalize, it's a matter of calling
VirtualFileSystem::FindFileSystem(path)overload that takes either a ClientContext or a FileOpener:VirtualFileSystem::FindFileSystem(path, context_or_file_opener).After this PR, same example as above would either autoload and throw HTTP Error:
or throw a Missing Extension Error:
Fixes #19180
Fixes autoloading of FileExist, example: