Document cross filesystem compatible definition of "/", ".", "", ".." #498

@ap--

Originally posted by @cscutcher in #494

For me the big selling point of the library is accepting UPaths while being completely agnostic about what backend is in use. It's probably not possible to be 100% agnostic, but I think a good starting place would be if there were at least clear definitions documented for the meaning of UPath(""), UPath("/"), UPath(".") and UPath("..").

All four examples you provide will return PosixUPath or WindowsUPath instances, depending on your operating system. This is because the first argument is a non-URI-like path and the protocol keyword parameter is unset.
Both PosixUPath and WindowsUPath can basically be thought of as pathlib.Path subclasses with the additional attributes and methods that UPath provides.
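Since these local classes behave like pathlib.Path subclasses, the stdlib semantics are a reasonable reference point for what the four paths mean. The following is only a sketch using PurePosixPath from the standard library. It does not exercise UPath itself, and whether every UPath backend agrees with it is exactly the open question of this issue.

```python
from pathlib import PurePosixPath

# Stdlib pathlib semantics only; no UPath-specific behaviour is shown here.
empty = PurePosixPath("")
dot = PurePosixPath(".")
root = PurePosixPath("/")
parent = PurePosixPath("..")

# The empty path and "." are the same path: both have no components
# and both are rendered as ".".
print(str(empty), empty == dot, empty.parts)

# "/" is the only one of the four that is absolute.
print(root.is_absolute(), dot.is_absolute(), parent.is_absolute())

# ".." is kept as a literal component; pure paths never collapse it.
joined = PurePosixPath("foo") / ".."
print(parent.parts, joined.parts)
```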

In all four cases, before I started thinking about it in the context of UPath, I would have naively said I had a clear understanding of what those paths mean. However, once I started thinking about specifics, especially in the case of UPath's filesystem agnostic approach, I realised I was a bit clueless!

For context, in case it's helpful in your design considerations, I was using the memory backend primarily for testing, so in my case I really wanted behaviour to be as similar as possible to the local filesystem. In the end I awkwardly subclassed MemoryFileSystem and MemoryPath to work around this issue, but also to implement symlink support, which I believe is missing in MemoryFileSystem. I imagine it's possible that my choice of using MemoryFileSystem as a mock local filesystem goes against the original intent for it, so maybe I was doomed from the start!

The fsspec MemoryFileSystem is indeed most commonly used as a testing filesystem. So your intuition was right here. In general I would avoid symlinks if you want cross-filesystem compatible interactions.

This is because symlinks don't exist on object stores and many other filesystems. On some, HTTP filesystems for example, you could interpret redirects as symlinks, but once you go into the details it's non-trivial.

Another small comment, but how to handle relative paths in general seems an interesting challenge. I'm sure there are good reasons why this isn't the case, but it seems to me that relative paths shouldn't necessarily be tied to any protocol or backend. I can see why making UPath("foo/bar") implicitly a path relative to cwd on the local file system, would be necessary to make .open etc work as a user might expect, but it would be nice to be able to have an explicitly relative path.
To me, in a backend-agnostic world, a path like foo/bar should only get tied to a specific backend when it's combined with some absolute path object, but on its own it only states "the subdirectory bar, which is a subdirectory of foo", which should be possible to apply to any filesystem backend equally, if that makes sense.

Unfortunately, relative paths can't be fully decoupled from their filesystem implementations. This stems from the fact that (1) fsspec paths are always absolute and (2) fsspec has no strict definition of what these paths can be. So a relative path like foo/../bar or foo//bar would mean something different on a local filesystem than in an S3 bucket.
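To make the difference concrete, here is a small sketch. The dict named object_store is a hypothetical stand-in for a key-value object store such as an S3 bucket, where keys are opaque strings; it is not how any fsspec backend is implemented. The local side uses only stdlib path handling.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical object store: keys are opaque strings, "/" has no special meaning.
object_store = {
    "foo//bar": b"stored under the double-slash key",
    "foo/bar": b"stored under the single-slash key",
}

# Local-style paths collapse repeated separators, so both spellings are one path.
same_locally = PurePosixPath("foo//bar") == PurePosixPath("foo/bar")

# In the key-value model the two spellings address two different objects.
distinct_keys = object_store["foo//bar"] != object_store["foo/bar"]

# pathlib keeps ".." literally; only an explicit lexical normalisation removes it.
literal = str(PurePosixPath("foo/../bar"))
lexical = posixpath.normpath("foo/../bar")

print(same_locally, distinct_keys, literal, lexical)
```

On a real local filesystem even the lexical result can be wrong when foo is a symlink, which is one more reason the interpretation depends on the backend.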

All that being said, you have a few options to get what you want:

  1. Make a relative UPath. While the constructor does not support this directly, you can create one via relative_to:
    >>> from upath import UPath
    >>> UPath("s3://bucket/foo/bar").relative_to(UPath("s3://bucket/"))
    <relative S3Path 'foo/bar'>
  2. Consistently use resolve() before file access to ensure that . and .. are interpreted the way pathlib would interpret them:
    >>> from upath import UPath
    >>> UPath("bucket/foo/bar", protocol="s3").joinpath("../abc").resolve()
    S3Path('bucket/foo/abc', protocol='s3')
  3. In internal projects that require a lot of path traversal, I usually define all relative locations as PurePosixPath instances and accept a base UPath that determines the root of the absolute filesystem location.
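A minimal sketch of option 3, under stated assumptions: the constant RAW_DATA and the helper locate are hypothetical names made up for illustration, and the base is a PurePosixPath here only so the snippet runs without extra dependencies. In real use the base would be a UPath such as UPath("s3://bucket/"), which offers the same joinpath method.

```python
from pathlib import PurePosixPath

# Hypothetical relative location, defined once and independent of any backend.
RAW_DATA = PurePosixPath("data/raw")


def locate(base, relative):
    """Anchor a backend-agnostic relative location at an absolute base path.

    `base` is any path object with a pathlib-style joinpath (assumed to be a
    UPath in real use); `relative` is a PurePosixPath.
    """
    if relative.is_absolute():
        raise ValueError(f"expected a relative location, got {relative}")
    if ".." in relative.parts:
        # Reject parent references so the result stays below the base
        # regardless of how a backend interprets "..".
        raise ValueError(f"parent references are not allowed: {relative}")
    return base.joinpath(*relative.parts)


# Stand-in base; swap in a UPath to target another filesystem.
base = PurePosixPath("/bucket/project")
print(locate(base, RAW_DATA))
```

Rejecting ".." up front is a design choice of this sketch, not something the approach requires; it simply avoids the backend-specific interpretation discussed above.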

Labels: compatibility 🤝 (Compatibility with stdlib pathlib), documentation 📘 (Improvements or additions to documentation)
