Expose `os.DirEntry` objects from pathlib #125413
Add a `Path.dir_entry` attribute. In any path object generated by `Path.iterdir()`, it stores an `os.DirEntry` object corresponding to the path; in other cases it is `None`. This can be used to retrieve the file type and attributes of directory children without necessarily incurring further system calls. Under the hood, we use `dir_entry` in our implementations of `PathBase.glob()`, `PathBase.walk()` and `PathBase.copy()`, the last of which also provides the implementation of `Path.copy()`, resulting in a modest speedup when copying local directory trees.
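The core idea, which is to reuse the file-type information a directory scan has already collected, can be sketched with the standard `os.scandir()` API. This is a minimal illustration, not the proposed pathlib code: `iterdir_with_entries()` and `split_children()` are hypothetical helpers that keep each `os.DirEntry` next to the child `Path` built from it.

```python
import os
from pathlib import Path


def iterdir_with_entries(path):
    """Yield (child_path, dir_entry) pairs for the children of *path*.

    Hypothetical helper: approximates the proposed ``Path.dir_entry``
    attribute by keeping the os.DirEntry alongside each child Path.
    """
    path = Path(path)
    with os.scandir(path) as it:
        for entry in it:
            yield path / entry.name, entry


def split_children(path):
    """Return (directories, other_children) as two sorted lists of Paths.

    entry.is_dir(follow_symlinks=False) is often answered from data
    returned by the directory scan, avoiding a stat() call per child.
    """
    dirs, others = [], []
    for child, entry in iterdir_with_entries(path):
        (dirs if entry.is_dir(follow_symlinks=False) else others).append(child)
    return sorted(dirs), sorted(others)
```

The savings come from not calling `Path.is_dir()` (and therefore `os.stat()`) on every child after `iterdir()` has already read the directory.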
I put this feedback on the PR, but it's probably better placed here: while I like the general idea, I don't think this specific API is the right way to do it.
I think we can eliminate both of those bits of awkwardness:
If it's impractical to add |
… once
Improve `pathlib._abc.PathBase.copy()` (which provides `Path.copy()`) by fetching operands' supported metadata keys up-front, rather than once for each path in the tree. This prepares the way for using `os.DirEntry` objects in `copy()`.
Add `pathlib.Path.scandir()` as a trivial wrapper of `os.scandir()`. In the private `pathlib._abc.PathBase` class, we can rework the `iterdir()`, `glob()`, `walk()` and `copy()` methods to call `scandir()` and make use of cached directory entry information, and thereby improve performance. Because the `Path.copy()` method is provided by `PathBase`, this also speeds up traversal when copying local files and directories.
Use the new `PathBase.scandir()` method in `PathBase.glob()`, which greatly reduces the number of `PathBase.stat()` calls needed when globbing. There are no user-facing changes, because the pathlib ABCs are still private and `Path.glob()` doesn't use the implementation in its superclass.
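As a rough illustration of why scanning helps globbing: matching a single wildcard segment needs only each child's name, plus a type check when the segment must match a directory. The sketch below assumes a hypothetical `glob_children()` helper and ignores `**`, case sensitivity and the other details that real pathlib globbing handles.

```python
import fnmatch
import os
from pathlib import Path


def glob_children(path, pattern, dirs_only=False):
    """Yield children of *path* whose names match the wildcard *pattern*.

    Hypothetical simplification of one globbing step. When *dirs_only*
    is true (e.g. more pattern segments follow), the directory check
    uses the DirEntry from the scan instead of a separate stat() call.
    """
    path = Path(path)
    try:
        scandir_it = os.scandir(path)
    except OSError:
        return  # missing or unreadable directories yield nothing
    with scandir_it:
        for entry in scandir_it:
            if not fnmatch.fnmatchcase(entry.name, pattern):
                continue
            if dirs_only:
                try:
                    if not entry.is_dir():
                        continue
                except OSError:
                    continue
            yield path / entry.name
```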
To tie up the above loose ends, we went with a |
Use the new `PathBase.scandir()` method in `PathBase.walk()`, which greatly reduces the number of `PathBase.stat()` calls needed when walking. There are no user-facing changes, because the pathlib ABCs are still private and `Path.walk()` doesn't use the implementation in its superclass.
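Walking benefits in the same way, because deciding which children to descend into is just a type check on each entry. A minimal top-down sketch, assuming a hypothetical `walk()` generator with the same `(dirpath, dirnames, filenames)` shape as `Path.walk()`; unlike the real method it takes no options and never follows symlinks:

```python
import os
from pathlib import Path


def walk(top):
    """Top-down directory walk yielding (dirpath, dirnames, filenames).

    Hypothetical sketch: each directory is scanned once, and DirEntry
    type checks decide which children to descend into. Symlinks to
    directories are listed with the non-directories and not followed.
    """
    stack = [Path(top)]
    while stack:
        dirpath = stack.pop()
        dirnames, filenames = [], []
        try:
            with os.scandir(dirpath) as it:
                for entry in it:
                    try:
                        is_dir = entry.is_dir(follow_symlinks=False)
                    except OSError:
                        is_dir = False
                    (dirnames if is_dir else filenames).append(entry.name)
        except OSError:
            continue  # skip directories we cannot read
        yield dirpath, dirnames, filenames
        # Push in reverse so subdirectories are visited in listed order.
        # This runs after the caller resumes, so in-place edits to
        # dirnames prune the traversal.
        stack.extend(dirpath / name for name in reversed(dirnames))
```

As with `os.walk()`, clearing or editing `dirnames` in place before the generator resumes prunes the traversal.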
Use the new `PathBase.scandir()` method in `PathBase.copy()`, which greatly reduces the number of `PathBase.stat()` calls needed when copying. This also speeds up `Path.copy()`, which inherits the superclass implementation. Under the hood, we use directory entries to distinguish between files, directories and symlinks, and to retrieve a `stat_result` when reading metadata. This logic is extracted into a new `pathlib._abc.CopierBase` class, which helps reduce the number of underscore-prefixed support methods in the path interface.
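The file/directory/symlink dispatch described above can be sketched with `os.scandir()` and `shutil.copyfile()`. This hypothetical `copy_tree()` is not the `CopierBase` implementation: it copies only file contents and link targets, and skips metadata preservation entirely.

```python
import os
import shutil
from pathlib import Path


def copy_tree(source, target):
    """Recursively copy directory *source* to *target* (which must not exist).

    Hypothetical sketch: the DirEntry from each scan tells us whether a
    child is a symlink, a directory or a file, usually without a
    separate stat() call per child. Metadata is not preserved.
    """
    source, target = Path(source), Path(target)
    target.mkdir()
    with os.scandir(source) as it:
        entries = list(it)
    for entry in entries:
        dst = target / entry.name
        if entry.is_symlink():
            # Recreate the link itself rather than copying what it points to.
            os.symlink(os.readlink(entry.path), dst,
                       target_is_directory=entry.is_dir())
        elif entry.is_dir(follow_symlinks=False):
            copy_tree(entry.path, dst)
        else:
            shutil.copyfile(entry.path, dst)
    return target
```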
Remove documentation for `pathlib.Path.scandir()`, and rename the method to `_scandir()`. In the private pathlib ABCs, make `iterdir()` abstract and call it from `_scandir()`. It's not worthwhile to add this method at the moment - see discussion: https://discuss.python.org/t/ergonomics-of-new-pathlib-path-scandir/71721 Co-authored-by: Steve Dower <[email protected]>
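The arrangement described above (subclasses implement only `iterdir()`, and the private scanning helper is derived from it) can be sketched as follows. The class names are hypothetical and the body of `_scandir()` is an assumption; returning a context manager keeps call sites shaped like `with os.scandir(...) as it:`.

```python
import contextlib
from abc import ABC, abstractmethod


class VirtualPathBase(ABC):
    """Hypothetical stand-in for a private pathlib ABC."""

    @abstractmethod
    def iterdir(self):
        """Yield child path objects; subclasses must implement this."""

    def _scandir(self):
        # Default implementation derived from iterdir(). Wrapping the
        # iterator in nullcontext() lets callers write
        # ``with path._scandir() as it:`` just as with os.scandir().
        return contextlib.nullcontext(self.iterdir())


class DictPath(VirtualPathBase):
    """Toy in-memory filesystem in which nested dicts are directories."""

    def __init__(self, tree, name=""):
        self.tree, self.name = tree, name

    def iterdir(self):
        for name, subtree in self.tree.items():
            yield DictPath(subtree, name)
```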
…h` (#129014)
In the private pathlib ABCs, support write-only virtual filesystems by making `WritablePath` inherit directly from `JoinablePath`, rather than subclassing `ReadablePath`. There are two complications:
- `ReadablePath.open()` applies to both reading and writing
- `ReadablePath.copy` is secretly an object that supports the *read* side of copying, whereas `WritablePath.copy` is a different kind of object supporting the *write* side
We untangle these as follows:
- A new `pathlib._abc.magic_open()` function replaces the `open()` method, which is dropped from the ABCs but remains in `pathlib.Path`. The function works like `io.open()`, but additionally accepts objects with `__open_rb__()` or `__open_wb__()` methods as appropriate for the mode. These new dunders are made abstract methods of `ReadablePath` and `WritablePath` respectively. If the pathlib ABCs are made public, we could consider blessing an "openable" protocol and supporting it in `io.open()`, removing the need for `pathlib._abc.magic_open()`.
- `ReadablePath.copy` becomes a true method, whereas `WritablePath.copy` is deleted. A new `ReadablePath._copy_reader` property provides a `CopyReader` object, and similarly `WritablePath._copy_writer` is a `CopyWriter` object. Once GH-125413 is resolved, we'll be able to move the `CopyReader` functionality into `ReadablePath.info` and eliminate `ReadablePath._copy_reader`.
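The `magic_open()` dispatch can be sketched roughly as follows. This is a simplified, hypothetical version rather than the `pathlib._abc` function: it forwards real paths to `io.open()`, and routes only plain read or write modes (with optional `b` or `t`) to `__open_rb__()` or `__open_wb__()`, wrapping the binary stream in `io.TextIOWrapper` for text mode.

```python
import io


def magic_open(path, mode="r", buffering=-1, encoding=None, errors=None,
               newline=None):
    """Open *path* like io.open(), falling back to __open_rb__/__open_wb__.

    Simplified sketch: only 'r' and 'w' modes (with optional 'b' or 't')
    are dispatched to the dunder methods.
    """
    try:
        # str, bytes and os.PathLike objects go straight to io.open().
        return io.open(path, mode, buffering, encoding, errors, newline)
    except TypeError:
        pass  # not path-like; try the "openable" dunders below

    kind = mode.replace("b", "").replace("t", "")
    if kind == "r":
        attr = "__open_rb__"
    elif kind == "w":
        attr = "__open_wb__"
    else:
        raise ValueError(f"unsupported mode for {type(path).__name__}: {mode!r}")
    opener = getattr(type(path), attr, None)
    if opener is None:
        raise TypeError(f"{type(path).__name__} can't be opened with mode {mode!r}")
    stream = opener(path)  # a binary stream supplied by the path object
    if "b" not in mode:
        # Text mode: wrap the binary stream, as io.open() would.
        stream = io.TextIOWrapper(stream, encoding, errors, newline)
    return stream
```

Looking the dunder up on the type rather than the instance mirrors how Python resolves other special methods.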
Add `pathlib.Path.info` attribute, which stores an object implementing the `pathlib.types.PathInfo` protocol (also new). The object supports querying the file type and internally caching `os.stat()` results. Path objects generated by `Path.iterdir()` are initialised with status information from `os.DirEntry` objects, which is gleaned from scanning the parent directory. The `PathInfo` protocol has four methods: `exists()`, `is_dir()`, `is_file()` and `is_symlink()`.
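To make the caching behaviour concrete, here is a hypothetical `CachedStatInfo` class shaped like the four-method protocol described above. It is a sketch, not pathlib's implementation: the `follow_symlinks` keyword is an assumption, and it caches `os.stat()` and `os.lstat()` results separately, including failures.

```python
import os
import stat


class CachedStatInfo:
    """Hypothetical object shaped like the PathInfo protocol described above.

    Caches os.stat() and os.lstat() results so repeated queries about
    the same path cost at most one system call each.
    """

    def __init__(self, path):
        self._path = os.fspath(path)
        self._stat = None   # cached os.stat() result (follows symlinks)
        self._lstat = None  # cached os.lstat() result (does not follow)

    def _get(self, follow_symlinks):
        attr = "_stat" if follow_symlinks else "_lstat"
        result = getattr(self, attr)
        if result is None:
            stat_func = os.stat if follow_symlinks else os.lstat
            try:
                result = stat_func(self._path)
            except (OSError, ValueError):
                result = False  # cache failures too, so we don't retry
            setattr(self, attr, result)
        return result or None

    def exists(self, *, follow_symlinks=True):
        return self._get(follow_symlinks) is not None

    def is_dir(self, *, follow_symlinks=True):
        st = self._get(follow_symlinks)
        return st is not None and stat.S_ISDIR(st.st_mode)

    def is_file(self, *, follow_symlinks=True):
        st = self._get(follow_symlinks)
        return st is not None and stat.S_ISREG(st.st_mode)

    def is_symlink(self):
        st = self._get(follow_symlinks=False)
        return st is not None and stat.S_ISLNK(st.st_mode)
```

Because results are cached, the answers describe the path as it was at the first query; a deleted file still reports `exists()` until a fresh object is created.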
…info` (#129856)
Move pathlib's private `CopyReader`, `LocalCopyReader`, `CopyWriter` and `LocalCopyWriter` classes into `pathlib._os`, where they can live alongside the low-level copying functions (`copyfileobj()` etc) and high-level path querying interface (`PathInfo`). This sets the stage for merging `LocalCopyReader` into `PathInfo`. No change of behaviour; just moving some code around.
Add the following private methods to `pathlib.Path.info`:
- `_get_mode()`: returns the POSIX file mode (`st_mode`), or zero if `os.stat()` fails.
- `_get_times_ns()`: returns the access and modify times in nanoseconds (`st_atime_ns` and `st_mtime_ns`), or zeroes if `os.stat()` fails.
- `_get_flags()`: returns the BSD file flags (`st_flags`), or zero if `os.stat()` fails.
- `_get_xattrs()`: returns the file extended attributes as a list of key, value pairs, or an empty list if `listxattr()` or `getxattr()` fail.
These methods replace `LocalCopyReader.read_metadata()`, and so we can delete the `CopyReader` and `LocalCopyReader` classes. Rather than reading metadata via `source._copy_reader.read_metadata()`, we instead call `source.info._get_mode()`, `_get_times_ns()`, etc. Copying metadata is only supported for local-to-local copies at the moment. To support copying between arbitrary `ReadablePath` and `WritablePath` objects, we'd need to make the new methods public and documented.
Add the following private methods to `pathlib.Path.info`:
- `_posix_permissions()`: the POSIX file permissions (`S_IMODE(st_mode)`)
- `_file_id()`: the file ID (`(st_dev, st_ino)`)
- `_access_time_ns()`: the access time in nanoseconds (`st_atime_ns`)
- `_mod_time_ns()`: the modify time in nanoseconds (`st_mtime_ns`)
- `_bsd_flags()`: the BSD file flags (`st_flags`)
- `_xattrs()`: the file extended attributes as a list of key, value pairs, or an empty list if `listxattr()` or `getxattr()` fail in an ignorable way.
These methods replace `LocalCopyReader.read_metadata()`, and so we can delete the `CopyReader` and `LocalCopyReader` classes. Rather than reading metadata via `source._copy_reader.read_metadata()`, we instead call `source.info._posix_permissions()`, `_access_time_ns()`, etc. Preserving metadata is only supported for local-to-local copies at the moment. To support copying metadata between arbitrary `ReadablePath` and `WritablePath` objects, we'd need to make the new methods public and documented. Co-authored-by: Petr Viktorin <[email protected]>
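All of the values listed above except the extended attributes come from a single `os.stat()` result. The sketch below gathers them and applies the permissions, timestamps and (where supported) BSD flags to another file with `os.utime()`, `os.chmod()` and `os.chflags()`. `read_metadata()` and `write_metadata()` are hypothetical helpers, not the private pathlib methods, and extended attributes are omitted for brevity.

```python
import os
import stat


def read_metadata(path):
    """Collect the stat-based metadata above from one os.stat() call.

    Hypothetical helper; the dict keys echo the private method names
    but are not a pathlib API.
    """
    st = os.stat(path)
    return {
        "posix_permissions": stat.S_IMODE(st.st_mode),
        "file_id": (st.st_dev, st.st_ino),  # identifies the file; not copied
        "access_time_ns": st.st_atime_ns,
        "mod_time_ns": st.st_mtime_ns,
        # st_flags only exists on some platforms (e.g. BSDs, macOS).
        "bsd_flags": getattr(st, "st_flags", None),
    }


def write_metadata(path, metadata):
    """Apply timestamps, permissions and BSD flags to *path*.

    Times are set first, because read-only permissions or immutable
    flags could otherwise block the later changes.
    """
    os.utime(path, ns=(metadata["access_time_ns"], metadata["mod_time_ns"]))
    os.chmod(path, metadata["posix_permissions"])
    flags = metadata["bsd_flags"]
    if flags is not None and hasattr(os, "chflags"):
        os.chflags(path, flags)
```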
…globbing (python#130422)
Call `ReadablePath.info.exists()` rather than `ReadablePath.exists()` when globbing so that we use (or populate) the `info` cache.
…ove()` (python#130424)
In `pathlib.Path.copy()` and `move()`, return a fresh `Path` object with an unpopulated `info` attribute, rather than a `Path` object with information recorded *prior* to the path's creation.
…ython#130238)
Replace `WritablePath._copy_writer` with a new `_write_info()` method. This method allows the target of a `copy()` to preserve metadata. Replace `pathlib._os.CopyWriter` and `LocalCopyWriter` classes with new `copy_file()` and `copy_info()` functions. The `copy_file()` function uses `source_path.info` wherever possible to save on `stat()`s.
Feature or enhancement
I propose we add a new `Path.status` attribute that stores an `os.DirEntry` object in paths yielded from `Path.iterdir()`, or a pathlib-specific type with a similar interface in other paths. This would:
- … `os.DirEntry` after calling `Path.iterdir()`, which is useful for efficiently determining files' types and often doesn't involve a system call.
- … `S_ISREG(st.st_mode)` and other holy incantations.
- … `PathBase.stat()` and the `stat_result` interface, which is too low-level and local filesystem-specific.
See discussion: https://discuss.python.org/t/is-there-a-pathlib-equivalent-of-os-scandir/46626
Linked PRs
- `pathlib.Path.dir_entry` attribute #125419
- `pathlib.Path.copy()`: get common metadata keys only once #125990
- `pathlib.Path.scandir()` method #126060
- `scandir()` to speed up `glob()` #126261
- `scandir()` to speed up `walk()` #126262
- `scandir()` to speed up `copy()` #126263
- `pathlib.Path.scandir()` method #127377
- `pathlib.Path.info` attribute #127730
- `pathlib.Path.copy()` implementation alongside `Path.info` #129856
- `pathlib.Path.info` #129897
- `pathlib.Path` method to write metadata #130238
- `path.info.exists()` when globbing #130422
- `pathlib.Path.copy()` and `move()` #130424