
proposal: os: make Readdir return lazy FileInfo implementations #41188

Closed
@rsc

Description

@rsc

An os.File provides two ways to read a directory: Readdirnames returns a list of the names of the directory entries, and Readdir returns the names along with stat information.

On Plan 9 and Windows, Readdir can be implemented with only a directory read - the directory read operation provides the full stat information.

But many Go users use Unix systems.

On most Unix systems, the directory read does not provide full stat information. So the implementation of Readdir reads the names from the directory and then calls Lstat for each file. This is fairly expensive.

Much of the time, such as in the implementation of filepath.Glob and other file system walking, the only information the caller of Readdir really needs is the name and whether the name denotes a directory. On most Unix systems, that single bit of information—is this name a directory?—is available from the plain directory read, without an additional stat. If the caller is only using that bit, the extra Lstat calls are unnecessary and slow. (Goimports, for example, has its own directory walker to avoid this cost.)

Various people have proposed adding a third directory reading option of one form or another, to get names and IsDir bits. This would certainly address the slow directory walk issue on Unix systems, but it seems like overfitting to Unix.

Note that os.FileInfo is an interface. What if we make Readdir return a slice of lazily-filled os.FileInfo? That is, on Unix, Readdir would stop calling Lstat. Each returned FileInfo would already know the answer for its Name and IsDir methods. The first call to any of the other methods would incur an Lstat at that moment to find out the rest of the information. A directory walk that uses Readdir and then only calls Name and IsDir would have all its Lstat calls optimized away with no code changes in the caller.

The downside of this is that the laziness would be visible when you do the Readdir and wait a while before looking at the results. For example, if you did Readdir, then touched one of the files in the list, then called the ModTime method on the os.FileInfo that Readdir returned, you'd see the updated modification time. And then if you touched the file again and called ModTime again, you wouldn't see the further-updated modification time. That could be confusing. But I expect that the vast majority of uses of Readdir use the results immediately or at least before making changes to files listed in the results. I suspect the vast majority of users would not notice this change.
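
To make the behavior described above concrete, here is a minimal sketch of what a lazily filled os.FileInfo could look like. The type lazyFileInfo and the helper demo are hypothetical names invented for this illustration, not part of package os, and dropping the Lstat error in favor of zero values is only one of the options raised in this thread. The demo uses file size instead of ModTime so the result does not depend on timestamp granularity.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// lazyFileInfo is a hypothetical type, not part of package os. Name and
// IsDir are known up front; a single deferred Lstat supplies the rest.
type lazyFileInfo struct {
	dir, name string
	isDir     bool

	once sync.Once
	info os.FileInfo // nil until loaded, and still nil if the Lstat failed
}

// load runs Lstat at most once and caches the result.
func (l *lazyFileInfo) load() os.FileInfo {
	l.once.Do(func() {
		// Assumption: a failed Lstat is dropped and zero values are reported.
		if info, err := os.Lstat(filepath.Join(l.dir, l.name)); err == nil {
			l.info = info
		}
	})
	return l.info
}

func (l *lazyFileInfo) Name() string { return l.name }
func (l *lazyFileInfo) IsDir() bool  { return l.isDir }

func (l *lazyFileInfo) Size() int64 {
	if info := l.load(); info != nil {
		return info.Size()
	}
	return 0
}

func (l *lazyFileInfo) Mode() os.FileMode {
	if info := l.load(); info != nil {
		return info.Mode()
	}
	if l.isDir {
		return os.ModeDir
	}
	return 0
}

func (l *lazyFileInfo) ModTime() time.Time {
	if info := l.load(); info != nil {
		return info.ModTime()
	}
	return time.Time{}
}

func (l *lazyFileInfo) Sys() interface{} {
	if info := l.load(); info != nil {
		return info.Sys()
	}
	return nil
}

var _ os.FileInfo = (*lazyFileInfo)(nil) // compile-time interface check

// demo returns the sizes reported by two Size calls; the file is rewritten
// before each of them.
func demo() (int64, int64, error) {
	dir, err := os.MkdirTemp("", "lazy")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "a.txt")
	if err := os.WriteFile(path, []byte("12345"), 0o644); err != nil {
		return 0, 0, err
	}
	fi := &lazyFileInfo{dir: dir, name: "a.txt"} // stands in for a Readdir result
	if err := os.WriteFile(path, []byte("1234567890"), 0o644); err != nil {
		return 0, 0, err
	}
	first := fi.Size() // the Lstat happens here, after the rewrite
	if err := os.WriteFile(path, []byte("x"), 0o644); err != nil {
		return 0, 0, err
	}
	second := fi.Size() // served from the cached Lstat result
	return first, second, nil
}

func main() {
	first, second, err := demo()
	fmt.Println(first, second, err)
}
```

Both calls report the size as of the first Size call: the first rewrite is visible even though it happened after the simulated Readdir, and the second rewrite is not.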

I propose we make this change—make Readdir return lazy os.FileInfo—soon, intending it to land in Go 1.16, but ready to roll back the change if the remainder of the Go 1.16 dev cycle or beta/rc testing turns up important problems with it.

/cc @robpike @bradfitz @ianthehat @kr

Activity

added this to the Proposal milestone on Sep 2, 2020
bradfitz

bradfitz commented on Sep 2, 2020

@bradfitz
Contributor

Last time this was proposed there was debate about what the behavior for ModTime and Mode and Size should be if the lazy Lstat fails later, as they don't return errors. Panic is bad. Logging is weird. Zero values I guess?

bcmills

bcmills commented on Sep 2, 2020

@bcmills
Contributor

I think this is likely to introduce subtle changes in behavior. Perhaps more importantly, I don't think this is the sort of change that we can reliably verify during a development cycle.

In my experience, very few users who are not either Go contributors or Googlers test Beta or RC releases of the Go toolchain, and changes in the os package are less likely to turn up during Google testing because a significant fraction of Google programs do most of their I/O without using the os package directly.

tv42

tv42 commented on Sep 2, 2020

@tv42

os.FileInfo can't be lazy as-is, because it can't return an error. Returning a 0 size on transient errors is not acceptable.

ianlancetaylor

ianlancetaylor commented on Sep 2, 2020

@ianlancetaylor
Contributor

See also #40352, which is about different approaches to efficiently uncover similar information.

tv42

tv42 commented on Sep 2, 2020

@tv42

(Ignoring the POSIX API for a moment) NFS, and likely many other network filesystems, can do completely separate operations depending on whether the stat info is going to be needed (NFSv3 readdir vs readdirplus, NFSv4 "bulk LOOKUP", FUSE_READDIRPLUS).

There's also been a lot of talk about a Linux syscall that would fetch getdents+lstat info, for example https://lwn.net/Articles/606995/ -- they all seem to revolve around the idea of the client knowing beforehand whether it will be doing the lstat calls or not, and communicating that to the kernel.

These combined make me think the right path forward would be a Readdir method that takes arguments that inform it which os.FileInfo fields will be wanted; the rest could be zero values.

(That extended Readdir could also take a flag for whether to sort the results or not, removing one common cause of forks for performance reasons.)

networkimprov

networkimprov commented on Sep 3, 2020

@networkimprov

EDIT: This has a detailed proposal in #41265

I believe we need a new dirent abstraction.

After reviewing suggestions from the FS API discussion...

Let's consider a hybrid:

  1. ReadDir(path string, n int, opt uint64) ([]DirItem, error) - opt is fields to load and sorting (0 is OS defaults)
    (may return more fields than requested)
  2. (d *DirEntry) Load(fields uint64) error - (re-)loads the fields
    (returns error if inode doesn't match)
  3. (d *DirEntry) Has(fields uint64) bool - indicates whether the fields are loaded
  4. (d *DirEntry) Id() FileId - gives unix inode or winapi fileId; could take an argument re device info
  5. (d *DirEntry) Xyz() T - panics for any field not loaded (a programmer mistake)

That solves the .ModTime() etc issue with lazy-loading, and avoids an error check after every field access.

EDIT: The interface which DirEntry implements and ReadDir() returns:

type DirItem interface {
   Load(fields uint64) error
   Has(fields uint64) bool
   Name() string
   IsDir() bool
}

Rationale:
a) If you need certain fields for every item, request them in ReadDir().
b) If you need certain fields for some items, request them in DirEntry.Load().
c) If you need certain fields only when they're the OS default, check for them with DirEntry.Has().
d) If you need the latest data for an item, request it with DirEntry.Load().

tv42

tv42 commented on Sep 3, 2020

@tv42
  1. (d *DirEntry) Id() FileId - gives unix inode or winapi fileId

Wasn't the Windows FileId 128-bit? (Seen somewhere on go issues around the greater topic in the last few days.)

Either way, the unix inode number isn't very useful without the device major:minor. For example, you can't filepath.Walk and expect inode alone to identify hardlinked files, because you may have crossed a mountpoint.

Also, inode number and such belong in a .Sys() style, platform-specific, extension point.
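
For reference, package os already offers a portable way to ask whether two FileInfo values describe the same file: os.SameFile, which on Unix compares both the device and inode fields. The sketch below uses it to count distinct files among paths that may be hard links to each other. The helpers countDistinct and demo are hypothetical names for this illustration, and the pairwise comparison is quadratic, so this is not a substitute for an exposed file ID when replicating large trees.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// countDistinct reports how many distinct files the given paths refer to,
// treating hard links to the same file as one.
func countDistinct(paths []string) (int, error) {
	var seen []os.FileInfo
	for _, p := range paths {
		info, err := os.Lstat(p)
		if err != nil {
			return 0, err
		}
		dup := false
		for _, s := range seen {
			if os.SameFile(s, info) {
				dup = true
				break
			}
		}
		if !dup {
			seen = append(seen, info)
		}
	}
	return len(seen), nil
}

// demo creates a file, a hard link to it, and an unrelated file, then
// counts the distinct files. It assumes the temporary directory lives on
// a file system that supports hard links.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "samefile")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	a := filepath.Join(dir, "a")
	b := filepath.Join(dir, "b")
	c := filepath.Join(dir, "c")
	if err := os.WriteFile(a, []byte("data"), 0o644); err != nil {
		return 0, err
	}
	if err := os.Link(a, b); err != nil { // b is a hard link to a
		return 0, err
	}
	if err := os.WriteFile(c, []byte("data"), 0o644); err != nil {
		return 0, err
	}
	return countDistinct([]string{a, b, c})
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```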

networkimprov

networkimprov commented on Sep 3, 2020

@networkimprov

I gave the .Id() type as FileId, which can be whatever (e.g. opaque array), and store major:minor. The fact that a FileId can't be compared across platforms isn't a reason to hide it. It's needed to replicate a tree containing multiple hard links for a file -- which I do.

Today on Windows, FileInfo.Sys() can't even provide the fileId! Adding it was debated and discarded.

Is there any practical value in .Sys() besides providing .Ino on unix?

Winapi fileId is 64-bit on NTFS, 128-bit on ReFS (an alternative for Windows Server):
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information
https://docs.microsoft.com/en-us/windows-server/storage/refs/refs-overview

benhoyt

benhoyt commented on Sep 3, 2020

@benhoyt
Contributor

I really like the intent, but I agree with other commenters that the API just doesn't quite fit as is, because of potential errors returned by the lazy methods. We actually debated a very similar issue when designing the os.scandir / os.DirEntry API in Python. At first we wanted to make the DirEntry methods properties, like entry.stat without the function call parentheses. That works in Python, but it looks like a plain attribute access, and people aren't expecting to have to catch exceptions (errors) when accessing an attribute, so we made it a function call. Per the docs:

Because the os.DirEntry methods can make operating system calls, they may also raise OSError. If you need very fine-grained control over errors, you can catch OSError when calling one of the os.DirEntry methods and handle as appropriate.

I believe this decision was based on theoretical concerns, not from actual testing, but still, the logic seems sound. Especially in Go, where all error handling is super-explicit (we always want "fine-grained control over errors"). Panicking is not going to work, and silently returning a zero value is arguably worse.


mpx

mpx commented on Sep 17, 2020

@mpx
Contributor

I think this proposal trades correctness & simplicity for performance, as well as breaking the Go1 compatibility promise.

In the past I've really appreciated that Go hasn't made this tradeoff and has found other ways of improving performance. This kind of behaviour is better suited to unsafe or other similarly out of the way places where people are expected to understand the risks.

All existing programs have (implicitly or explicitly) been written with the assumption that FileInfo methods cannot fail, and that FileInfo contains data that was accurate during the Readdir call. All failures are explicitly handled via Readdir.

random sampling hasn't found even a single program that would break

Is this assuming that Stat cannot fail after Readdir? If so, it's very likely programs will misbehave. Examples include a race with unlink, a race with file replacement, corruption, and network/FUSE failures. These failures are uncommon, but they will definitely occur and they should be handled gracefully. As a trivial example, a "du" implementation might display a bogus size, or subtract 1 from the total size.

In future, programs would need to check the Readdir error and the sentinel values from some of the FileInfo methods to be correct.

In practice, many developers will not check the results from Size, Mode, ModTime since it mostly works, is easier, and they don't expect failures (mismatch with mental model). When the deferred stat fails the resulting misbehaviour may be hard to recognise or understand - especially since there is no concrete error. This would be a poor API prone to incorrect use and bugs.

Using it correctly would be extra hassle:

    sz := fi.Size()
    if sz < 0 {
        // Some unknown error occurred, retry the operation to obtain the error or obtain a valid FileInfo.
        fi, err = os.Lstat(filepath.Join(f.Name(), fi.Name()))
        if err != nil {
           // Error handling
        }
    }
    // Use size.

APIs that guarantee correctness without needing explicit error handling are extremely useful (eg, FileInfo). It would be disappointing to lose this property.

I want to use io/fs and embed as soon as is practical - but I wouldn't want to compromise correctness or ease of use. If os.File cannot be changed, then I'd strongly prefer we accept the current performance over deferred stat.

If adding a method to os.File was acceptable we could achieve the same performance objective by:

  • Creating a Dirent interface as a subset of FileInfo (Name, IsDir methods).
  • Creating an os.File.ReadDirent(n int) ([]Dirent, error) method.
  • Optionally de-emphasizing Readdir and Readdirnames in the io/fs proposal in favour of ReadDirent.

This might be less controversial than deferring stat?

tv42

tv42 commented on Sep 17, 2020

@tv42

@mpx I like what you said, but: Dirent.IsDir is impossible. Linux dirent d_type generally communicates more than just IsDir (DT_LNK etc), but the direntry type can be unknown. There's no way to write a func (Dirent) IsDir() bool that isn't forced to lie; the API needs to have a slightly different shape.

https://www.man7.org/linux/man-pages/man3/readdir.3.html

diamondburned

diamondburned commented on Sep 17, 2020

@diamondburned

but the direntry type can be unknown

For the sake of completeness, couldn't there be a fallback to lstat when DT_UNKNOWN is seen? Given my API, one could implement like so:

func (d *dirEnt) IsDir() bool {
    if d.stat != nil {
        return d.stat.isDir
    }

    return d.isDir
}

Although this no longer completely satisfies the goals of this issue (i.e. having a ReadDir API that would not call lstat many times), I would argue that this is a simpler API than some of the other verbose ones while still covering most of the use-cases.

I can see another problem with this API though: Lstat() now may or may not return a newer FileInfo, which is the same issue as the lazy-loaded FileInfo API.

jimmyfrasche

jimmyfrasche commented on Sep 17, 2020

@jimmyfrasche
Member

While I am generally in favor, discussing the specifics of a new api is premature when it hasn't been decided if it's necessary yet.

rsc

rsc commented on Sep 18, 2020

@rsc
Contributor, Author

I certainly hear you all about the change being strange. I agree it's a bit odd.
For what it's worth, I don't even think this is my idea. Brad says this has been proposed before.

The reason I'm trying hard to find a path forward here is that I'm trying to balance a few different concerns:

  • filepath.Walk can be made much faster by using the IsDir bits that are present in Unix directory reads without the full stat info. Tools like goimports substitute their own implementation doing just that. They get speed by giving up generality. That's fine since the case is actually specific anyway (it's always an OS file system underneath).
  • If the io/fs proposal (io/fs: add file system interfaces #41190) is adopted with no changes here, it will be impossible to achieve the same speed in a general Walk for FS. A few people raised that concern and it seems worth addressing.
  • Changes to the Walk API to help speed, such as something along the lines of https://pkg.go.dev/github.com/kr/fs#Walk, are worth considering but are impossible without a faster general underlying directory read.
  • If we're going to address that problem, now is the time.
  • We want to keep APIs simple (but as always not too simple).

Allowing lazy Readdir elegantly solves almost all of this, at the cost of the lazy behavior that seems from direct code inspection not to matter as much as you'd initially think. If we don't fix this problem now, we will be stuck with programs like goimports having their own custom filepath.Walk, and worse there will be no way to write a custom filepath.Walk for the general FS implementations.

If there's not consensus on the lazy Readdir - as there does not seem to be - then it still seems worth trying to fix the problem another way. Whatever we do, it needs to be a limited change: a simple, Go-like API. I expanded @mpx's suggestion above into a separate proposal, #41467. Please see the description and comment over there. Thanks.

rsc

rsc commented on Sep 23, 2020

@rsc
Contributor, Author

Retracting per discussion above; see #41467.

locked and limited conversation to collaborators on Sep 23, 2021
moved this to Declined in Proposals on Aug 10, 2022
removed this from Proposals on Oct 19, 2022
proposal: os: make Readdir return lazy FileInfo implementations · Issue #41188 · golang/go