proposal: os: new Readdirentries method to read directory entries and efficiently expose file metadata

### API

_(Update: edited to turn into a proposal, add suggested API)_

```go
func (f *File) Readdirentries(n int) ([]DirEntry, error)

// The file type could not be determined
const ModeUnknown FileMode = xxx

type DirEntry struct {
  // Name is the base name of the file
  Name string
  // ... unexported fields ...
}

// Id returns a number which uniquely identifies the file within the filesystem that contains it.
// This is known as the inode number in Unix-like systems, or file ID in Windows. Under Windows,
// this will require a system call on first call, but not on Unix.
//
// Note this is not guaranteed to be unique in the ReFS file system, introduced with Windows
// Server 2012, since that uses 128-bit identifiers.
func (d *DirEntry) Id() (uint64, error) { ... }

// Type returns the file's type. Depending on the underlying filesystem, this may require an Lstat,
// which will be done internally and cached after first use. If lightweight is true, the Lstat will not
// be done; in that case, if the file type is not immediately known, ModeUnknown will be
// returned. This may be useful e.g. if the caller will be opening the file anyway and would prefer
// to do a Stat of the open file to avoid filename races.
func (d *DirEntry) Type(lightweight bool) (FileMode, error) { ... }

// Lstat behaves like the normal Lstat. Its result will be cached after the first use, which may have
// occurred from calling Type or even Id under Windows.
func (d *DirEntry) Lstat() (FileInfo, error) { ... }
```

This is analogous e.g. to Python's [os.scandir](https://docs.python.org/3/library/os.html#os.scandir), as pointed out by @qingyunha below.
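
To make the caching semantics concrete, here is a rough user-space emulation built on today's `Readdirnames` plus a lazy `Lstat`. It is only a sketch under assumptions: `dirEntry`, `readEntries`, the `lstats` counter and `demo` are hypothetical names, and `Type` reports "unknown" through an extra `known` result because the value of `ModeUnknown` is left open above. Since `Readdirnames` discards `d_type`, this emulation never knows the type up front; a real implementation could.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
)

// dirEntry is a hypothetical stand-in for the proposed DirEntry.
type dirEntry struct {
	Name   string
	dir    string      // parent directory, needed to build the Lstat path
	info   os.FileInfo // cached Lstat result; nil until first use
	lstats int         // number of real Lstat calls made (demo only)
}

// Lstat performs the system call at most once and caches the result.
func (d *dirEntry) Lstat() (os.FileInfo, error) {
	if d.info != nil {
		return d.info, nil
	}
	d.lstats++
	fi, err := os.Lstat(filepath.Join(d.dir, d.Name))
	if err != nil {
		return nil, err
	}
	d.info = fi
	return fi, nil
}

// Type returns the type bits. With lightweight set, it refuses to Lstat
// and reports known=false unless a cached result is already available.
func (d *dirEntry) Type(lightweight bool) (mode os.FileMode, known bool, err error) {
	if d.info == nil && lightweight {
		return 0, false, nil
	}
	fi, err := d.Lstat()
	if err != nil {
		return 0, false, err
	}
	return fi.Mode() & os.ModeType, true, nil
}

// readEntries lists dir without stat'ing anything.
func readEntries(dir string) ([]*dirEntry, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	names, err := f.Readdirnames(-1)
	if err != nil {
		return nil, err
	}
	entries := make([]*dirEntry, len(names))
	for i, name := range names {
		entries[i] = &dirEntry{Name: name, dir: dir}
	}
	return entries, nil
}

// demo lists a scratch directory holding one subdirectory and exercises
// the lightweight call, the full call, and the cached Lstat in turn.
func demo() (knownLight, isDir bool, lstats int, err error) {
	dir, err := ioutil.TempDir("", "direntry-demo")
	if err != nil {
		return false, false, 0, err
	}
	defer os.RemoveAll(dir)
	if err := os.Mkdir(filepath.Join(dir, "sub"), 0755); err != nil {
		return false, false, 0, err
	}
	entries, err := readEntries(dir)
	if err != nil {
		return false, false, 0, err
	}
	e := entries[0]
	_, knownLight, _ = e.Type(true) // no Lstat allowed yet
	mode, _, err := e.Type(false)   // triggers the one Lstat
	if err != nil {
		return false, false, 0, err
	}
	if _, err := e.Lstat(); err != nil { // served from the cache
		return false, false, 0, err
	}
	return knownLight, mode.IsDir(), e.lstats, nil
}

func main() {
	fmt.Println(demo())
}
```

The point of the `lstats` counter is that `Type(false)` followed by `Lstat()` costs a single system call, which is the behaviour the doc comments above describe.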

### Context

Could we please have a new File-level API to list a directory's entries, which exposes the `d_type` field (`syscall.Dirent.Type`) and, ideally, also `d_ino` (`syscall.Dirent.Ino`)?

Under Linux, certain filesystems (such as Btrfs, ext4) store the file type information in the direntry itself. This is available via the `d_type` field, which may be `DT_UNKNOWN` if the file type could not be determined for some reason (e.g. no filesystem support, or weird quirks such as "." or ".."). According to `man readdir(3)`, some BSDs also support this.
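
For illustration, this is roughly what reading `d_type` and `d_ino` by hand looks like today. It is a Linux-only sketch that parses the `linux_dirent64` records returned by `syscall.ReadDirent`, and it assumes a little-endian machine; `rawEntry`, `readRaw`, `typeName` and `demo` are hypothetical names, not existing API.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
	"syscall"
)

// rawEntry holds what the kernel hands back without any lstat.
type rawEntry struct {
	Name string
	Ino  uint64
	Type uint8 // one of syscall.DT_*
}

// readRaw parses linux_dirent64 records: d_ino (8 bytes), d_off (8),
// d_reclen (2), d_type (1), then the NUL-terminated d_name.
// Assumption: Linux layout and little-endian byte order.
func readRaw(dir string) ([]rawEntry, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var out []rawEntry
	buf := make([]byte, 8192)
	for {
		n, err := syscall.ReadDirent(int(f.Fd()), buf)
		if err != nil {
			return nil, err
		}
		if n <= 0 {
			return out, nil // end of directory
		}
		for b := buf[:n]; len(b) >= 19; {
			reclen := int(binary.LittleEndian.Uint16(b[16:18]))
			if reclen < 19 || reclen > len(b) {
				return nil, fmt.Errorf("malformed dirent in %s", dir)
			}
			name := b[19:reclen]
			if i := bytes.IndexByte(name, 0); i >= 0 {
				name = name[:i]
			}
			if s := string(name); s != "." && s != ".." {
				out = append(out, rawEntry{
					Name: s,
					Ino:  binary.LittleEndian.Uint64(b[0:8]),
					Type: b[18],
				})
			}
			b = b[reclen:]
		}
	}
}

// typeName maps a d_type value to a short label.
func typeName(t uint8) string {
	switch t {
	case syscall.DT_DIR:
		return "dir"
	case syscall.DT_REG:
		return "regular"
	case syscall.DT_LNK:
		return "symlink"
	case syscall.DT_UNKNOWN:
		return "unknown"
	default:
		return "other"
	}
}

// demo lists a scratch directory containing one subdirectory and one file.
func demo() (map[string]string, error) {
	dir, err := ioutil.TempDir("", "dtype-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	if err := os.Mkdir(filepath.Join(dir, "sub"), 0755); err != nil {
		return nil, err
	}
	if err := ioutil.WriteFile(filepath.Join(dir, "a.txt"), []byte("x"), 0644); err != nil {
		return nil, err
	}
	entries, err := readRaw(dir)
	if err != nil {
		return nil, err
	}
	types := make(map[string]string)
	for _, e := range entries {
		types[e.Name] = typeName(e.Type)
	}
	return types, nil
}

func main() {
	fmt.Println(demo())
}
```

Callers still have to handle "unknown" for every entry, since whether `d_type` is filled in depends on the filesystem.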

Currently, we have `os.(*File).Readdir`, which does an `lstat` on every file and does not make use of the type information, even if it's there. This makes sense given the method's signature, since it needs to find out the file's size, mode, etc.

We also have `os.(*File).Readdirnames`, which reads the dirent but only returns the name portion.
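
A small sketch of the two existing calls side by side, to show the gap: `Readdirnames` gives no way to tell directories from files, while `Readdir` can, at the price of full metadata for every entry. `listBoth` and `demo` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
	"sort"
)

// listBoth lists dir twice: once with Readdirnames (names only) and once
// with Readdir (full FileInfo for every entry). It returns all names and
// the subset that Readdir reports as directories.
func listBoth(dir string) (names, dirs []string, err error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, nil, err
	}
	names, err = f.Readdirnames(-1)
	f.Close()
	if err != nil {
		return nil, nil, err
	}

	f, err = os.Open(dir)
	if err != nil {
		return nil, nil, err
	}
	infos, err := f.Readdir(-1)
	f.Close()
	if err != nil {
		return nil, nil, err
	}
	for _, fi := range infos {
		if fi.IsDir() {
			dirs = append(dirs, fi.Name())
		}
	}
	sort.Strings(names)
	sort.Strings(dirs)
	return names, dirs, nil
}

// demo builds a scratch directory with one subdirectory and one file.
func demo() (names, dirs []string, err error) {
	dir, err := ioutil.TempDir("", "readdir-demo")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(dir)
	if err := os.Mkdir(filepath.Join(dir, "sub"), 0755); err != nil {
		return nil, nil, err
	}
	if err := ioutil.WriteFile(filepath.Join(dir, "a.txt"), []byte("x"), 0644); err != nil {
		return nil, nil, err
	}
	return listBoth(dir)
}

func main() {
	names, dirs, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Readdirnames:", names)
	fmt.Println("directories per Readdir:", dirs)
}
```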

It would be very useful to have an intermediate method between these two, that returns not only the name, but also the file type (which may of course be `DT_UNKNOWN`), and ideally anything else it can know from the dirent, such as the file's inode number (`d_ino` or `syscall.Dirent.Ino`).

This would make it much easier to implement a fast/scalable file crawler (e.g. for backup software or something else). Given a directory with 100,000 entries, being able to cheaply separate subdirectories from other files while listing the directory itself lets the crawler e.g. batch up regular files for further processing, or choose crawling strategies depending on whether there are 2 subdirectories or 75,000. Especially for the backup case, having the inode number outright would also be useful, as it helps identify hardlinks (which may skip reading the data twice) without the cost of the `lstat`.
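
As a sketch of the hardlink case, this is how a crawler can group names by (device, inode) today. It works on Unix-like systems only, since it relies on `fi.Sys()` being a `*syscall.Stat_t`, and it pays one `lstat` per entry, which is exactly the cost a dirent-provided inode number would avoid. `fileID`, `hardlinkGroups` and `demo` are hypothetical names.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
	"sort"
	"syscall"
)

// fileID identifies a file by device and inode number.
type fileID struct{ dev, ino uint64 }

// hardlinkGroups returns the sets of regular-file names in dir that
// share an inode, i.e. that are hardlinks to the same file.
func hardlinkGroups(dir string) ([][]string, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	names, err := f.Readdirnames(-1)
	f.Close()
	if err != nil {
		return nil, err
	}
	sort.Strings(names) // deterministic order inside each group

	byID := make(map[fileID][]string)
	for _, name := range names {
		// One lstat per entry: the cost discussed above.
		fi, err := os.Lstat(filepath.Join(dir, name))
		if err != nil {
			return nil, err
		}
		st, ok := fi.Sys().(*syscall.Stat_t)
		if !ok || !fi.Mode().IsRegular() {
			continue
		}
		id := fileID{dev: uint64(st.Dev), ino: uint64(st.Ino)}
		byID[id] = append(byID[id], name)
	}

	var groups [][]string
	for _, g := range byID {
		if len(g) > 1 {
			groups = append(groups, g)
		}
	}
	return groups, nil
}

// demo creates files a and c, plus b as a hardlink to a.
func demo() ([][]string, error) {
	dir, err := ioutil.TempDir("", "hardlink-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	a := filepath.Join(dir, "a")
	if err := ioutil.WriteFile(a, []byte("data"), 0644); err != nil {
		return nil, err
	}
	if err := os.Link(a, filepath.Join(dir, "b")); err != nil {
		return nil, err
	}
	if err := ioutil.WriteFile(filepath.Join(dir, "c"), []byte("data"), 0644); err != nil {
		return nil, err
	}
	return hardlinkGroups(dir)
}

func main() {
	fmt.Println(demo())
}
```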

See e.g. [this topic](https://groups.google.com/d/topic/golang-nuts/PZH2jEAlAOE/discussion) in golang-nuts for some speed comparisons. This can make a very big difference.

	func (file *File) readdirnames(n int) (names []string, err error) {
		fis, err := file.Readdir(n)
		names = make([]string, len(fis))
		for i, fi := range fis {
			names[i] = fi.Name()
		}
		return names, err
	}


proposal: os: new Readdirentries method to read directory entries and efficiently expose file metadata #40352


