comments from arrow

@pitrou wrote the following on dev@arrow.apache.org

Here are some comments:

- API naming: you seem to favour re-using Unix command-line monikers in
 some places, while using more regular verbs or names in other
 places.  I think it should be consistent.  Since the Unix
 command-line doesn't exactly cover the exposed functionality, and
 since Unix tends to favour short cryptic names, I think it's better
 to use Python-like naming (which is also more familiar to non-Unix
 users). For example "move" or "rename" or "replace" instead of "mv",
 etc.

- `**kwargs` parameters: a couple of APIs (`mkdir`, `put`...) allow passing
 arbitrary parameters, which I assume are intended to be
 backend-specific.  It makes it difficult to add other optional
 parameters to those APIs in the future.  So I'd make the
 backend-specific directives a single (optional) dict parameter rather
 than `**kwargs`.

- `invalidate_cache` doesn't state whether it invalidates recursively
 or not (recursively sounds better intuitively?).  Also, I think it
 would be more flexible to take a list of paths rather than a single
 path.

- `du`: the effect of the `deep` parameter isn't obvious to me. I don't
 know what it would mean *not* to recurse here: what is the size of a
 directory if you don't recurse into it?

- `glob` may need a formal definition (are trailing slashes
 significant for directory or symlink resolution? this kind of thing),
 though you may want to keep edge cases backend-specific.

- are `head` and `tail` at all useful? They can be easily recreated
 using a generic `open` facility.

- `read_block` tries to do too much in a single API IMHO, and
 using `open` directly is more flexible anyway.

- if `touch` is intended to emulate the Unix API of the same name, the
 docstring should state "Create empty file or update last modification
 timestamp".

- the information dicts returned by several APIs (`ls`, `info`, ...)
 need standardizing, at least for non backend-specific fields.

- if the backend is a networked filesystem with non-trivial latency,
 perhaps the operations deserve to be batched (operate on
 several paths at once), though I will happily defer to your expertise
 on the topic.
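
The suggestion to replace `**kwargs` with a single optional dict can be sketched as follows. This is a minimal illustration, not an existing library API: `SketchFileSystem`, the `options` parameter and the later-added `exist_ok` flag are all hypothetical names chosen to show why the dict form leaves room for new generic parameters.

```python
from typing import Any, Dict, List, Mapping, Optional


class SketchFileSystem:
    """Hypothetical filesystem, used only to illustrate the signature."""

    def __init__(self) -> None:
        # Records each call so the effect of the signature is visible.
        self.calls: List[Dict[str, Any]] = []

    def mkdir(
        self,
        path: str,
        create_parents: bool = True,
        *,
        exist_ok: bool = False,  # hypothetical generic option added "later"
        options: Optional[Mapping[str, Any]] = None,
    ) -> None:
        # Backend-specific directives travel in one dict, kept apart from
        # the generic parameters, so a new generic keyword can never
        # collide with a backend key of the same name.
        backend_options = dict(options or {})
        self.calls.append(
            {
                "path": path,
                "create_parents": create_parents,
                "exist_ok": exist_ok,
                "options": backend_options,
            }
        )
```

A backend key that happens to share a name with a generic parameter (for example `fs.mkdir("bucket/dir", options={"exist_ok": "x"})`) stays inside `options` and does not shadow the generic flag, which is exactly the ambiguity `**kwargs` cannot avoid.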
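
The `invalidate_cache` remark (recursive invalidation, list of paths) could look like the sketch below. It assumes a hypothetical in-memory listing cache keyed by `/`-separated paths; `ListingCache` and its methods are illustrative names only.

```python
from typing import Any, Dict, Iterable, Optional


class ListingCache:
    """Hypothetical directory-listing cache keyed by normalized path."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def store(self, path: str, listing: Any) -> None:
        self._entries[path.rstrip("/")] = listing

    def get(self, path: str) -> Optional[Any]:
        return self._entries.get(path.rstrip("/"))

    def invalidate_cache(self, paths: Optional[Iterable[str]] = None) -> None:
        # With no argument, drop every cached entry.
        if paths is None:
            self._entries.clear()
            return
        prefixes = [p.rstrip("/") for p in paths]
        # Recursive: remove each given path and every entry below it.
        # Matching on "prefix + '/'" keeps "data2" from being treated
        # as a child of "data".
        stale = [
            key
            for key in self._entries
            if any(key == p or key.startswith(p + "/") for p in prefixes)
        ]
        for key in stale:
            del self._entries[key]
```

Taking an iterable lets a caller invalidate several unrelated subtrees in one call, and making recursion the default matches the intuition expressed in the comment.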
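
To illustrate the point that `head` and `tail` are easy to recreate on top of a generic `open`, here is a minimal sketch. It assumes only a hypothetical `opener` callable that returns a seekable binary file object (the builtin `open` with mode `"rb"` qualifies); the function names mirror the APIs under discussion but are not taken from any library.

```python
import os
from typing import BinaryIO, Callable

# Hypothetical: any callable that opens a path for seekable binary reading.
Opener = Callable[[str], BinaryIO]


def head(opener: Opener, path: str, size: int = 1024) -> bytes:
    # Read the first `size` bytes (fewer if the file is shorter).
    with opener(path) as f:
        return f.read(size)


def tail(opener: Opener, path: str, size: int = 1024) -> bytes:
    # Seek to the end to learn the length, back up `size` bytes
    # (clamped to the start of the file), then read to the end.
    with opener(path) as f:
        end = f.seek(0, os.SEEK_END)
        f.seek(max(0, end - size))
        return f.read()
```

For example, `tail(lambda p: open(p, "rb"), "log.txt", 100)` returns the last 100 bytes of a local file; a remote backend would only need to supply its own `opener`.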
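
For a local filesystem, the Unix-style `touch` semantics named in the suggested docstring ("create empty file or update last modification timestamp") can be sketched with the standard library. This assumes a local path; Python's `pathlib.Path.touch` already behaves this way for local files, and a remote backend would need its own mechanism.

```python
import os


def touch(path: str) -> None:
    """Create empty file or update last modification timestamp."""
    # Mode "a" creates the file if it is missing and never truncates
    # existing content; opening without writing leaves mtime unchanged.
    with open(path, "a"):
        pass
    # Passing None sets both access and modification times to "now".
    os.utime(path, None)
```

Opening in append mode rather than checking `os.path.exists` first avoids truncating a file that another process creates in between.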
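
One way to standardize the non-backend-specific part of the information dicts is a typed schema that every backend fills in. The sketch below assumes a minimal field set (`name`, `size`, `type`) chosen for illustration; `FileInfo` and `local_info` are hypothetical names, and a real schema may need more fields. Backend-specific extras could sit in additional keys.

```python
import os
import stat
from typing import Literal, TypedDict


class FileInfo(TypedDict):
    """Hypothetical minimal set of fields every backend would provide."""

    name: str
    size: int
    type: Literal["file", "directory", "other"]


def local_info(path: str) -> FileInfo:
    # Map a local stat() result onto the standardized fields; a remote
    # backend would map its own metadata onto the same keys.
    st = os.stat(path)
    kind: Literal["file", "directory", "other"]
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "file"
    else:
        kind = "other"
    return {"name": path, "size": st.st_size, "type": kind}
```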
