-
Notifications
You must be signed in to change notification settings - Fork 382
comments from arrow #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@wesm commented on dev@arrow https://cwiki.apache.org/confluence/display/ARROW/Python+Filesystems+and+Filesystem+API I added a JIRA filter and tagged a couple of filesystem-related As far as refining the details of the API, we should see what's the NB: TensorFlow defines a filesystem abstraction, albeit in C++ with |
@wesm , sorry, I didn't understand this - a PR of which patch to where? I think I said this previously, I'd be very happy to see everything in this repo be taken over by apache/arrow, if it's at all useful. That might in itself improve conversation. I'm also thoroughly behind amending the file-system APIs I'm involved in to adhere to a consensus design. The link to TensorFlow is interesting, a fairly different and minimalist take on filesystem functions. I assume that there is no file-like layer (yet), i.e., you can neither grab a file descriptor handle nor pass anything directly to python functions expecting to read from files. |
It would be just for the purposes of code review / line-by-line comments. So you could create an empty branch (say based on the repo's initial commit) and then make a PR from HEAD into the empty branch |
@pitrou wrote the following on [email protected]
Here are some comments:
API naming: you seem to favour re-using Unix command-line monickers in
some places, while using more regular verbs or names in other
places. I think it should be consistent. Since the Unix
command-line doesn't exactly cover the exposed functionality, and
since Unix tends to favour short cryptic names, I think it's better
to use Python-like naming (which is also more familiar to non-Unix
users). For example "move" or "rename" or "replace" instead of "mv",
etc.
**kwargs parameters: a couple APIs (
mkdir
,put
...) allow passingarbitrary parameters, which I assume are intended to be
backend-specific. It makes it difficult to add other optional
parameters to those APIs in the future. So I'd make the
backend-specific directives a single (optional) dict parameter rather
than a **kwargs.
invalidate_cache
doesn't state whether it invalidates recursivelyor not (recursively sounds better intuitively?). Also, I think it
would be more flexible to take a list of paths rather than a single
path.
du
: the effect of thedeep
parameter isn't obvious to me. I don'tknow what it would mean not to recurse here: what is the size of a
directory if you don't recurse into it?
glob
may need a formal definition (are trailing slashessignificant for directory or symlink resolution? this kind of thing),
though you may want to keep edge cases backend-specific.
are
head
andtail
at all useful? They can be easily recreatedusing a generic
open
facility.read_block
tries to do too much in a single API IMHO, andusing
open
directly is more flexible anyway.if
touch
is intended to emulate the Unix API of the same name, thedocstring should state "Create empty file or update last modification
timestamp".
the information dicts returned by several APIs (
ls
,info
....)need standardizing, at least for non backend-specific fields.
if the backend is a networked filesystem with non-trivial latency,
perhaps the operations would deserve being batched (operate on
several paths at once), though I will happily defer to your expertise
on the topic.
The text was updated successfully, but these errors were encountered: