-
Notifications
You must be signed in to change notification settings - Fork 382
instance- and dir-caches #243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this replace #216?
fsspec/dircache.py
Outdated
import time | ||
|
||
|
||
class DirCache(dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're changing the __init__
signature, I think it'd be better to compose this class with a ._cache
method rather than inheriting from dict. You might inherit from MutableMapping instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll probably inherit from nothing in the end, since I don't expect calls to keys()
, etc.
self.q(item) | ||
return self._cache[item] # maybe raises KeyError | ||
|
||
def __setitem__(self, key, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this validate the structure of value
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not certain, I don't know if we want to force implementations here. Might make things simpler if we do.
fsspec/dircache.py
Outdated
Parameters to this class control listing expiry or indeed turn | ||
caching off | ||
""" | ||
def __init__(self, usecache=True, expiry_time=False, max_size=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this max_size in terms of elements or bytes?
Evicting based on the number of elements doesn't seem especially useful to me, at least no compared to the size of the values. But perhaps s3fs or gcsfs already does that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is number of elements as it stands, because it's far easier to implement using LRU.
There is no such expiry anywhere yet; gcsfs has time-based expiry.
fsspec/dircache.py
Outdated
self._times = {} | ||
if max_size: | ||
self.q = lru_cache(max_size + 1)(lambda key: self._cache.pop(key, None)) | ||
self.use = usecache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These names should match the names of the arguments to init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
fsspec/dircache.py
Outdated
self._cache = {} | ||
self._times = {} | ||
if max_size: | ||
self.q = lru_cache(max_size + 1)(lambda key: self._cache.pop(key, None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit strange to have the max_size parameter affect the attribute where we actually store things. That leads to stuff like if self.max
later on. Can we just always store it in the same place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
._cache
stored the listings. The q and times dicts hold only information on what to automatically evict, and are not accessed if eviction is not desired.
@TomAugspurger , this can be considered working, although the docs should be updated to explain this (and changes propagated to s3fs, gcsfs). Did you have an opinion if |
Do you mean a |
I'm going against this in the end, because there can be different requirements in the implementations (e.g., on s3, the creation of a key may affect all parent dirs, but on ssh you must explicitly make those dirs first) |
@TomAugspurger what do you think, is this a simple way to solve things? I'm thinking every fsspec instance will have a
dircache
instance attached, but they are not required to use it (e.g., local never would). What exactly the cache contains is up to the implementation, but the structure in the docstring covers s3fs and gcsfs . When I plug it in,invalidate_cache()
should maybe be moved to the new class, with an optional flag on whether directory parents should also be invalidated.