-
-
Notifications
You must be signed in to change notification settings - Fork 335
Fixed MemoryStore.list_dir #2117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ensures that nested children are listed properly.
src/zarr/testing/store.py
Outdated
await store.set("foo/c/d/2", Buffer.from_bytes(b"\x01")) | ||
await store.set("foo/c/d/3", Buffer.from_bytes(b"\x01")) | ||
|
||
keys_expected = ["foo"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is failing on the S3 tests.
keys_expected = ["foo"]
keys_observed = [k async for k in store.list_dir("")]
> assert set(keys_observed) == set(keys_expected), keys_observed
E AssertionError: ['test/foo']
It's including self.path
(the bucket name?) in listing. Based on my reading of the tests, that should not be included, but I'm not 100% sure. 90940a0 has that fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling to get my head around the new logic. Reading #2116, is the goal here to just not return any duplicate values? If so, I wonder if it would be easier to keep track of the values that have been returned, and only yield a value if it hasn't been returned yet (otherwise, check the next value). This way it would also truly generate the values one at a time, instead of the new approach that computes all the keys before yielding them.
Thanks for the review!
Effectively yes. Duplicate keys were a consequence of the previous implementation not handling arrays nested in child groups properly.
3c845d9 has an approach that's a middle ground between my previous fix and your suggestion. I think it's a lot simpler. It does still generate all the keys before yield them, but I'm not too worried about that. This is an in-memory store, so things should be pretty small and fast, and this matches the implementation of the non-prefix case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 looks good to me. I think if you drop the last bit of the assert statement, pytest
is clever and will report the difference between the two sets in it's output when the test fails.
Co-authored-by: David Stansby <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch @TomAugspurger!
src/zarr/testing/store.py
Outdated
await store.set("foo/c/1", Buffer.from_bytes(b"\x01")) | ||
await store.set("foo/c/d/1", Buffer.from_bytes(b"\x01")) | ||
await store.set("foo/c/d/2", Buffer.from_bytes(b"\x01")) | ||
await store.set("foo/c/d/3", Buffer.from_bytes(b"\x01")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think its super material to the test but I believe the intent of this test was to have foo
be an array c
be the subdirectory containing chunks. Perhaps it would be better to change this to something more like:
await store.set("foo/c/1", Buffer.from_bytes(b"\x01")) | |
await store.set("foo/c/d/1", Buffer.from_bytes(b"\x01")) | |
await store.set("foo/c/d/2", Buffer.from_bytes(b"\x01")) | |
await store.set("foo/c/d/3", Buffer.from_bytes(b"\x01")) | |
await store.set("foo/bar/c/1", Buffer.from_bytes(b"\x01")) | |
await store.set("foo/bar/c/2", Buffer.from_bytes(b"\x01")) | |
await store.set("foo/spam/c/1", Buffer.from_bytes(b"\x01")) | |
await store.set("foo/spam/c/2", Buffer.from_bytes(b"\x01")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right... IIUC, if you have a key at foo/zarr.json
with a node_type="array"
, that implies you won't have any keys at foo/bar/zarr.json
(a group or a node), since arrays can't have children?
I tried to update this in 7414b10, to make sure that the test store we construct is at least compatible with how a Zarr hierarchy would be laid out.
/foo # group
/c # array
/group-0
/group-1
/array-0
/array-1
/array-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Ensures that nested children are listed properly.
Closes #2116