Skip to content

Add nested data generation and tests (zarr-python and zarrita) #33

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 21, 2021

Conversation

grlee77
Copy link
Contributor

@grlee77 grlee77 commented Apr 15, 2021

This PR adds nested data generation and reads with zarr-python (v2) and zarrita (v3).

z5py does not seem to support reading the nested data from zarr-python so I marked it as "skipped" in the test suite. It will currently show up as "FAILED: mismatch" in the test report tables.

I was originally using zarr.storage.FSStore to read the nested data with zarr-python in the tests, but this was failing due to zarr-developers/zarr-python#717. I switched it to NestedDirectoryStore, which passes.

For the report, row names now have ", flat" or ", nested" appended to indicate which format was read.

@joshmoore
Copy link
Member

I was originally using zarr.storage.FSStore to read the nested data with zarr-python in the tests, but this was failing due to zarr-developers/zarr-python#717. I switched it to NestedDirectoryStore, which passes.

Fix released as 2.7.1. Is trying with both implementations overkill?

@constantinpape
Copy link
Collaborator

z5py does not seem to support reading the nested data from zarr-python so I marked it as "skipped" in the test suite. It will currently show up as "FAILED: mismatch" in the test report tables.

Indeed, it's not supported by z5py yet. I will try to implement this at some point; in the meantime skipping the test and getting the mismatch message in the report is a good solution.

@grlee77
Copy link
Contributor Author

grlee77 commented Apr 16, 2021

Fix released as 2.7.1. Is trying with both implementations overkill?

Did you mean only reading the same file with both or did you want to also generate with both store types?

Would we then test with both FSStore and DirectoryStore for the flat case as well?

@joshmoore
Copy link
Member

Considering the bug you already found, if possible, I'd go for:

  • Directory: {flat} x {read, write}
  • NestedDirectory: {nested} x {read, write}
  • FSStore: {nested, flat} x {read, write}

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/test_read_all.py::test_correct_read[read zarr (DirectoryStore) zarr using z5py, blosc] PASSED [ 27%]
test/test_read_all.py::test_correct_read[read zarr (DirectoryStore) zarr using z5py, gzip] PASSED [ 28%]
test/test_read_all.py::test_correct_read[read zarr (DirectoryStore) zarr using z5py, raw] PASSED [ 30%]
test/test_read_all.py::test_correct_read[read zarr (DirectoryStore) zarr using z5py, zlib] PASSED [ 31%]
test/test_read_all.py::test_correct_read[read zarr (DirectoryStore) zarr using zarr, blosc] PASSED [ 32%]
test/test_read_all.py::test_correct_read[read zarr (DirectoryStore) zarr using zarr, gzip] PASSED [ 33%]
test/test_read_all.py::test_correct_read[read zarr (DirectoryStore) zarr using zarr, raw] PASSED [ 34%]
test/test_read_all.py::test_correct_read[read zarr (DirectoryStore) zarr using zarr, zlib] PASSED [ 35%]
test/test_read_all.py::test_correct_read[read zarr (FSStore) zarr using z5py, blosc] PASSED [ 36%]
test/test_read_all.py::test_correct_read[read zarr (FSStore) zarr using z5py, gzip] PASSED [ 37%]
test/test_read_all.py::test_correct_read[read zarr (FSStore) zarr using z5py, raw] PASSED [ 38%]
test/test_read_all.py::test_correct_read[read zarr (FSStore) zarr using z5py, zlib] PASSED [ 40%]
test/test_read_all.py::test_correct_read[read zarr (FSStore) zarr using zarr, blosc] PASSED [ 41%]
test/test_read_all.py::test_correct_read[read zarr (FSStore) zarr using zarr, gzip] PASSED [ 42%]
test/test_read_all.py::test_correct_read[read zarr (FSStore) zarr using zarr, raw] PASSED [ 43%]
test/test_read_all.py::test_correct_read[read zarr (FSStore) zarr using zarr, zlib] PASSED [ 44%]
test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using z5py, blosc] SKIPPED [ 45%]
test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using z5py, gzip] SKIPPED [ 46%]
test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using z5py, raw] SKIPPED [ 47%]
test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using z5py, zlib] SKIPPED [ 48%]
test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using zarr, blosc] PASSED [ 50%]
test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using zarr, gzip] PASSED [ 51%]
test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using zarr, raw] PASSED [ 52%]
test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using zarr, zlib] PASSED [ 53%]
test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using z5py, blosc] SKIPPED [ 54%]
test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using z5py, gzip] SKIPPED [ 55%]
test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using z5py, raw] SKIPPED [ 56%]
test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using z5py, zlib] SKIPPED [ 57%]
test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using zarr, blosc] PASSED [ 58%]
test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using zarr, gzip] PASSED [ 60%]
test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using zarr, raw] PASSED [ 61%]
test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using zarr, zlib] PASSED [ 62%]

Nice additions.

f"file not found: {fpath}. Make sure you have generated the data "
"using 'make data'"
)
test = read_fn(fpath, codec, nested)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While struggling with zarr-developers/zarr-python#717, I introduced a pseudo-checksum in ome_zarr_test_suite that just asserts that there's at least one non-zero value in the opened dataset. My guess is that without zarr-developers/zarr-python#716 that will be the case here as well.

Longer-term, I assume we could make use of some form of true checksum'ing to show that when nesting is not properly detected, that the checksums differ (since all values revert to fill_value)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here was mainly to guard against mistakes in the test parameter generation. We should never be trying to read from a data/XXX folder that doesn't exist.

The issue I saw in the past with fill_value was when the files did exist, but due to the / separator not being found as a key, the fill value was used instead. I think they are two separate issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that they are separate, but I think adding a checksum will cause test failures that we aren't seeing yet. But merging this for now. Thanks.

@joshmoore
Copy link
Member

@grlee77 : do you have anything else for this PR?

@grlee77
Copy link
Contributor Author

grlee77 commented Apr 21, 2021

No, I think it should be good to go. Merging will likely cause conflicts in #34, but hopefully those would be fairly easy to sort out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants