Bump Numcodecs requirement to 0.6.2 #352

jakirkham · 2018-12-04T00:02:25Z

Follow-up to PR ( #347 )

Fixes #324

As there are some critical fixes and needed features in the latest Numcodecs, this bumps our lower bound to the latest version. Pulls most of the content from the aforementioned PR. Drops some commits that have been broken out into subsequent PRs to keep this focused on the Numcodecs upgrade. Does some refactoring and cleanup of internal functions thanks to utility functions now included in Numcodecs.

TODO:

- Add unit tests and/or doctests in docstrings
- Add docstrings and API docs for any new/modified user-facing classes and functions
- New/modified features documented in docs/tutorial.rst
- Changes documented in docs/release.rst
- Docs build locally (e.g., run `tox -e docs`)
- AppVeyor and Travis CI pass
- Test coverage is 100% (Coveralls passes)

Previously MsgPack was turning bytes objects to unicode objects when round-tripping them. However this has been fixed in the latest version of Numcodecs. So correct this test now that MsgPack is working correctly.

As we already ensured the `chunk` is an `ndarray` viewing the original data, there is no need for us to do that here as well. Plus the checks performed by `ensure_contiguous_ndarray` are not needed for our use case here. Particularly as we have already handled the unusual type cases above. We also don't need to constrain the buffer size. As such the only thing we really need is to flatten the array and make it contiguous, which is what we handle here directly.

As both the expected `object` case and the non-`object` case perform a `reshape` to flatten the data, go ahead and refactor that out of both cases and handle it generally. Simplifies the code a bit.

As refactoring of the `reshape` step has effectively dropped the expected `object` type case, the checks for the different types are a little more complicated than needed. To fix this, invert and swap the case ordering. This way we can handle all generally expected types first and simply cast them. Then we can raise if an `object` type shows up and is unexpected.
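The flatten-then-dispatch ordering described in the three commit messages above might look roughly like the following sketch. The helper name `finalize_chunk`, its arguments, and the error message are hypothetical; this is not the actual `_decode_chunk` code in Zarr, only an illustration that assumes the chunk is already an `ndarray`.

```python
import numpy as np


def finalize_chunk(chunk, dtype, shape, order="C"):
    """Hypothetical helper: cast a decoded chunk to the expected dtype and shape."""
    dtype = np.dtype(dtype)

    # Flatten once, for every case. order="A" keeps this a view for
    # C- or F-contiguous input instead of forcing a copy.
    chunk = chunk.reshape(-1, order="A")

    if chunk.dtype != object:
        # Generally expected case: reinterpret the raw items as the target dtype.
        chunk = chunk.view(dtype)
    elif dtype != object:
        # An object array showed up although the target is not an object array.
        raise RuntimeError("unexpected object array")

    return chunk.reshape(shape, order=order)
```

Handling the common case first means the `object` branch only has to decide whether to raise.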

As Numcodecs now includes a very versatile and effective `ensure_bytes` function, there is no need to define our own in `zarr.storage` as well. So go ahead and drop it.

Make use of Numcodecs' `ensure_contiguous_ndarray` to take `ndarray` views onto buffers to be stored in a few cases so as to reshape them and avoid a copy (thanks to the buffer protocol). This ensures that datetime/timedeltas are handled by default. Also catches things like object arrays. Finally this handles flattening the array if needed. All-in-all this gets as close to a `bytes` object as possible while not copying and doing its best to preserve type information while constructing something that fits the buffer protocol.
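As a rough illustration of what this commit message describes, the sketch below builds a flat, copy-free view with plain NumPy. The function name `as_flat_view` is hypothetical and the body is only an approximation of the described behaviour, not the Numcodecs implementation of `ensure_contiguous_ndarray`.

```python
import numpy as np


def as_flat_view(buf):
    """Hypothetical stand-in: a flat ndarray view onto buf, without copying."""
    # Anything exporting the buffer protocol can be wrapped without a copy.
    arr = buf if isinstance(buf, np.ndarray) else np.asarray(memoryview(buf))

    # Object arrays hold pointers, not raw data, so they cannot be stored as-is.
    if arr.dtype == object:
        raise TypeError("object arrays are not supported")

    # datetime64/timedelta64 arrays do not export the buffer protocol,
    # so view them as 64-bit integers of the same item size.
    if arr.dtype.kind in "Mm":
        arr = arr.view(np.int64)

    if not (arr.flags.c_contiguous or arr.flags.f_contiguous):
        raise ValueError("an array with contiguous memory is required")

    # Flatten; order="A" avoids a copy for C- or F-contiguous memory.
    return arr.reshape(-1, order="A")


dates = np.array(["2018-12-04", "2018-12-10"], dtype="M8[D]").reshape(1, 2)
flat = as_flat_view(dates)
print(flat.dtype, flat.shape, np.shares_memory(flat, dates))
```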

Rewrite `buffer_size` to just use Numcodecs' `ensure_ndarray` to get an `ndarray` that views the data. Once we have the `ndarray`, all that is needed is to read its `nbytes` attribute, which gives the number of bytes the data takes up.
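The idea can be sketched with the standard library alone. In the snippet below a `memoryview` stands in for the `ndarray` view (both expose `nbytes`); that substitution is an assumption made to keep the example dependency-free, and the function is not the code from this PR.

```python
import array


def buffer_size(v):
    """Sketch: number of bytes held by any object exporting the buffer protocol."""
    # The memoryview is a zero-copy view, playing the role that the
    # ndarray returned by ensure_ndarray plays in the commit message.
    return memoryview(v).nbytes


print(buffer_size(b"abcd"))
print(buffer_size(array.array("d", [1.0, 2.0])))
```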

If the data is already a `str` instance, turn `ensure_str` into a no-op. For all other cases, make use of Numcodecs' `ensure_bytes` to aid `ensure_str` in coercing data through the buffer protocol. If we are on Python 3, then decode the `bytes` object to a `str`.
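A minimal Python 3 sketch of that control flow follows. It uses `memoryview(...).tobytes()` as a stand-in for Numcodecs' `ensure_bytes`, and the `"ascii"` codec is an assumption, since the commit message does not name one.

```python
def ensure_str(s):
    """Sketch: coerce bytes-like data to str; str input passes through untouched."""
    if isinstance(s, str):
        # Already a str: nothing to do.
        return s
    # Go through the buffer protocol to get a bytes object
    # (stand-in for Numcodecs' ensure_bytes).
    data = memoryview(s).tobytes()
    # Assumed codec; the source does not say which one is used.
    return data.decode("ascii")


print(ensure_str(b"zarr"), ensure_str(bytearray(b"store")), ensure_str("key"))
```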

As Blosc got upgraded and it contained an upgrade of Zstd, the results changed a little bit for this example. So update them accordingly. Should fix the doctest failure.

alimanfoo

Thanks @jakirkham, all looks good.

jakirkham · 2018-12-04T03:06:53Z

Looks like the line not covered is a fallback for when file removal fails. Given PR ( #327 ) obviates that, maybe we should merge that PR and drop that fallback. Thoughts?

jakirkham · 2018-12-04T14:58:33Z

Made PR ( #355 ) to simply ignore coverage on that group of lines.

jakirkham · 2018-12-04T19:17:19Z

Alright, think this is ready now. 😄

alimanfoo · 2018-12-04T22:19:10Z

Awesome, merging...

jakirkham · 2018-12-10T08:10:04Z

Missed a bit of code that could be simplified with `ensure_ndarray`. Addressing that in PR ( #360 ).

jakirkham · 2018-12-10T18:24:02Z

Also dropping a workaround for older Numcodecs in PR ( #361 ). Please let me know if there are more of these.

jakirkham and others added 12 commits November 30, 2018 12:18

Bump Numcodecs requirement to 0.6.1

7eed366

Assert MsgPack round-trips bytes objects correctly

2552f62


properly guard against removal of object codec

aee5ace

Ensure chunk in _decode_chunk is an ndarray

bf4eee8

Refactor reshape from _decode_chunk

f3144ae


Drop ensure_bytes definition from zarr.storage

9badf39


Simplify buffer_size by using ensure_ndarray

2c6ac77


Bump to Numcodecs 0.6.2

bc4d579

jakirkham mentioned this pull request Dec 4, 2018

Bump Numcodecs requirement to 0.6.1 #347

Closed

7 tasks

Update tutorial's info content

efacb52


jakirkham added this to the v2.3 milestone Dec 4, 2018

jakirkham requested a review from alimanfoo December 4, 2018 00:29

alimanfoo approved these changes Dec 4, 2018


Merge 'zarr-developers/master' into 'jakirkham/use_numcodecs_0.6.2'

cad0007

release notes [ci skip]

cc1d776

alimanfoo merged commit c4427a4 into zarr-developers:master Dec 4, 2018

alimanfoo mentioned this pull request Dec 4, 2018

avoid race condition during chunk write #327

Merged

7 tasks

jakirkham deleted the use_numcodecs_0.6.2 branch December 5, 2018 02:48

This was referenced Dec 10, 2018

Use ensure_ndarray to view chunk as an array #360

Merged

Drop temporary workaround for get_codec #361

Merged

jakirkham mentioned this pull request Jan 4, 2019

RFC: Optionally support memory-mapping DirectoryStore values #377

Closed

7 tasks

jakirkham mentioned this pull request Jun 16, 2020

Add blosc getitem zarr-developers/numcodecs#235

Merged

7 tasks
