-
-
Notifications
You must be signed in to change notification settings - Fork 329
Bump Numcodecs requirement to 0.6.2 #352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump Numcodecs requirement to 0.6.2 #352
Conversation
Previously MsgPack was turning bytes objects to unicode objects when round-tripping them. However this has been fixed in the latest version of Numcodecs. So correct this test now that MsgPack is working correctly.
As we already ensured the `chunk` is an `ndarray` viewing the original data, there is no need for us to do that here as well. Plus the checks performed by `ensure_contiguous_ndarray` are not needed for our use case here. Particularly as we have already handled the unusual type cases above. We also don't need to constrain the buffer size. As such the only thing we really need is to flatten the array and make it contiguous, which is what we handle here directly.
As both the expected `object` case and the non-`object` case perform a `reshape` to flatten the data, go ahead and refactor that out of both cases and handle it generally. Simplifies the code a bit.
As refactoring of the `reshape` step has effectively dropped the expected `object` type case, the checks for different types is a little more complicated than needed. To fix this, basically invert and swap the case ordering. This way we can handle all generally expected types first and simply cast them. Then we can raise if an `object` type shows up and is unexpected.
As Numcodecs now includes a very versatile and effective `ensure_bytes` function, there is no need to define our own in `zarr.storage` as well. So go ahead and drop it.
Make use of Numcodecs' `ensure_contiguous_ndarray` to take `ndarray` views onto buffers to be stored in a few cases so as to reshape them and avoid a copy (thanks to the buffer protocol). This ensures that datetime/timedeltas are handled by default. Also catches things like object arrays. Finally this handles flattening the array if needed. All-in-all this gets as close to a `bytes` object as possible while not copying and doing its best to preserve type information while constructing something that fits the buffer protocol.
Rewrite `buffer_size` to just use Numcodecs' `ensure_ndarray` to get an `ndarray` that views the data. Once the `ndarray` is gotten, all that is needed is to access its `nbytes` member, which returns the number of bytes that it takes up.
If the data is already a `str` instance, turn `ensure_str` into a no-op. For all other cases, make use of Numcodecs' `ensure_bytes` to aid `ensure_str` in coercing data through the buffer protocol. If we are on Python 3, then decode the `bytes` object to a `str`.
As Blosc got upgraded and it contained an upgrade of Zstd, the results changed a little bit for this example. So update them accordingly. Should fix the doctest failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jakirkham, all looks good.
Looks like the line not covered is a fallback for when file removal fails. Given PR ( #327 ) obviates that, maybe we should merge that PR and drop that fallback. Thoughts? |
Made PR ( #355 ) to simply ignore coverage on that group of lines. |
Alright, think this is ready now. 😄 |
Awesome, merging... |
Missed a bit of code that could be simplified with |
Also dropping a workaround for older Numcodecs. ( #361 ) Please let me know if there are more of these. |
Follow-up to PR ( #347 )
Fixes #324
As there are some critical fixes and needed features in the latest Numcodecs, this bumps our lower bound to the latest version. Pulls most of the content from the aforementioned PR. Drops some commits that have been broken out into subsequent PRs to keep this focused on the Numcodecs upgrade. Does some refactoring and cleanup of internal functions thanks to utility functions now included in Numcodecs.
TODO:
tox -e docs
)