Skip to content

Bump Numcodecs requirement to 0.6.2 #352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Dec 4, 2018
Merged

Bump Numcodecs requirement to 0.6.2 #352

merged 15 commits into from
Dec 4, 2018

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Dec 4, 2018

Follow-up to PR ( #347 )

Fixes #324

As there are some critical fixes and needed features in the latest Numcodecs, this bumps our lower bound to the latest version. Pulls most of the content from the aforementioned PR. Drops some commits that have been broken out into subsequent PRs to keep this focused on the Numcodecs upgrade. Does some refactoring and cleanup of internal functions thanks to utility functions now included in Numcodecs.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • Docs build locally (e.g., run tox -e docs)
  • AppVeyor and Travis CI passes
  • Test coverage is 100% (Coveralls passes)

jakirkham and others added 12 commits November 30, 2018 12:18
Previously MsgPack was turning bytes objects to unicode objects when
round-tripping them. However this has been fixed in the latest version
of Numcodecs. So correct this test now that MsgPack is working
correctly.
As we already ensured the `chunk` is an `ndarray` viewing the original
data, there is no need for us to do that here as well. Plus the checks
performed by `ensure_contiguous_ndarray` are not needed for our use case
here. Particularly as we have already handled the unusual type cases
above. We also don't need to constrain the buffer size. As such the only
thing we really need is to flatten the array and make it contiguous,
which is what we handle here directly.
As both the expected `object` case and the non-`object` case perform a
`reshape` to flatten the data, go ahead and refactor that out of both
cases and handle it generally. Simplifies the code a bit.
As refactoring of the `reshape` step has effectively dropped the
expected `object` type case, the checks for different types is a little
more complicated than needed. To fix this, basically invert and swap the
case ordering. This way we can handle all generally expected types first
and simply cast them. Then we can raise if an `object` type shows up and
is unexpected.
As Numcodecs now includes a very versatile and effective `ensure_bytes`
function, there is no need to define our own in `zarr.storage` as well.
So go ahead and drop it.
Make use of Numcodecs' `ensure_contiguous_ndarray` to take `ndarray`
views onto buffers to be stored in a few cases so as to reshape them and
avoid a copy (thanks to the buffer protocol). This ensures that
datetime/timedeltas are handled by default. Also catches things like
object arrays. Finally this handles flattening the array if needed.
All-in-all this gets as close to a `bytes` object as possible while not
copying and doing its best to preserve type information while
constructing something that fits the buffer protocol.
Rewrite `buffer_size` to just use Numcodecs' `ensure_ndarray` to get an
`ndarray` that views the data. Once the `ndarray` is gotten, all that is
needed is to access its `nbytes` member, which returns the number of
bytes that it takes up.
If the data is already a `str` instance, turn `ensure_str` into a no-op.
For all other cases, make use of Numcodecs' `ensure_bytes` to aid
`ensure_str` in coercing data through the buffer protocol. If we are on
Python 3, then decode the `bytes` object to a `str`.
@jakirkham jakirkham mentioned this pull request Dec 4, 2018
7 tasks
As Blosc got upgraded and it contained an upgrade of Zstd, the results
changed a little bit for this example. So update them accordingly.
Should fix the doctest failure.
@jakirkham jakirkham added this to the v2.3 milestone Dec 4, 2018
@jakirkham jakirkham requested a review from alimanfoo December 4, 2018 00:29
Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jakirkham, all looks good.

@jakirkham
Copy link
Member Author

Looks like the line not covered is a fallback for when file removal fails. Given PR ( #327 ) obviates that, maybe we should merge that PR and drop that fallback. Thoughts?

@jakirkham
Copy link
Member Author

Made PR ( #355 ) to simply ignore coverage on that group of lines.

@jakirkham
Copy link
Member Author

Alright, think this is ready now. 😄

@alimanfoo
Copy link
Member

Awesome, merging...

@alimanfoo alimanfoo merged commit c4427a4 into zarr-developers:master Dec 4, 2018
@jakirkham jakirkham deleted the use_numcodecs_0.6.2 branch December 5, 2018 02:48
@jakirkham
Copy link
Member Author

Missed a bit of code that could be simplified with ensure_ndarray. Addressing that in PR ( #360 ).

@jakirkham
Copy link
Member Author

Also dropping a workaround for older Numcodecs. ( #361 ) Please let me know if there are more of these.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MsgPack codec is broken for array of bytes objects
2 participants