BUG: Parsing ellipsis in indexing #93
Thanks, yes this looks like a bug in Array.__getitem__() not handling
ellipsis properly.
Btw when you create an array via zarr.empty, the behaviour of any chunk
that has not yet been initialized is undefined. In practice, it will be
random memory, and may not be stable from one access to the next. Use of
zarr.zeros or zarr.full supplying an explicit fill_value is recommended
over zarr.empty.
On Wednesday, November 30, 2016, jakirkham wrote:
Running into some cases where parsing an Ellipsis does not work. Note
that Input 5 where an Ellipsis is used with an index doesn't work.
Side note: Seems there is something funky with floating point round off
error too (e.g. Input 3 and Input 4 seem to differ).
In [1]: import zarr
In [2]: z = zarr.empty(shape=(100, 110), chunks=(10, 11), dtype=float)
In [3]: z[0]
Out[3]:
array([ 6.91088620e-310, 6.91088620e-310, 2.10918838e-316,
2.10918838e-316, 1.94607893e-316, 5.72938864e-313,
0.00000000e+000, 3.95252517e-323, 1.57027689e-312,
1.93101617e-312, 1.25197752e-312, 1.18831764e-312,
1.12465777e-312, 1.06099790e-312, 2.31297541e-312,
2.33419537e-312, 1.93101617e-312, 1.93101617e-312,
1.25197752e-312, 1.18831764e-312, 1.12465777e-312,
1.03977794e-312, 1.25197752e-312, 2.31297541e-312,
5.72938864e-313, 1.01855798e-312, 1.08221785e-312,
1.25197752e-312, 1.25197752e-312, 1.18831764e-312,
1.97345609e-312, 6.79038653e-313, 1.93101617e-312,
2.31297541e-312, 5.72938864e-313, 1.01855798e-312,
1.93101617e-312, 1.93101617e-312, 1.25197752e-312,
1.18831764e-312, 1.12465777e-312, 1.06099790e-312,
2.31297541e-312, 5.72938864e-313, 1.01855798e-312,
1.97345609e-312, 1.93101617e-312, 1.06099790e-312,
5.72938864e-313, 1.01855798e-312, 2.75859453e-313,
5.72938864e-313, 1.57027689e-312, 1.93101617e-312,
1.16709769e-312, 5.72938864e-313, 1.01855798e-312,
5.72938864e-313, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 4.94065646e-324, 6.91087535e-310,
0.00000000e+000, 4.94065646e-324, 2.45550626e-321,
6.91088620e-310, 1.98184750e-316, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
6.91087588e-310, 1.19069821e-321, 2.05561901e-316,
6.91088620e-310, 6.91070369e-310, 1.03259720e-321,
6.91088620e-310, 6.91088620e-310, 7.93037613e-120,
1.44506353e+214, 3.63859382e+185, 2.43896203e-154,
7.75110916e+228, 4.44743484e+252])
In [4]: z[0, :]
Out[4]:
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.])
In [5]: z[0, ...]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-27fbc5543985> in <module>()
----> 1 z[0, ...]

/opt/conda2/lib/python2.7/site-packages/zarr/core.pyc in __getitem__(self, item)
    446
    447         # normalize selection
--> 448         selection = normalize_array_selection(item, self._shape)
    449
    450         # determine output array shape

/opt/conda2/lib/python2.7/site-packages/zarr/util.pyc in normalize_array_selection(item, shape)
    184     # determine start and stop indices for all axes
    185     selection = tuple(normalize_axis_selection(i, l)
--> 186                       for i, l in zip(item, shape))
    187
    188     # fill out selection if not completely specified

/opt/conda2/lib/python2.7/site-packages/zarr/util.pyc in <genexpr>((i, l))
    184     # determine start and stop indices for all axes
    185     selection = tuple(normalize_axis_selection(i, l)
--> 186                       for i, l in zip(item, shape))
    187
    188     # fill out selection if not completely specified

/opt/conda2/lib/python2.7/site-packages/zarr/util.pyc in normalize_axis_selection(item, l)
    163
    164     else:
--> 165         raise TypeError('expected integer or slice, found: %r' % item)
    166
    167

TypeError: expected integer or slice, found: Ellipsis
In [6]: z[...]
Out[6]:
array([[ 6.91088620e-310, 6.91088620e-310, 2.12535499e-316, ...,
0.00000000e+000, 2.28439709e-032, 6.91088696e-310],
[ 0.00000000e+000, -5.25530781e-026, 6.91088696e-310, ...,
6.91087565e-310, 3.95252517e-323, 1.41861741e-316],
[ 6.91087582e-310, 4.44659081e-323, 1.41861622e-316, ...,
1.41867314e-316, 6.91087582e-310, 2.22329541e-322],
...,
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, ...,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, ...,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, ...,
1.41861267e-316, 6.91087582e-310, 6.42285340e-323]])
In [7]: z[:, 0]
Out[7]:
array([ 6.91088620e-310, 6.91088620e-310, 6.91087579e-310,
6.91087579e-310, 6.91087579e-310, 6.91087579e-310,
6.91087579e-310, 6.91087579e-310, 6.91087535e-310,
6.91087535e-310, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.82894873e-060, 6.91087535e-310, 3.40411230e-321,
2.03587931e-316, 6.91088620e-310, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.82894873e-060, 6.91087534e-310, 6.91087578e-310,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.82894873e-060, 6.91087282e-310, 6.91087282e-310,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.82894873e-060, 6.91087578e-310, 6.91087535e-310,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.82894873e-060, 6.91087580e-310, 6.91087580e-310,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.82894873e-060, 6.91087580e-310, 6.91087566e-310,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.82894873e-060, 6.91087282e-310, 6.91087582e-310,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000])
View it on GitHub: https://github.com/alimanfoo/zarr/issues/93
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
Hm, interesting. Just to be clear, are you saying that you can create a
Zarr array, store some data into the array such that all chunks are
initialized, and then find that reading the same element of the array at
different times gives slightly different floating point values?
On Wednesday, November 30, 2016, jakirkham wrote:
Good point. In real life, there was data in the Zarr array copied from an
HDF5 file that motivated coming up with this example. Can confirm I have
the same issue if I use zarr.zeros instead.
Ah ok, phew!
On Wednesday, November 30, 2016, jakirkham wrote:
Oh no sorry. 😳 That was vague. Was only referring to the issue with
Ellipsis.
Thanks, that sounds very useful. Handling all the possible combinations of
arguments within Array.__getitem__() is tricky and some common code to
normalise numpy-array-like indexing arguments could be really helpful.
BTW I was also thinking in the next zarr release to add support for
indexing with a boolean array or array of integer indices (provided no
duplicates and strictly increasing) for a single dimension, I think it
shouldn't be too hard to implement (#78). I.e., subset of numpy-style fancy
indexing. So whatever normalisation that gets done on __getitem__ arguments
would need to accommodate this at some point too.
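The kind of shared normalisation code discussed here could start by expanding a single Ellipsis into full slices before any per-axis processing. The following is only a minimal sketch under that assumption: expand_ellipsis is a hypothetical helper name, not a function from zarr or kenjutsu, and it handles only integers, slices and Ellipsis.

```python
def expand_ellipsis(item, ndim):
    """Expand an indexing argument to a tuple with one entry per dimension.

    Hypothetical helper (not part of zarr): a single Ellipsis is replaced by
    as many full slices as needed, and missing trailing axes are padded.
    """
    # A bare index such as z[0] or z[...] arrives as a non-tuple.
    if not isinstance(item, tuple):
        item = (item,)

    # Use identity checks so that array-like entries are never compared with ==.
    n_ellipsis = sum(1 for i in item if i is Ellipsis)
    if n_ellipsis > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")

    n_given = len(item) - n_ellipsis
    if n_given > ndim:
        raise IndexError("too many indices for array")

    full = slice(None)
    n_fill = ndim - n_given
    if n_ellipsis:
        pos = next(k for k, i in enumerate(item) if i is Ellipsis)
        return item[:pos] + (full,) * n_fill + item[pos + 1:]
    # No ellipsis: pad the trailing axes with full slices.
    return item + (full,) * n_fill


# The failing case from this issue, for a 2-dimensional array:
print(expand_ellipsis((0, Ellipsis), 2))  # (0, slice(None, None, None))
```

After this step every entry is an integer or a slice, so per-axis code that only accepts those two types never sees Ellipsis.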
On Thursday, December 1, 2016, jakirkham wrote:
So I have a library, kenjutsu <https://github.com/jakirkham/kenjutsu>,
that I refactored out of some code recently, whose sole purpose is to work
with slices. It's pure Python, works with 2 and 3, has no dependencies, and
is on conda-forge.
There are two functions in there for cleaning up slices as you are doing
here. One for a single slice/length and one for a tuple of slices/shape.
While it doesn't yet, it would be pretty trivial to extend it to handle
Ellipsis and integer index or indices. Would likely handle this on my end
anyways so as to skirt around edge cases like this one.
I don't know if that is something worth considering for you or not. Just
figured I'd share in any event. If so, we can discuss what the best way to
do this might be.
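To make the two clean-up steps described above concrete (one slice against one length, and a tuple of selections against a shape), here is a rough sketch built on the standard slice.indices method. It is not kenjutsu's API: normalize_axis and selection_shape are hypothetical names, and the code assumes Python 3, where range objects support len().

```python
def normalize_axis(sel, length):
    """Normalize one integer or slice against an axis of the given length."""
    if isinstance(sel, slice):
        # slice.indices fills in None values and clips to the axis bounds.
        return range(*sel.indices(length))
    if isinstance(sel, int):
        if sel < 0:
            sel += length  # wrap negative indices, as numpy does
        if not 0 <= sel < length:
            raise IndexError("index out of bounds for axis of length %d" % length)
        return sel
    raise TypeError("expected integer or slice, found: %r" % (sel,))


def selection_shape(selection, shape):
    """Shape of the result of a full-rank selection; integer axes are dropped."""
    normalized = (normalize_axis(s, n) for s, n in zip(selection, shape))
    return tuple(len(r) for r in normalized if isinstance(r, range))


# z[0, :] on the (100, 110) array from the report selects one row of 110 items.
print(selection_shape((0, slice(None)), (100, 110)))  # (110,)
```

Returning a range rather than a new slice sidesteps the negative-step case, where the stop value reported by slice.indices can be -1 and would be misread if placed back into a slice.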
Definitely. Part of the initial motivation for that code was also to determine the shape of the array needed to hold such a slice. So that part may be useful here too. Also FYI have an
Yeah, I was looking at that case (https://github.com/alimanfoo/zarr/issues/78). I'll post some thoughts over there and we can discuss.
Thanks a lot, I have to work on something else this week but will take a
look ASAP.
On Mon, 5 Dec 2016 at 20:52, jakirkham wrote:
FYI made a release of kenjutsu version 0.2.0, which solves this
particular case and a few others. It should help cut down some of the slice
logic in zarr if you would be interested in trying.
PR (https://github.com/alimanfoo/zarr/pull/116) should fix this issue.