Skip to content

Commit 2b6e45a

Browse files
Add support for fancy indexing on get/setitem (#725)
* Fall back on .vindex when basic indexing fails Addresses #657 This matches NumPy behaviour in that basic, boolean, and vectorized integer (fancy) indexing are all accessible from `__{get,set}item__`. Users still have access to all the indexing methods if they want to be sure to use only basic indexing (integer + slices). * Fix basic selection test now with no IndexError * Fix basic_selection_2d test with no vindex error * Add specific test for fancy indexing fallback * Update get/setitem docstrings * Update tutorial.rst * PEP8 fix * Rename test array to z as in other tests * Add release note * Avoid mixing slicing and array indexing in setitem * Actually test for fancy index rather than try/except * Add check for 1D fancy index (no tuple) * Add tests for implicit fancy indexing, and getitem * Add expected blank line * Add strict test for make_slice_selection * Ensure make_slice_selection returns valid NumPy slices * Make pytest verbose to see what is failing in windows * Add 5 min per-test timeout * Use private self._shape when determining ndim self.shape is a property that hides a lot of computation, and, more importantly, it can be waiting for an update and so .ndim *cannot* be accessed during a reshape/append. See: zarr-developers/zarr-python#725 (comment) This should prevent that behavior. Co-authored-by: Josh Moore <[email protected]>
1 parent 3c491eb commit 2b6e45a

File tree

7 files changed

+196
-20
lines changed

7 files changed

+196
-20
lines changed

.github/workflows/windows-testing.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@ jobs:
4848
conda activate zarr-env
4949
mkdir ~/blob_emulator
5050
azurite -l ~/blob_emulator --debug debug.log 2>&1 > stdouterr.log &
51-
pytest
51+
pytest -sv --timeout=300
5252
env:
5353
ZARR_TEST_ABS: 1
5454
- name: Conda info

docs/release.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,12 @@ Release notes
66
Unreleased
77
----------
88

9+
Enhancements
10+
~~~~~~~~~~~~
11+
12+
* array indexing with [] (getitem and setitem) now supports fancy indexing.
13+
By :user:`Juan Nunez-Iglesias <jni>`; :issue:`725`.
14+
915
.. _release_2.10.2:
1016

1117
2.10.2

docs/tutorial.rst

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -509,7 +509,7 @@ e.g.::
509509
[10, 11, 12, -2, 14]])
510510

511511
For convenience, coordinate indexing is also available via the ``vindex``
512-
property, e.g.::
512+
property, as well as the square bracket operator, e.g.::
513513

514514
>>> z.vindex[[0, 2], [1, 3]]
515515
array([-1, -2])
@@ -518,6 +518,16 @@ property, e.g.::
518518
array([[ 0, -3, 2, 3, 4],
519519
[ 5, 6, 7, 8, 9],
520520
[10, 11, 12, -4, 14]])
521+
>>> z[[0, 2], [1, 3]]
522+
array([-3, -4])
523+
524+
When the indexing arrays have different shapes, they are broadcast together.
525+
That is, the following two calls are equivalent::
526+
527+
>>> z[1, [1, 3]]
528+
array([5, 7])
529+
>>> z[[1, 1], [1, 3]]
530+
array([5, 7])
521531

522532
Indexing with a mask array
523533
~~~~~~~~~~~~~~~~~~~~~~~~~~

requirements_dev_optional.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ coverage
1717
flake8==3.9.2
1818
pytest-cov==2.12.1
1919
pytest-doctestplus==0.11.0
20+
pytest-timeout==1.4.2
2021
h5py==3.4.0
2122
fsspec[s3]==2021.10.0
2223
moto[server]>=1.3.14

zarr/core.py

Lines changed: 40 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@
2525
ensure_tuple,
2626
err_too_many_indices,
2727
is_contiguous_selection,
28+
is_pure_fancy_indexing,
2829
is_scalar,
2930
pop_fields,
3031
)
@@ -351,7 +352,7 @@ def attrs(self):
351352
@property
352353
def ndim(self):
353354
"""Number of dimensions."""
354-
return len(self.shape)
355+
return len(self._shape)
355356

356357
@property
357358
def _size(self):
@@ -658,8 +659,20 @@ def __getitem__(self, selection):
658659
Slices with step > 1 are supported, but slices with negative step are not.
659660
660661
Currently the implementation for __getitem__ is provided by
661-
:func:`get_basic_selection`. For advanced ("fancy") indexing, see the methods
662-
listed under See Also.
662+
:func:`vindex` if the indexing is pure fancy indexing (ie a
663+
broadcast-compatible tuple of integer array indices), or by
664+
:func:`set_basic_selection` otherwise.
665+
666+
Effectively, this means that the following indexing modes are supported:
667+
668+
- integer indexing
669+
- slice indexing
670+
- mixed slice and integer indexing
671+
- boolean indexing
672+
- fancy indexing (vectorized list of integers)
673+
674+
For specific indexing options including outer indexing, see the
675+
methods listed under See Also.
663676
664677
See Also
665678
--------
@@ -668,9 +681,12 @@ def __getitem__(self, selection):
668681
set_orthogonal_selection, vindex, oindex, __setitem__
669682
670683
"""
671-
672-
fields, selection = pop_fields(selection)
673-
return self.get_basic_selection(selection, fields=fields)
684+
fields, pure_selection = pop_fields(selection)
685+
if is_pure_fancy_indexing(pure_selection, self.ndim):
686+
result = self.vindex[selection]
687+
else:
688+
result = self.get_basic_selection(pure_selection, fields=fields)
689+
return result
674690

675691
def get_basic_selection(self, selection=Ellipsis, out=None, fields=None):
676692
"""Retrieve data for an item or region of the array.
@@ -1208,8 +1224,19 @@ def __setitem__(self, selection, value):
12081224
Slices with step > 1 are supported, but slices with negative step are not.
12091225
12101226
Currently the implementation for __setitem__ is provided by
1211-
:func:`set_basic_selection`, which means that only integers and slices are
1212-
supported within the selection. For advanced ("fancy") indexing, see the
1227+
:func:`vindex` if the indexing is pure fancy indexing (ie a
1228+
broadcast-compatible tuple of integer array indices), or by
1229+
:func:`set_basic_selection` otherwise.
1230+
1231+
Effectively, this means that the following indexing modes are supported:
1232+
1233+
- integer indexing
1234+
- slice indexing
1235+
- mixed slice and integer indexing
1236+
- boolean indexing
1237+
- fancy indexing (vectorized list of integers)
1238+
1239+
For specific indexing options including outer indexing, see the
12131240
methods listed under See Also.
12141241
12151242
See Also
@@ -1219,9 +1246,11 @@ def __setitem__(self, selection, value):
12191246
set_orthogonal_selection, vindex, oindex, __getitem__
12201247
12211248
"""
1222-
1223-
fields, selection = pop_fields(selection)
1224-
self.set_basic_selection(selection, value, fields=fields)
1249+
fields, pure_selection = pop_fields(selection)
1250+
if is_pure_fancy_indexing(pure_selection, self.ndim):
1251+
self.vindex[selection] = value
1252+
else:
1253+
self.set_basic_selection(pure_selection, value, fields=fields)
12251254

12261255
def set_basic_selection(self, selection, value, fields=None):
12271256
"""Modify data for an item or region of the array.

zarr/indexing.py

Lines changed: 63 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16,9 +16,23 @@
1616

1717

1818
def is_integer(x):
19+
"""True if x is an integer (both pure Python or NumPy).
20+
21+
Note that Python's bool is considered an integer too.
22+
"""
1923
return isinstance(x, numbers.Integral)
2024

2125

26+
def is_integer_list(x):
27+
"""True if x is a list of integers.
28+
29+
This function assumes ie *does not check* that all elements of the list
30+
have the same type. Mixed type lists will result in other errors that will
31+
bubble up anyway.
32+
"""
33+
return isinstance(x, list) and len(x) > 0 and is_integer(x[0])
34+
35+
2236
def is_integer_array(x, ndim=None):
2337
t = hasattr(x, 'shape') and hasattr(x, 'dtype') and x.dtype.kind in 'ui'
2438
if ndim is not None:
@@ -41,6 +55,49 @@ def is_scalar(value, dtype):
4155
return False
4256

4357

58+
def is_pure_fancy_indexing(selection, ndim):
59+
"""Check whether a selection contains only scalars or integer array-likes.
60+
61+
Parameters
62+
----------
63+
selection : tuple, slice, or scalar
64+
A valid selection value for indexing into arrays.
65+
66+
Returns
67+
-------
68+
is_pure : bool
69+
True if the selection is a pure fancy indexing expression (ie not mixed
70+
with boolean or slices).
71+
"""
72+
if ndim == 1:
73+
if is_integer_list(selection) or is_integer_array(selection):
74+
return True
75+
# if not, we go through the normal path below, because a 1-tuple
76+
# of integers is also allowed.
77+
no_slicing = (
78+
isinstance(selection, tuple)
79+
and len(selection) == ndim
80+
and not (
81+
any(isinstance(elem, slice) or elem is Ellipsis
82+
for elem in selection)
83+
)
84+
)
85+
return (
86+
no_slicing and
87+
all(
88+
is_integer(elem)
89+
or is_integer_list(elem)
90+
or is_integer_array(elem)
91+
for elem in selection
92+
) and
93+
any(
94+
is_integer_list(elem)
95+
or is_integer_array(elem)
96+
for elem in selection
97+
)
98+
)
99+
100+
44101
def normalize_integer_selection(dim_sel, dim_len):
45102

46103
# normalize type to int
@@ -833,10 +890,14 @@ def make_slice_selection(selection):
833890
ls = []
834891
for dim_selection in selection:
835892
if is_integer(dim_selection):
836-
ls.append(slice(dim_selection, dim_selection + 1, 1))
893+
ls.append(slice(int(dim_selection), int(dim_selection) + 1, 1))
837894
elif isinstance(dim_selection, np.ndarray):
838895
if len(dim_selection) == 1:
839-
ls.append(slice(dim_selection[0], dim_selection[0] + 1, 1))
896+
ls.append(
897+
slice(
898+
int(dim_selection[0]), int(dim_selection[0]) + 1, 1
899+
)
900+
)
840901
else:
841902
raise ArrayIndexError()
842903
else:

zarr/tests/test_indexing.py

Lines changed: 74 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44

55
import zarr
66
from zarr.indexing import (
7+
make_slice_selection,
78
normalize_integer_selection,
89
oindex,
910
oindex_set,
@@ -198,15 +199,15 @@ def test_get_basic_selection_1d():
198199
for selection in basic_selections_1d:
199200
_test_get_basic_selection(a, z, selection)
200201

201-
bad_selections = basic_selections_1d_bad + [
202-
[0, 1], # fancy indexing
203-
]
204-
for selection in bad_selections:
202+
for selection in basic_selections_1d_bad:
205203
with pytest.raises(IndexError):
206204
z.get_basic_selection(selection)
207205
with pytest.raises(IndexError):
208206
z[selection]
209207

208+
with pytest.raises(IndexError):
209+
z.get_basic_selection([1, 0])
210+
210211

211212
basic_selections_2d = [
212213
# single row
@@ -274,14 +275,75 @@ def test_get_basic_selection_2d():
274275
bad_selections = basic_selections_2d_bad + [
275276
# integer arrays
276277
[0, 1],
277-
([0, 1], [0, 1]),
278278
(slice(None), [0, 1]),
279279
]
280280
for selection in bad_selections:
281281
with pytest.raises(IndexError):
282282
z.get_basic_selection(selection)
283283
with pytest.raises(IndexError):
284284
z[selection]
285+
# check fallback on fancy indexing
286+
fancy_selection = ([0, 1], [0, 1])
287+
np.testing.assert_array_equal(z[fancy_selection], [0, 11])
288+
289+
290+
def test_fancy_indexing_fallback_on_get_setitem():
291+
z = zarr.zeros((20, 20))
292+
z[[1, 2, 3], [1, 2, 3]] = 1
293+
np.testing.assert_array_equal(
294+
z[:4, :4],
295+
[
296+
[0, 0, 0, 0],
297+
[0, 1, 0, 0],
298+
[0, 0, 1, 0],
299+
[0, 0, 0, 1],
300+
],
301+
)
302+
np.testing.assert_array_equal(
303+
z[[1, 2, 3], [1, 2, 3]], 1
304+
)
305+
# test broadcasting
306+
np.testing.assert_array_equal(
307+
z[1, [1, 2, 3]], [1, 0, 0]
308+
)
309+
# test 1D fancy indexing
310+
z2 = zarr.zeros(5)
311+
z2[[1, 2, 3]] = 1
312+
np.testing.assert_array_equal(
313+
z2, [0, 1, 1, 1, 0]
314+
)
315+
316+
317+
def test_fancy_indexing_doesnt_mix_with_slicing():
318+
z = zarr.zeros((20, 20))
319+
with pytest.raises(IndexError):
320+
z[[1, 2, 3], :] = 2
321+
with pytest.raises(IndexError):
322+
np.testing.assert_array_equal(
323+
z[[1, 2, 3], :], 0
324+
)
325+
326+
327+
def test_fancy_indexing_doesnt_mix_with_implicit_slicing():
328+
z2 = zarr.zeros((5, 5, 5))
329+
with pytest.raises(IndexError):
330+
z2[[1, 2, 3], [1, 2, 3]] = 2
331+
with pytest.raises(IndexError):
332+
np.testing.assert_array_equal(
333+
z2[[1, 2, 3], [1, 2, 3]], 0
334+
)
335+
with pytest.raises(IndexError):
336+
z2[[1, 2, 3]] = 2
337+
with pytest.raises(IndexError):
338+
np.testing.assert_array_equal(
339+
z2[[1, 2, 3]], 0
340+
)
341+
with pytest.raises(IndexError):
342+
z2[..., [1, 2, 3]] = 2
343+
with pytest.raises(IndexError):
344+
np.testing.assert_array_equal(
345+
z2[..., [1, 2, 3]], 0
346+
)
285347

286348

287349
def test_set_basic_selection_0d():
@@ -1373,3 +1435,10 @@ def test_PartialChunkIterator(selection, arr, expected):
13731435
PCI = PartialChunkIterator(selection, arr.shape)
13741436
results = list(PCI)
13751437
assert results == expected
1438+
1439+
1440+
def test_slice_selection_uints():
1441+
arr = np.arange(24).reshape((4, 6))
1442+
idx = np.uint64(3)
1443+
slice_sel = make_slice_selection((idx,))
1444+
assert arr[slice_sel].shape == (1, 6)

0 commit comments

Comments
 (0)