@@ -1,4 +1,9 @@
- user-guide-performance
+ .. only:: doctest
+
+ >>> import shutil
+ >>> shutil.rmtree('data', ignore_errors=True)
+
+ .. _user-guide-performance:

Optimizing performance
======================
@@ -19,42 +24,41 @@ better performance, at least when using the Blosc compression library.
The optimal chunk shape will depend on how you want to access the data. E.g.,
for a 2-dimensional array, if you only ever take slices along the first
dimension, then chunk across the second dimension. If you know you want to chunk
- across an entire dimension you can use ``None`` or ``-1`` within the ``chunks``
- argument, e.g.::
+ across an entire dimension you can use the full size of that dimension within the
+ ``chunks`` argument, e.g.::

>>> import zarr
- >>>
- >>> z1 = zarr.zeros((10000, 10000), chunks=(100, None), dtype='i4')
+ >>> z1 = zarr.create_array(store={}, shape=(10000, 10000), chunks=(100, 10000), dtype='int32')
>>> z1.chunks
(100, 10000)

Alternatively, if you only ever take slices along the second dimension, then
chunk across the first dimension, e.g.::

- >>> z2 = zarr.zeros((10000, 10000), chunks=(None, 100), dtype='i4')
+ >>> z2 = zarr.create_array(store={}, shape=(10000, 10000), chunks=(10000, 100), dtype='int32')
>>> z2.chunks
(10000, 100)

If you require reasonable performance for both access patterns then you need to
find a compromise, e.g.::

- >>> z3 = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
+ >>> z3 = zarr.create_array(store={}, shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
>>> z3.chunks
(1000, 1000)
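To make the trade-off above concrete, here is a small pure-Python sketch (no Zarr required) that counts how many chunks a selection touches for the three chunk shapes used in the examples. The helper name `chunks_touched` is hypothetical. The sketch assumes a regular chunk grid in which every touched chunk has to be read in full.

```python
def chunks_touched(selection, chunks):
    """Count the chunks of a regular grid that intersect a selection.

    ``selection`` holds one half-open (start, stop) range per dimension and
    ``chunks`` is the chunk shape. Hypothetical helper for illustration.
    """
    total = 1
    for (start, stop), size in zip(selection, chunks):
        first = start // size         # index of the first chunk hit
        last = (stop - 1) // size     # index of the last chunk hit
        total *= last - first + 1
    return total


shape = (10000, 10000)
row = ((0, 1), (0, shape[1]))   # one full row, like a[0, :]
col = ((0, shape[0]), (0, 1))   # one full column, like a[:, 0]

for chunk_shape in [(100, 10000), (10000, 100), (1000, 1000)]:
    print(chunk_shape, chunks_touched(row, chunk_shape), chunks_touched(col, chunk_shape))
# (100, 10000) 1 100
# (10000, 100) 100 1
# (1000, 1000) 10 10
```

The compromise shape touches 10 chunks for either access pattern, where the other two shapes touch 1 chunk for one pattern and 100 for the other.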

If you are feeling lazy, you can let Zarr guess a chunk shape for your data by
- providing ``chunks=True``, although please note that the algorithm for guessing
+ providing ``chunks='auto'``, although please note that the algorithm for guessing
a chunk shape is based on simple heuristics and may be far from optimal. E.g.::

- >>> z4 = zarr.zeros((10000, 10000), chunks=True, dtype='i4')
+ >>> z4 = zarr.create_array(store={}, shape=(10000, 10000), chunks='auto', dtype='int32')
>>> z4.chunks
(625, 625)
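For intuition about what a "simple heuristic" can look like, the sketch below follows the style of h5py's chunk guessing. It picks a target chunk size in bytes that grows logarithmically with the dataset size, then halves dimensions round-robin until a chunk is at or near that target. The function name and the three byte constants are assumptions for illustration, and Zarr's own implementation may differ. With these assumed values the sketch happens to reproduce the (625, 625) shown above for a 10000 x 10000 ``int32`` array.

```python
import math

# Assumed constants in bytes (illustrative values, not taken from Zarr's docs).
CHUNK_BASE = 256 * 1024        # base target chunk size
CHUNK_MIN = 128 * 1024         # lower bound on the target
CHUNK_MAX = 64 * 1024 * 1024   # hard upper bound on a chunk


def guess_chunks(shape, itemsize):
    """Guess a chunk shape by halving dimensions round-robin (sketch)."""
    ndim = len(shape)
    chunks = [max(int(s), 1) for s in shape]
    dset_size = math.prod(chunks) * itemsize
    # The target grows slowly (logarithmically) with the dataset size.
    target = CHUNK_BASE * (2 ** math.log10(dset_size / (1024.0 * 1024)))
    target = min(max(target, CHUNK_MIN), CHUNK_MAX)

    idx = 0
    while True:
        chunk_bytes = math.prod(chunks) * itemsize
        close_enough = abs(chunk_bytes - target) / target < 0.5
        if (chunk_bytes < target or close_enough) and chunk_bytes < CHUNK_MAX:
            break
        if math.prod(chunks) == 1:
            break
        # Halve the next dimension in turn, rounding up.
        d = idx % ndim
        chunks[d] = math.ceil(chunks[d] / 2)
        idx += 1
    return tuple(chunks)


print(guess_chunks((10000, 10000), 4))  # int32 items -> (625, 625)
```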

If you know you are always going to be loading the entire array into memory, you
- can turn off chunks by providing ``chunks=False``, in which case there will be
- one single chunk for the array::
+ can turn off chunks by providing ``chunks`` equal to ``shape``, in which case there
+ will be one single chunk for the array::

- >>> z5 = zarr.zeros((10000, 10000), chunks=False, dtype='i4')
+ >>> z5 = zarr.create_array(store={}, shape=(10000, 10000), chunks=(10000, 10000), dtype='int32')
>>> z5.chunks
(10000, 10000)

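Whether a single chunk is sensible comes down to how many bytes have to be fetched and decompressed per access. The plain-Python arithmetic below gives the uncompressed size of one chunk for each layout used above. It assumes 4-byte ``int32`` items and ignores compression. The helper name `chunk_nbytes` is hypothetical.

```python
import math


def chunk_nbytes(chunk_shape, itemsize=4):
    """Uncompressed size in bytes of one chunk (4-byte int32 items assumed)."""
    return math.prod(chunk_shape) * itemsize


layouts = {
    'z1': (100, 10000),
    'z2': (10000, 100),
    'z3': (1000, 1000),
    'z4': (625, 625),
    'z5': (10000, 10000),
}

for name, chunk_shape in layouts.items():
    print(f'{name}: {chunk_shape} -> {chunk_nbytes(chunk_shape) / 1e6:g} MB per chunk')
```

With ``chunks`` equal to ``shape``, reading even one element means fetching the whole 400 MB chunk. That is why this layout only suits arrays that are always loaded in full.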
@@ -70,9 +74,9 @@ ratios, depending on the correlation structure within the data. E.g.::

>>> import numpy as np
>>>
- >>> a = np.arange(100000000, dtype='i4').reshape(10000, 10000).T
- >>> # TODO: replace with create_array after #2463
- >>> c = zarr.array(a, chunks=(1000, 1000))
+ >>> a = np.arange(100000000, dtype='int32').reshape(10000, 10000).T
+ >>> c = zarr.create_array(store={}, shape=a.shape, chunks=(1000, 1000), dtype=a.dtype, config={'order': 'C'})
+ >>> c[:] = a
>>> c.info_complete()
Type : Array
Zarr format : 3
@@ -88,7 +92,8 @@ ratios, depending on the correlation structure within the data. E.g.::
Storage ratio : 1.2
Chunks Initialized : 100
>>> with zarr.config.set({'array.order': 'F'}):
- ...     f = zarr.array(a, chunks=(1000, 1000))
+ ...     f = zarr.create_array(store={}, shape=a.shape, chunks=(1000, 1000), dtype=a.dtype)
+ ...     f[:] = a
>>> f.info_complete()
Type : Array
Zarr format : 3
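One way to see why memory order matters for ``a`` is to look at the order in which its elements are laid out before compression. Because ``a`` is a transposed ``arange``, ``a[i, j] == j * n + i``. The pure-Python sketch below uses a small ``n`` in place of 10000 and needs neither NumPy nor Zarr. It flattens such an array both ways. Fortran order yields consecutive integers, while C order jumps by ``n`` at every step. Actual compression ratios depend on the codec and filters, so this is only an illustration of the data layout.

```python
def transposed_arange(n):
    """Return a[i][j] = j * n + i, i.e. np.arange(n * n).reshape(n, n).T."""
    return [[j * n + i for j in range(n)] for i in range(n)]


def flatten(a, order):
    """Flatten a 2-D list in 'C' (row-major) or 'F' (column-major) order."""
    rows, cols = len(a), len(a[0])
    if order == 'C':
        return [a[i][j] for i in range(rows) for j in range(cols)]
    if order == 'F':
        return [a[i][j] for j in range(cols) for i in range(rows)]
    raise ValueError(f'unknown order: {order!r}')


n = 4
a = transposed_arange(n)
c_flat = flatten(a, 'C')
f_flat = flatten(a, 'F')
print(c_flat)                          # [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
print(f_flat == list(range(n * n)))    # True
```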
@@ -143,15 +148,14 @@ the time required to write an array with different values.::
...     shape = (chunks[0] * 1024,)
...     data = np.random.randint(0, 255, shape)
...     dtype = 'uint8'
- ...     with zarr.config.set({"array.write_empty_chunks": write_empty_chunks}):
- ...         arr = zarr.open(
- ...             f"data/example-{write_empty_chunks}.zarr",
- ...             shape=shape,
- ...             chunks=chunks,
- ...             dtype=dtype,
- ...             fill_value=0,
- ...             mode='w'
- ...         )
+ ...     arr = zarr.create_array(
+ ...         f'data/example-{write_empty_chunks}.zarr',
+ ...         shape=shape,
+ ...         chunks=chunks,
+ ...         dtype=dtype,
+ ...         fill_value=0,
+ ...         config={'write_empty_chunks': write_empty_chunks}
+ ...     )
...     # initialize all chunks
...     arr[:] = 100
...     result = []
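The behaviour that ``write_empty_chunks`` toggles can be sketched without Zarr. When the flag is false, a chunk whose values all equal the fill value is not stored, so reading it back falls through to the fill value. In this sketch, an existing copy of such a chunk is also removed. The dict-backed store and the ``write_chunk``, ``read_chunk`` and ``simulate`` helpers are hypothetical stand-ins, not Zarr internals.

```python
def write_chunk(store, key, values, fill_value, write_empty_chunks):
    """Store one chunk in a dict-backed store (hypothetical sketch)."""
    if not write_empty_chunks and all(v == fill_value for v in values):
        # "Empty" chunk: drop any stored copy instead of writing it.
        store.pop(key, None)
    else:
        store[key] = list(values)


def read_chunk(store, key, chunk_len, fill_value):
    """A missing chunk reads back as fill_value everywhere."""
    return store.get(key, [fill_value] * chunk_len)


def simulate(write_empty_chunks, n_chunks=4, chunk_len=8, fill_value=0):
    store = {}
    for k in range(n_chunks):  # initialize all chunks with non-fill data
        write_chunk(store, str(k), [100] * chunk_len, fill_value, write_empty_chunks)
    for k in range(n_chunks):  # then overwrite every chunk with the fill value
        write_chunk(store, str(k), [fill_value] * chunk_len, fill_value, write_empty_chunks)
    return store


for flag in (True, False):
    store = simulate(flag)
    print(flag, len(store), read_chunk(store, '0', 8, 0))
```

Both runs read back the same values, but only the ``True`` run keeps four stored objects. In this sketch, the price of skipping empty chunks is an extra scan of every chunk before it is written.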
@@ -208,9 +212,9 @@ to re-open any underlying files or databases upon being unpickled.
E.g., pickle/unpickle a local store array::

>>> import pickle
- >>>
- >>> # TODO: replace with create_array after #2463
- >>> z1 = zarr.array(store="data/example-2", data=np.arange(100000))
+ >>> data = np.arange(100000)
+ >>> z1 = zarr.create_array(store='data/example-2.zarr', shape=data.shape, chunks=data.shape, dtype=data.dtype)
+ >>> z1[:] = data
>>> s = pickle.dumps(z1)
>>> z2 = pickle.loads(s)
>>> z1 == z2
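Re-opening underlying files on unpickling is a general pattern. It can be shown with a small class that pickles only its path and re-opens the file in ``__setstate__``. ``FileBackedStore`` and ``roundtrip_demo`` are hypothetical illustrations of the pattern, not Zarr's store classes.

```python
import os
import pickle
import tempfile


class FileBackedStore:
    """Hypothetical store that holds an open (unpicklable) file handle."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, 'rb')

    def read(self):
        self._fh.seek(0)
        return self._fh.read()

    def close(self):
        self._fh.close()

    def __getstate__(self):
        # Pickle only what is needed to re-open the resource.
        return {'path': self.path}

    def __setstate__(self, state):
        # Re-open the underlying file when unpickled.
        self.__init__(state['path'])


def roundtrip_demo(payload=b'\x00\x01\x02'):
    """Write payload to a temp file, pickle a store over it, read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'chunk.bin')
        with open(path, 'wb') as f:
            f.write(payload)
        original = FileBackedStore(path)
        restored = pickle.loads(pickle.dumps(original))
        try:
            return restored.read()
        finally:
            # Close both handles so the temporary directory can be removed.
            original.close()
            restored.close()


print(roundtrip_demo() == b'\x00\x01\x02')  # True
```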