WIP: Zarr chunks refactor #4550

aurghs · 2020-10-29T14:44:31Z

This work aims to harmonize how zarr handles chunking with the other backends and to unify the code.
Most of the changes involve the new API, apiv2.py; the exceptions are some changes to code that was added with the merge of #4187.

main changes:

refactor apiv2.dataset_from_backend_dataset function.
move get_chunks from zarr to dataset.

current status:

in apiv2.open_dataset, chunks='auto' and chunks={} now have the same behaviour
in apiv2.open_dataset, the default chunking for all backends is now provided by the backend; if it is not available, one big chunk is used.
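
As a rough illustration of the status described above, the chunk resolution could be sketched as follows. This is a simplified model and not the code in this PR: `resolve_chunks` is a hypothetical helper, and it assumes the backend exposes its preferred chunks as a tuple under the encoding key `chunks`, aligned with the variable's dimensions.

```python
from typing import Dict, Mapping, Sequence, Union

ChunkSpec = Union[str, Mapping[str, int]]


def resolve_chunks(
    requested: ChunkSpec,
    dims: Sequence[str],
    shape: Sequence[int],
    encoding: Mapping[str, Sequence[int]],
) -> Dict[str, int]:
    """Hypothetical helper: pick one chunk size per dimension."""
    # chunks='auto' is treated exactly like chunks={}: no explicit request.
    if requested == "auto":
        requested = {}
    # Backend-preferred chunks, if any (assumed to live under 'chunks').
    preferred = encoding.get("chunks")
    resolved: Dict[str, int] = {}
    for axis, (dim, size) in enumerate(zip(dims, shape)):
        if dim in requested:
            resolved[dim] = requested[dim]  # explicit user choice wins
        elif preferred is not None:
            resolved[dim] = preferred[axis]  # backend default
        else:
            resolved[dim] = size  # fall back to one big chunk
    return resolved
```

With a backend preference of `(10, 50)` for a `(100, 50)` variable, both `'auto'` and `{}` resolve to the backend's chunks; with an empty encoding, both resolve to the full shape.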

Missing points:

standardize the key in encodings to define the on-disk chunks: chunksizes
add a specific key in encodings for preferred chunking (currently chunks is used)
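
One possible shape for these two missing points is sketched below. It is only an assumption about how the keys could be separated: `standardize_chunk_keys` is a hypothetical helper, and `preferred_chunks` is a placeholder name, since this PR does not yet name the key for preferred chunking.

```python
from typing import Any, Dict, Mapping


def standardize_chunk_keys(encoding: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical normalizer for chunk-related encoding keys."""
    out = dict(encoding)
    legacy = out.pop("chunks", None)
    if legacy is not None:
        # Assumption: a backend that reports 'chunks' uses it for both the
        # on-disk layout and its preferred chunking.
        out.setdefault("chunksizes", tuple(legacy))  # on-disk chunks
        out.setdefault("preferred_chunks", tuple(legacy))  # placeholder key
    return out
```

An encoding that already uses `chunksizes` passes through unchanged.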

There is still one open point to discuss: dataset.chunk and open_dataset(..., chunks=...) behave differently.
dataset.chunk(chunks={}) returns the dataset with only one chunk per variable, while open_dataset(..., chunks={}) uses encodings['chunks'], when available.

Note that chunks=None also behaves differently: open_dataset(..., chunks=None) (or open_dataset(...), since it is the default) returns variables without chunks, while dataset.chunk(chunks=None) (or dataset.chunk(), since it is the default) behaves the same as dataset.chunk(chunks={}). It is probably not worth changing.
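
The difference can be summarised with a small toy model. It does not call xarray; `open_dataset_model` and `dataset_chunk_model` are hypothetical stand-ins that only encode the behaviour described in this comment for `chunks=None` and `chunks={}`, returning either `None` (no chunking) or the chunk shape of a single variable.

```python
from typing import Mapping, Optional, Sequence, Tuple

Chunks = Optional[Mapping[str, int]]


def open_dataset_model(
    chunks: Chunks,
    shape: Sequence[int],
    preferred: Optional[Sequence[int]] = None,
) -> Optional[Tuple[int, ...]]:
    """Toy model of open_dataset(..., chunks=...) for chunks in (None, {})."""
    if chunks is None:
        return None  # default: variables come back without chunks
    if preferred is not None:
        return tuple(preferred)  # chunks={} follows encodings['chunks']
    return tuple(shape)  # no backend preference: one big chunk


def dataset_chunk_model(chunks: Chunks, shape: Sequence[int]) -> Tuple[int, ...]:
    """Toy model of dataset.chunk(chunks=...) for chunks in (None, {})."""
    # None and {} are treated alike, and the encoding is never consulted.
    return tuple(shape)


shape, preferred = (100, 50), (10, 50)
for chunks in (None, {}):
    print(
        repr(chunks),
        open_dataset_model(chunks, shape, preferred),
        dataset_chunk_model(chunks, shape),
    )
```

In this model the two entry points disagree for both inputs: on whether the result is chunked at all for `None`, and on whether the backend's preferred chunks are honoured for `{}`.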

related to Flexible backends - Harmonise zarr chunking with other backends chunking #4496
Tests added
Passes isort . && black . && mypy . && flake8
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

pep8speaks · 2020-10-29T14:44:44Z

Hello @aurghs! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file xarray/backends/zarr.py:

Line 1:1: F401 'warnings' imported but unused

Comment last updated at 2020-11-02 12:07:00 UTC

aurghs and others added 30 commits September 25, 2020 19:07

add in api.open_dataset dispatching to stub apiv2

f961606

remove in apiv2 check for input AbstractDataStore

fb166fa

bugfix typo

0221eec

add kwarg engines in _get_backend_cls needed by apiv2

36a02c7

add alpha support for h5netcdf

cfb8cb8

style: clean not used code, modify some variable/function name

4256bc8

Add ENGINES entry for cfgrib.

1bc7391

Define function open_backend_dataset_cfgrib() to be used in apiv2.py.

748fe5a

Add necessary imports for this function.

Apply black to check formatting.

fb368fe

Apply black to check formatting.

80e111c

add dummy zarr apiv2 backend

e15ca6b

Merge branch 'master' into backend-read-refactor

025cc87

# Conflicts: # xarray/backends/api.py

align apiv2.open_dataset to api.open_dataset

4b19399

remove unused extra_coords in open_backend_dataset_*

572595f

Merge remote-tracking branch 'origin/cfgrib_refactor' into backend-re…

d6e632e

…ad-refactor # Conflicts: # xarray/backends/apiv2.py

remove extra_coords in open_backend_dataset_cfgrib

74aba14

transform zarr maybe_chunk and get_chunks in classmethod

d6280ec

- to be used in apiv2 without instantiate the object

make alpha zarr apiv2 working

c0e0f34

refactor apiv2.open_dataset:

6431101

- modify signature - move default setting inside backends

move dataset_from_backend_dataset out of apiv2.open_dataset

50d1ebe

remove blank lines

383d323

remove blank lines

457a09c

style

2803fe3

Re-write error messages

08db0bd

Fix code style

1f11845

Fix code style

93303b1

remove unused import

bc2fe00

zarr chunking refactor draft not working

102b00a

refactor dataset_from_backend_dataset

f47605a

fix wrong commit

b632b05

aurghs added 19 commits October 2, 2020 14:19

add get_chunk in apiv2

b437f02

replace warning with ValueError for not supported kwargs in backends

d694146

change zarr.ZarStore.get_chunks into a static method

56f4d3f

group backend_kwargs and kwargs in extra_tokes argument in apiv…

df23b18

…2.dataset_from_backend_dataset`

remove in open_backend_dayaset_${engine} signature kwarags and the re…

a04e6ac

…lated error message

black

de29a4c

Merge branch 'backend-read-refactor' into zarr-chunks-refactor

3b896f2

# Conflicts: # xarray/backends/apiv2.py

remove not used apiv2.set_source

cf77dc3

remove auto as chunk value

c1b763a

Merge branch 'master' into zarr-chunks-refactor

c32c62c

# Conflicts: # xarray/backends/apiv2.py

- align with api.py

5eb0daf

unify backends chunking

4fc1d8d

move get_chunk funtion in dataset

1cf6968

move get_chunk funtion in dataset

a780efc

Merge branch 'zarr-chunks-refactor' of github.com:bopen/xarray into z…

18e0077

…arr-chunks-refactor # Conflicts: # xarray/backends/apiv2.py # xarray/core/dataset.py

black

93cadee

remove unused import

6e9a562

re-add check on chunks type

remove unused import

69c2790

re-add check on chunks type

Merge branch 'zarr-chunks-refactor' of github.com:bopen/xarray into z…

0616a08

…arr-chunks-refactor # Conflicts: # xarray/backends/apiv2.py

TheRed86 added 3 commits November 2, 2020 12:49

Pass isort test.

a1cfa29

Pass black -t py36 test.

c590e05

Remove duplicated functions.

c6d341c

aurghs closed this Nov 10, 2020

alexamici added topic-backends grant-czi labels Dec 10, 2020
