
Zarr loading from ZipStore gives error on default arguments #2586


Closed
eeopd opened this issue Dec 2, 2018 · 5 comments
Labels
topic-zarr Related to zarr storage library

Comments

@eeopd

eeopd commented Dec 2, 2018

(This is not too much of a problem, but it would probably be a reasonably easy fix)

import xarray as xr
import zarr

ds = xr.Dataset({'foo': [2,3,4], 'bar': ('x', [1, 2]), 'baz': 3.14})

ds.to_zarr(zarr.ZipStore("test.zarr"))
print(xr.open_zarr(zarr.ZipStore("test.zarr")))

This gives the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-1-c68e53adfa79> in <module>
      5 
      6 ds.to_zarr(zarr.ZipStore("test.zarr"))
----> 7 print(xr.open_zarr(zarr.ZipStore("test.zarr")))

~/.local/lib/python3.7/site-packages/xarray/backends/zarr.py in open_zarr(store, group, synchronizer, auto_chunk, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables)
    424     zarr_store = ZarrStore.open_group(store, mode=mode,
    425                                       synchronizer=synchronizer,
--> 426                                       group=group)
    427     ds = maybe_decode_store(zarr_store)
    428 

~/.local/lib/python3.7/site-packages/xarray/backends/zarr.py in open_group(cls, store, mode, synchronizer, group)
    236                                       "#installation" % min_zarr)
    237         zarr_group = zarr.open_group(store=store, mode=mode,
--> 238                                      synchronizer=synchronizer, path=group)
    239         return cls(zarr_group)
    240 

~/.local/lib/python3.7/site-packages/zarr/hierarchy.py in open_group(store, mode, cache_attrs, synchronizer, path)
   1111             err_contains_array(path)
   1112         elif not contains_group(store, path=path):
-> 1113             err_group_not_found(path)
   1114 
   1115     elif mode == 'w':

~/.local/lib/python3.7/site-packages/zarr/errors.py in err_group_not_found(path)
     27 
     28 def err_group_not_found(path):
---> 29     raise ValueError('group not found at path %r' % path)
     30 
     31 

ValueError: group not found at path ''

Instead, one has to use

xr.open_zarr(zarr.ZipStore("test.zarr"), group='/')

When using a directory as the store (e.g. ds.to_zarr('test_zarr')), group='/' is unnecessary when loading the dataset again, but everything still works if it is passed anyway. So I'd propose changing the default value of the group argument to '/', so that the ZipStore (and probably the other stores as well) works by default.
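One thing worth ruling out in the reproduction above, offered as an assumption rather than a confirmed diagnosis: the ZipStore that is written to is never closed before a second ZipStore opens the same file. zarr's ZipStore is built on Python's zipfile module, and a zip archive's central directory is only written when the archive is closed. The stdlib-only sketch below illustrates that general behaviour; the file name demo.zip and the helper list_entries are made up for illustration, and nothing here shows that this was the cause of the error in this issue.

```python
import os
import tempfile
import zipfile


def list_entries(path):
    """Return the entry names in the archive at `path`, or None if the
    file cannot (yet) be read as a zip archive. Hypothetical helper."""
    try:
        with zipfile.ZipFile(path, mode="r") as zf:
            return zf.namelist()
    except zipfile.BadZipFile:
        return None


# Illustrative scratch location, not a path from the issue.
path = os.path.join(tempfile.mkdtemp(), "demo.zip")

# Write one entry but do not close the archive yet. ".zgroup" is the key
# zarr format 2 uses for group metadata; it is used here only as a label.
writer = zipfile.ZipFile(path, mode="w")
writer.writestr(".zgroup", '{"zarr_format": 2}')

# A second reader sees no valid archive until the writer is closed,
# because the central directory has not been written.
before_close = list_entries(path)

writer.close()
after_close = list_entries(path)

print("before close:", before_close)
print("after close:", after_close)
```

With zarr, the analogous step would be to keep a reference to the ZipStore, call its close() method after ds.to_zarr(...), and only then open the file again for reading.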

@rabernat rabernat added the topic-zarr Related to zarr storage library label Dec 20, 2018
@jhamman
Member

jhamman commented Jan 13, 2019

@eeopd - thanks for opening this issue. This seems like a reasonable request, though I'm wondering if this is actually more of a zarr problem than an xarray one. I say that because xarray's zarr backend doesn't modify the group kwarg at all; it just passes it through to zarr.open_group(..., path=group). I wonder if you would get the same error with a pure zarr workflow.

cc @rabernat, @jakirkham, @alimanfoo

@jhamman jhamman added needs release breaking changes that should be held until a major release and removed needs release breaking changes that should be held until a major release labels Jan 13, 2019
@jakirkham

I'm not really familiar with xarray's internals, but issue #2660 looks relevant.

What happens if you do the following?

ds.to_zarr(zarr.group(zarr.ZipStore("test.zarr")))
print(xr.open_zarr(zarr.group(zarr.ZipStore("test.zarr"))))

@eeopd
Author

eeopd commented Jan 17, 2019

Hmm, I can't reproduce my issue anymore, it now works by default. I have no idea what fixed it. I suppose that means this issue can be closed :)

@jakirkham, your suggestion does not work: ds.to_zarr only seems to accept a zarr store or a path (in which case the data is stored in a zarr.DirectoryStore). When I try it, I get the following error:

TypeError                               Traceback (most recent call last)
<ipython-input-13-bd9b742d6a11> in <module>
----> 1 ds.to_zarr(zarr.group(zarr.ZipStore("test.zarr")))

~/.local/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, mode, synchronizer, group, encoding, compute)
   1257         from ..backends.api import to_zarr
   1258         return to_zarr(self, store=store, mode=mode, synchronizer=synchronizer,
-> 1259                        group=group, encoding=encoding, compute=compute)
   1260 
   1261     def __unicode__(self):

~/.local/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, mode, synchronizer, group, encoding, compute)
    879     store = backends.ZarrStore.open_group(store=store, mode=mode,
    880                                           synchronizer=synchronizer,
--> 881                                           group=group)
    882 
    883     writer = ArrayWriter()

~/.local/lib/python3.7/site-packages/xarray/backends/zarr.py in open_group(cls, store, mode, synchronizer, group)
    236                                       "#installation" % min_zarr)
    237         zarr_group = zarr.open_group(store=store, mode=mode,
--> 238                                      synchronizer=synchronizer, path=group)
    239         return cls(zarr_group)
    240 

~/.local/lib/python3.7/site-packages/zarr/hierarchy.py in open_group(store, mode, cache_attrs, synchronizer, path)
   1134 
   1135     return Group(store, read_only=read_only, cache_attrs=cache_attrs,
-> 1136                  synchronizer=synchronizer, path=path)

~/.local/lib/python3.7/site-packages/zarr/hierarchy.py in __init__(self, store, path, read_only, chunk_store, cache_attrs, synchronizer)
    114             err_group_not_found(path)
    115         else:
--> 116             meta = decode_group_metadata(meta_bytes)
    117             self._meta = meta
    118 

~/.local/lib/python3.7/site-packages/zarr/meta.py in decode_group_metadata(s)
     98 def decode_group_metadata(s):
     99     s = _ensure_str(s)
--> 100     meta = json.loads(s)
    101     zarr_format = meta.get('zarr_format', None)
    102     if zarr_format != ZARR_FORMAT:

/usr/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    339     else:
    340         if not isinstance(s, (bytes, bytearray)):
--> 341             raise TypeError(f'the JSON object must be str, bytes or bytearray, '
    342                             f'not {s.__class__.__name__}')
    343         s = s.decode(detect_encoding(s), 'surrogatepass')

TypeError: the JSON object must be str, bytes or bytearray, not Array

@eeopd eeopd closed this as completed Jan 17, 2019
@rabernat
Contributor

IMO, zarr needs some kind of "resolver" mechanism that takes a string and decides what kind of store it represents. For example, if the path ends with .zip, then it should know it's a zip store; if it starts with gs://, it should know it's a Google Cloud store, etc.

@alimanfoo
Contributor

> IMO, zarr needs some kind of "resolver" mechanism that takes a string and decides what kind of store it represents. For example, if the path ends with .zip, then it should know it's a zip store; if it starts with gs://, it should know it's a Google Cloud store, etc.

Some very limited support for this is there already, e.g., if the string ends with '.zip' then a zip store will be used, but there's no support for dispatching to cloud stores via a URL-like protocol. There's an open issue for that: zarr-developers/zarr-python#214
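To make the "resolver" idea from the last two comments concrete, here is a minimal, stdlib-only sketch. It is not zarr's implementation or API: the function resolve_store_kind, the SCHEME_TO_KIND table, and the returned labels are all hypothetical, and the sketch only classifies a string; it does not construct any store.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to a store label. Only gs:// is
# mentioned in the discussion; the label itself is made up for this sketch.
SCHEME_TO_KIND = {"gs": "google-cloud"}


def resolve_store_kind(target: str) -> str:
    """Classify a store string as 'google-cloud', 'zip' or 'directory'.

    Illustrative only: a real resolver would return a store object and
    would need to cover many more protocols.
    """
    scheme = urlparse(target).scheme
    # Schemes of length 1 are treated as Windows drive letters (e.g. C:\data).
    if len(scheme) > 1:
        if scheme not in SCHEME_TO_KIND:
            raise ValueError("no store registered for scheme %r" % scheme)
        return SCHEME_TO_KIND[scheme]
    # Suffix check, matching the '.zip' rule described in the comment above.
    if target.lower().endswith(".zip"):
        return "zip"
    # Anything else is assumed to be a plain directory path.
    return "directory"


for example in ("test.zip", "gs://bucket/data.zarr", "test_zarr"):
    print(example, "->", resolve_store_kind(example))
```

Raising on an unknown scheme, rather than silently falling back to a directory, is a design choice made here so that a mistyped protocol does not create a local directory by accident.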
