remove xarray monkey patches in connectivity generation script #101

Open
chrisbarber opened this issue Jul 26, 2018 · 0 comments
@chrisbarber
Contributor

At some point it should be possible to remove

# monkey patch to get zarr to ignore dask chunks and use its own heuristics
def copy_func(f):
    g = types.FunctionType(f.__code__, f.__globals__, name=f.__name__,
                           argdefs=f.__defaults__,
                           closure=f.__closure__)
    g = functools.update_wrapper(g, f)
    g.__kwdefaults__ = f.__kwdefaults__
    return g
orig_determine_zarr_chunks = copy_func(xr.backends.zarr._determine_zarr_chunks)
xr.backends.zarr._determine_zarr_chunks = lambda enc_chunks, var_chunks, ndim: orig_determine_zarr_chunks(enc_chunks, None, ndim)

and

# monkey patch to make dask arrays writable with different chunks than zarr dest
# could do without this but would have to contend with 'inconsistent chunks' on dataset
def sync_using_zarr_copy(self, compute=True):
    if self.sources:
        import dask.array as da
        rechunked_sources = [source.rechunk(target.chunks)
                             for source, target in zip(self.sources, self.targets)]
        delayed_store = da.store(rechunked_sources, self.targets,
                                 lock=self.lock, compute=compute,
                                 flush=True)
        self.sources = []
        self.targets = []
        return delayed_store
xr.backends.common.ArrayWriter.sync = sync_using_zarr_copy

'chunks': None here may need to be removed at the same time.
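
For context, a write path without these patches could simply rechunk with dask so the dask chunks match the desired zarr chunks before calling to_zarr. The sketch below is hypothetical: the file names, dimension names, variable name, and chunk sizes are placeholders, not taken from the actual script.

import xarray as xr

# Placeholder input/output paths and chunk sizes -- not from the real script.
ds = xr.open_dataset("connectivity.nc")

# Rechunk with dask so the in-memory chunks are exactly the chunks we want on
# disk; once dask and zarr chunks agree, neither monkey patch should be needed
# and the 'chunks': None override can go away too.
ds = ds.chunk({"source": 1000, "target": 1000})

ds.to_zarr("connectivity.zarr")

# Alternatively, the on-disk chunking can be stated explicitly per variable via
# encoding (the variable name "weights" is hypothetical):
# ds.to_zarr("connectivity.zarr", encoding={"weights": {"chunks": (1000, 1000)}})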

Xarray 0.10.8 made some progress toward fixing this, but some issues remain (see pydata/xarray#2300). This needs careful performance testing, since some chunk configurations can cause a large amount of data to be loaded into memory during certain queries; that may be a separate bug and needs more investigation.
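
One way to evaluate the memory concern is to write the same data with a few different chunk layouts and time a representative query against each store while watching peak memory. This is only a sketch; the store paths, dimension name, and query below are made up for illustration.

import time
import tracemalloc

import xarray as xr

def profile_query(store_path, indexers):
    # Open the zarr store lazily, pull one slice into memory, and report the
    # wall time plus the peak Python-level memory tracked by tracemalloc
    # (a rough proxy for the real footprint).
    ds = xr.open_zarr(store_path)
    tracemalloc.start()
    t0 = time.perf_counter()
    ds.isel(indexers).load()  # force the actual read
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    ds.close()
    return elapsed, peak

# Hypothetical comparison of two chunk configurations:
# for path in ["conn_chunks_100.zarr", "conn_chunks_1000.zarr"]:
#     print(path, profile_query(path, {"source": slice(0, 10)}))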
