remove xarray monkey patches in connectivity generation script #101

Open
chrisbarber opened this issue Jul 26, 2018 · 0 comments
@chrisbarber
Contributor

At some point it should be possible to remove

# monkey patch to get zarr to ignore dask chunks and use its own heuristics
def copy_func(f):
    g = types.FunctionType(f.__code__, f.__globals__, name=f.__name__,
                           argdefs=f.__defaults__,
                           closure=f.__closure__)
    g = functools.update_wrapper(g, f)
    g.__kwdefaults__ = f.__kwdefaults__
    return g
orig_determine_zarr_chunks = copy_func(xr.backends.zarr._determine_zarr_chunks)
xr.backends.zarr._determine_zarr_chunks = lambda enc_chunks, var_chunks, ndim: orig_determine_zarr_chunks(enc_chunks, None, ndim)

and

# monkey patch to make dask arrays writable with different chunks than zarr dest
# could do without this but would have to contend with 'inconsistent chunks' on dataset
def sync_using_zarr_copy(self, compute=True):
    if self.sources:
        import dask.array as da
        rechunked_sources = [source.rechunk(target.chunks)
                             for source, target in zip(self.sources, self.targets)]
        delayed_store = da.store(rechunked_sources, self.targets,
                                 lock=self.lock, compute=compute,
                                 flush=True)
        self.sources = []
        self.targets = []
        return delayed_store
xr.backends.common.ArrayWriter.sync = sync_using_zarr_copy

'chunks': None here may need to be removed at the same time.
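
For context, a write path without these patches could simply rechunk with dask so the dask chunks match the desired zarr chunks before calling to_zarr. The sketch below is hypothetical: the file names, dimension names, variable name, and chunk sizes are placeholders, not taken from the actual script.

import xarray as xr

# Placeholder input/output paths and chunk sizes -- not from the real script.
ds = xr.open_dataset("connectivity.nc")

# Rechunk with dask so the in-memory chunks are exactly the chunks we want on
# disk; once dask and zarr chunks agree, neither monkey patch should be needed
# and the 'chunks': None override can go away too.
ds = ds.chunk({"source": 1000, "target": 1000})

ds.to_zarr("connectivity.zarr")

# Alternatively, the on-disk chunking can be stated explicitly per variable via
# encoding (the variable name "weights" is hypothetical):
# ds.to_zarr("connectivity.zarr", encoding={"weights": {"chunks": (1000, 1000)}})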

Xarray 0.10.8 made some progress toward fixing this, but some issues remain (see pydata/xarray#2300). This needs careful performance testing, since some chunk configurations can cause a large amount of data to be loaded into memory during certain queries; that may be a separate bug and needs more investigation.
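
One way to evaluate the memory concern is to write the same data with a few different chunk layouts and time a representative query against each store while watching peak memory. This is only a sketch; the store paths, dimension name, and query below are made up for illustration.

import time
import tracemalloc

import xarray as xr

def profile_query(store_path, indexers):
    # Open the zarr store lazily, pull one slice into memory, and report the
    # wall time plus the peak Python-level memory tracked by tracemalloc
    # (a rough proxy for the real footprint).
    ds = xr.open_zarr(store_path)
    tracemalloc.start()
    t0 = time.perf_counter()
    ds.isel(indexers).load()  # force the actual read
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    ds.close()
    return elapsed, peak

# Hypothetical comparison of two chunk configurations:
# for path in ["conn_chunks_100.zarr", "conn_chunks_1000.zarr"]:
#     print(path, profile_query(path, {"source": slice(0, 10)}))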
