I want to bring up an issue that has tripped up my workflow with large climate models many times. I am dealing with large data arrays of vertical cell thickness. These are 4D arrays (x, y, z, time), but in xarray's data model I would define them as coordinates, not data variables (e.g. they should not be multiplied when a dataset is multiplied by a value).
These sorts of coordinates might become more prevalent with newer ocean models like MOM6.
Whenever I assign these arrays as coordinates, operations on the arrays seem to trigger computation, whereas they don't if I set them up as data variables. The example below shows this behavior.
Is this a bug or intentional? Is there a workaround to keep these vertical thicknesses as coordinates?
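The coordinate semantics mentioned above can be checked with a small in-memory sketch (array values here are made up for illustration): arithmetic on a Dataset applies to its data variables, while coordinates pass through unchanged.

```python
import numpy as np
import xarray as xr

# Small in-memory dataset with a non-index coordinate `dz`
ds = xr.Dataset(
    {"data": (("x",), np.arange(4.0))},
    coords={"dz": (("x",), np.full(4, 0.5))},
)

# Multiplication touches data variables only; `dz` is left as-is
doubled = ds * 2
print(doubled["data"].values)       # [0. 2. 4. 6.]
print(doubled.coords["dz"].values)  # still [0.5 0.5 0.5 0.5]
```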
import xarray as xr
import numpy as np
import dask.array as dsa
# create a dataset with vertical thickness `dz` as a data variable
data = xr.DataArray(dsa.random.random([30, 50, 200, 1000]), dims=['x','y', 'z', 't'])
dz = xr.DataArray(dsa.random.random([30, 50, 200, 1000]), dims=['x','y', 'z', 't'])
ds = xr.Dataset({'data':data, 'dz':dz})
# another dataset with `dz` as a coordinate
ds_new = xr.Dataset({'data':data})
ds_new.coords['dz'] = dz
%%time
test = ds['data'] * ds['dz']
CPU times: user 1.94 ms, sys: 19.1 ms, total: 21 ms Wall time: 21.6 ms
%%time
test = ds_new['data'] * ds_new['dz']
CPU times: user 17.4 s, sys: 1.98 s, total: 19.4 s Wall time: 12.5 s
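For reference, one possible interim workaround is to bypass the coordinate machinery during the arithmetic by multiplying with the bare `Variable` (via `.variable`), which carries no coordinates and so should not trigger a coordinate comparison. This is a hedged sketch, not a confirmed fix; whether it avoids the eager compute may depend on the xarray version (smaller array sizes used here to keep the example light):

```python
import dask.array as dsa
import xarray as xr

data = xr.DataArray(dsa.random.random([3, 5, 20, 10]), dims=["x", "y", "z", "t"])
dz = xr.DataArray(dsa.random.random([3, 5, 20, 10]), dims=["x", "y", "z", "t"])
ds_new = xr.Dataset({"data": data})
ds_new.coords["dz"] = dz

# `.variable` strips the coordinate metadata, so no coordinate merge
# is needed and the result stays a lazy dask graph
test = ds_new["data"] * ds_new["dz"].variable
print(type(test.data))  # a dask array type, not numpy
```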
You guys are the best! Thanks.
Julius J.M. Busecke, Ph.D. (he/him)
On Oct 29, 2019, at 10:47 AM, Deepak Cherian <[email protected]> wrote:
Totally fixed by #3453!
Both statements take the same time on that branch.
Output of `xr.show_versions()`:
xarray: 0.13.0+24.g4254b4af
pandas: 0.25.1
numpy: 1.17.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.0
distributed: 2.5.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: 5.2.0
IPython: 7.8.0
sphinx: None