Description
MCVE Code Sample
import numpy as np
import pandas as pd
import xarray as xr
np.random.seed(123)
times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))
base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)
ds = xr.Dataset(
    {
        "tmin": (("time", "location"), tmin_values),
        "tmax": (("time", "location"), tmax_values),
    },
    {"time": times, "location": ["IA", "IN", "IL"]},
)
# Assign an attribute
ds = ds.assign_attrs(CRS = 'EPSG:4326')
# Change dtype
ds.astype(np.float32)
Expected Output
ds to be returned with variables of dtype np.float32, with attributes (e.g. CRS = 'EPSG:4326') still included in the dataset.
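The expectation above can be written as two explicit checks. This is only a sketch: it uses a small stand-in dataset rather than the full MCVE, and the names `converted`, `dtype_ok` and `attrs_ok` are illustrative, not part of xarray.

```python
import numpy as np
import xarray as xr

# Small stand-in dataset (hypothetical; much smaller than the MCVE above)
ds = xr.Dataset(
    {"tmin": (("time", "location"), np.zeros((2, 3)))},
    coords={"location": ["IA", "IN", "IL"]},
).assign_attrs(CRS="EPSG:4326")

converted = ds.astype(np.float32)

# Check 1: every data variable now has the requested dtype
dtype_ok = all(var.dtype == np.float32 for var in converted.data_vars.values())

# Check 2: the dataset-level attribute is still attached
attrs_ok = converted.attrs.get("CRS") == "EPSG:4326"

print("dtype converted:", dtype_ok)
print("attrs preserved:", attrs_ok)
```

Based on the behaviour described in this report, the second check would be expected to pass on 0.12.1 and fail on 0.13.0.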
Problem Description
On xarray version 0.12.1, changing the dtype of a dataset preserves any attached attributes, e.g.:
<xarray.Dataset>
Dimensions: (location: 3, time: 731)
Coordinates:
* location (location) <U2 'IA' 'IN' 'IL'
* time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
Data variables:
tmin (time, location) float32 -8.03737 -1.7884412 ... -4.543927
tmax (time, location) float32 12.980549 3.3104093 ... 3.8052793
Attributes:
CRS: EPSG:4326
However, on xarray version 0.13.0, changing the dtype of a dataset silently drops any attached attributes, e.g.:
<xarray.Dataset>
Dimensions: (location: 3, time: 731)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
* location (location) <U2 'IA' 'IN' 'IL'
Data variables:
tmin (time, location) float32 -8.03737 -1.7884412 ... -4.543927
tmax (time, location) float32 12.980549 3.3104093 ... 3.8052793
This causes issues with large geospatial analyses (e.g. OpenDataCube workflows), because we need to change dtype to reduce memory use while preserving the CRS information that downstream tools rely on.
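Until the behaviour is restored, one possible stopgap is to copy the dataset-level attributes back after the conversion. The helper below is a minimal sketch, not an xarray API: `astype_keep_attrs` is a hypothetical name, and it only restores attributes on the dataset itself, not on individual variables.

```python
import numpy as np
import xarray as xr


def astype_keep_attrs(ds, dtype):
    """Hypothetical helper: cast data variables and re-attach dataset attrs."""
    converted = ds.astype(dtype)
    # Re-apply the original dataset-level attributes; this is harmless if
    # the installed xarray version already preserved them.
    converted.attrs.update(ds.attrs)
    return converted


# Usage with a small stand-in dataset (assumed for illustration)
example = xr.Dataset(
    {"tmin": (("time",), np.arange(3.0))}
).assign_attrs(CRS="EPSG:4326")

result = astype_keep_attrs(example, np.float32)
print(result.attrs)
```

This keeps the memory saving from the smaller dtype while leaving the CRS attribute available to downstream tools. The original dataset is not modified.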
Output of xr.show_versions()
xarray: 0.13.0
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.3.1
netCDF4: 1.3.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.24
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 40.6.3
pip: 19.2.3
conda: None
pytest: 3.5.0
IPython: 7.8.0
sphinx: None