Changing dtype on v0.13.0 causes Dataset attributes to be lost #3348

@robbibt

MCVE Code Sample

import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(123)

times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

ds = xr.Dataset({"tmin": (("time", "location"), tmin_values),
                 "tmax": (("time", "location"), tmax_values),},
                {"time": times, "location": ["IA", "IN", "IL"]})

# Assign a dataset-level attribute
ds = ds.assign_attrs(CRS="EPSG:4326")

# Change dtype; on 0.13.0 the returned dataset has no attributes
ds.astype(np.float32)

Expected Output

ds.astype(np.float32) should return a dataset whose variables have dtype np.float32 and whose attributes (e.g. CRS = 'EPSG:4326') are still attached.
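The expectation above can be written as a small check. This is only a sketch: `check_astype_result` and the tiny `small` dataset are hypothetical names introduced for illustration, not part of xarray or of the original report. The attribute result depends on the installed xarray version, so no output is shown.

```python
import numpy as np
import xarray as xr


def check_astype_result(original, converted, dtype=np.float32):
    """Return (dtypes_ok, attrs_ok) for a converted dataset (hypothetical helper)."""
    # Every data variable should have the requested dtype.
    dtypes_ok = all(var.dtype == dtype for var in converted.data_vars.values())
    # Dataset-level attributes should match the original exactly.
    attrs_ok = dict(converted.attrs) == dict(original.attrs)
    return dtypes_ok, attrs_ok


# Small stand-in for the dataset in the MCVE (an assumption, for brevity).
small = xr.Dataset({"tmin": (("time",), np.arange(3.0))}, attrs={"CRS": "EPSG:4326"})
dtypes_ok, attrs_ok = check_astype_result(small, small.astype(np.float32))
print(dtypes_ok, attrs_ok)
```

On an affected version, `attrs_ok` would be expected to come back `False` while `dtypes_ok` stays `True`.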

Problem Description

On xarray version 0.12.1, changing the dtype of a dataset preserves any attached attributes, e.g.:

<xarray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * location  (location) <U2 'IA' 'IN' 'IL'
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
Data variables:
    tmin      (time, location) float32 -8.03737 -1.7884412 ... -4.543927
    tmax      (time, location) float32 12.980549 3.3104093 ... 3.8052793
Attributes:
    CRS:      EPSG:4326

However, on xarray version 0.13.0, changing the dtype of a dataset silently drops any attached attributes, e.g.:

<xarray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
  * location  (location) <U2 'IA' 'IN' 'IL'
Data variables:
    tmin      (time, location) float32 -8.03737 -1.7884412 ... -4.543927
    tmax      (time, location) float32 12.980549 3.3104093 ... 3.8052793

This breaks large geospatial analyses (e.g. OpenDataCube workflows), where we need to change dtype to reduce memory use while preserving the CRS information that downstream tools rely on.
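Until the behaviour is fixed, one possible workaround is to copy the dataset-level attributes back onto the converted result. The sketch below assumes only that `Dataset.astype` and the `attrs` mapping behave as in the example above; `astype_keep_attrs` is a hypothetical helper, not an xarray function.

```python
import numpy as np
import xarray as xr


def astype_keep_attrs(ds, dtype):
    """Convert dtype, then re-attach dataset-level attrs (hypothetical helper)."""
    converted = ds.astype(dtype)
    # Copy the original attributes onto the new dataset; this is harmless
    # on versions where astype already preserves them.
    converted.attrs.update(ds.attrs)
    return converted


# Small stand-in dataset (an assumption, for brevity).
example = xr.Dataset(
    {"tmin": (("time",), np.arange(3.0))}, attrs={"CRS": "EPSG:4326"}
)
converted = astype_keep_attrs(example, np.float32)
print(converted.attrs)
```

This only restores attributes on the dataset itself; attributes on individual variables would need the same treatment if they are also dropped.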

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
python-bits: 64
OS: Linux
OS-release: 4.14.133-113.112.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: None
LOCALE: en_US.UTF-8
libhdf5: 1.10.0
libnetcdf: 4.6.0

xarray: 0.13.0
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.3.1
netCDF4: 1.3.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.24
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 40.6.3
pip: 19.2.3
conda: None
pytest: 3.5.0
IPython: 7.8.0
sphinx: None

Labels: bug, topic-metadata (relating to the handling of metadata, i.e. attrs and encoding)
