Time-based resampling drops lat/lon coordinate metadata #7012

Closed
@Zeitsperre

What happened?

When resampling a DataArray along its time dimension, the attributes of coordinate variables that the operation does not touch (here lat and lon) are dropped. This breaks compatibility with cf_xarray, which relies on that coordinate metadata to identify the X, Y, and Z coordinates.

What did you expect to happen?

The metadata attributes of unaffected coordinates (lat, lon, height) should be preserved.

Minimal Complete Verifiable Example

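import xarray as xr
import cf_xarray  # noqa: F401 (importing registers the .cf accessor)

# Placeholder path: a dataset whose lat/lon coordinates carry CF attributes.
ds = xr.open_dataset("my_dataset_that_has_lat_and_lon_coordinates.nc")
# Resample to month-start frequency and average within each month.
tas = ds.tas.resample(time="MS").mean(dim="time")

# Raises KeyError: lat no longer carries the attributes cf_xarray looks for.
tas.cf["latitude"]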

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

KeyError                                  Traceback (most recent call last)
File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/xarray/core/dataarray.py:760, in DataArray._getitem_coord(self, key)
    759 try:
--> 760     var = self._coords[key]
    761 except KeyError:

KeyError: 'latitude'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/cf_xarray/accessor.py:706, in _getitem(accessor, key, skip)
    705 for name in allnames:
--> 706     extravars = accessor.get_associated_variable_names(
    707         name, skip_bounds=scalar_key, error=False
    708     )
    709     coords.extend(itertools.chain(*extravars.values()))

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/cf_xarray/accessor.py:1597, in CFAccessor.get_associated_variable_names(self, name, skip_bounds, error)
   1596 coords: dict[str, list[str]] = {k: [] for k in keys}
-> 1597 attrs_or_encoding = ChainMap(self._obj[name].attrs, self._obj[name].encoding)
   1599 if "coordinates" in attrs_or_encoding:

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/xarray/core/dataarray.py:769, in DataArray.__getitem__(self, key)
    768 if isinstance(key, str):
--> 769     return self._getitem_coord(key)
    770 else:
    771     # xarray-style array indexing

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/xarray/core/dataarray.py:763, in DataArray._getitem_coord(self, key)
    762     dim_sizes = dict(zip(self.dims, self.shape))
--> 763     _, key, var = _get_virtual_variable(self._coords, key, dim_sizes)
    765 return self._replace_maybe_drop_dims(var, name=key)

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/xarray/core/dataset.py:175, in _get_virtual_variable(variables, key, dim_sizes)
    174 if len(split_key) != 2:
--> 175     raise KeyError(key)
    177 ref_name, var_name = split_key

KeyError: 'latitude'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 tas.cf["latitude"]

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/cf_xarray/accessor.py:2526, in CFDataArrayAccessor.__getitem__(self, key)
   2521 if not isinstance(key, str):
   2522     raise KeyError(
   2523         f"Cannot use a list of keys with DataArrays. Expected a single string. Received {key!r} instead."
   2524     )
-> 2526 return _getitem(self, key)

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/cf_xarray/accessor.py:749, in _getitem(accessor, key, skip)
    746     return ds.set_coords(coords)
    748 except KeyError:
--> 749     raise KeyError(
    750         f"{kind}.cf does not understand the key {k!r}. "
    751         f"Use 'repr({kind}.cf)' (or '{kind}.cf' in a Jupyter environment) to see a list of key names that can be interpreted."
    752     )

KeyError: "DataArray.cf does not understand the key 'latitude'. Use 'repr(DataArray.cf)' (or 'DataArray.cf' in a Jupyter environment) to see a list of key names that can be interpreted."

Anything else we need to know?

Before

netcdf tas_Amon_CanESM2_rcp85_r1i1p1_200701-200712 {
dimensions:
        time = UNLIMITED ; // (12 currently)
        bnds = 2 ;
        lat = 64 ;
        lon = 128 ;
variables:
        double time(time) ;
                time:_FillValue = NaN ;
                time:bounds = "time_bnds" ;
                time:axis = "T" ;
                time:long_name = "time" ;
                time:standard_name = "time" ;
                time:units = "days since 1850-01-01" ;
                time:calendar = "365_day" ;
        double time_bnds(time, bnds) ;
                time_bnds:_FillValue = NaN ;
                time_bnds:coordinates = "height" ;
        double lat(lat) ;
                lat:_FillValue = NaN ;
                lat:bounds = "lat_bnds" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:long_name = "latitude" ;
                lat:standard_name = "latitude" ;
        double lat_bnds(lat, bnds) ;
                lat_bnds:_FillValue = NaN ;
                lat_bnds:coordinates = "height" ;
        double lon(lon) ;
                lon:_FillValue = NaN ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "longitude" ;
                lon:standard_name = "longitude" ;
        double lon_bnds(lon, bnds) ;
                lon_bnds:_FillValue = NaN ;
                lon_bnds:coordinates = "height" ;
        double height ;
                height:_FillValue = NaN ;
                height:units = "m" ;
                height:axis = "Z" ;
                height:positive = "up" ;
                height:long_name = "height" ;
                height:standard_name = "height" ;
        float tas(time, lat, lon) ;
                tas:_FillValue = 1.e+20f ;
                tas:standard_name = "air_temperature" ;
                tas:long_name = "Near-Surface Air Temperature" ;
                tas:units = "K" ;
                tas:original_name = "ST" ;
                tas:cell_methods = "time: mean (interval: 15 minutes)" ;
                tas:cell_measures = "area: areacella" ;
                tas:history = "2011-03-10T05:13:26Z altered by CMOR: Treated scalar dimension: \'height\'. 2011-03-10T05:13:26Z altered by CMOR: replaced missing value flag (1e+38) with standard missing value (1e+20)." ;
                tas:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_CanESM2_rcp85_r0i0p0.nc areacella: areacella_fx_CanESM2_rcp85_r0i0p0.nc" ;
                tas:coordinates = "height" ;
                tas:missing_value = 1.e+20f ;

After

netcdf test_cf_lat_new {
dimensions:
        lat = 64 ;
        lon = 128 ;
        time = 11 ;
variables:
        double lat(lat) ;
                lat:_FillValue = NaN ;
        double lon(lon) ;
                lon:_FillValue = NaN ;
        double height ;
                height:_FillValue = NaN ;
                height:units = "m" ;
                height:axis = "Z" ;
                height:positive = "up" ;
                height:long_name = "height" ;
                height:standard_name = "height" ;
        int64 time(time) ;
                time:units = "days since 2007-01-01 00:00:00.000000" ;
                time:calendar = "noleap" ;
        float tas(time, lat, lon) ;
                tas:_FillValue = NaNf ;
                tas:coordinates = "height" ;
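
A possible stopgap, sketched here with synthetic data rather than the CanESM2 file above, is to copy the coordinate attributes from the original array back onto the resampled one. The helper name `restore_coord_attrs` and the sample values are assumptions made for illustration; neither comes from xarray nor cf_xarray. The helper only fills in attributes on coordinates that came back empty, so it should be harmless on versions where the metadata is preserved.

```python
import numpy as np
import pandas as pd
import xarray as xr


def restore_coord_attrs(result, source):
    """Copy coordinate attrs from `source` onto same-named coords of `result`.

    Hypothetical helper: only coordinates whose attrs are empty in `result`
    are updated, so existing metadata is never overwritten.
    """
    for name in result.coords:
        if name in source.coords and not result[name].attrs:
            result[name].attrs.update(source[name].attrs)
    return result


# Synthetic stand-in for `ds.tas`: 59 daily steps covering Jan-Feb 2007.
time = pd.date_range("2007-01-01", periods=59, freq="D")
tas = xr.DataArray(
    np.random.default_rng(0).random((59, 2, 3)),
    dims=("time", "lat", "lon"),
    coords={
        "time": time,
        "lat": ("lat", [10.0, 20.0],
                {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"}),
        "lon": ("lon", [0.0, 90.0, 180.0],
                {"standard_name": "longitude", "units": "degrees_east", "axis": "X"}),
    },
    name="tas",
)

monthly = tas.resample(time="MS").mean()
print("lat attrs after resample:", monthly["lat"].attrs)

# Re-attach whatever coordinate metadata the resample dropped.
monthly = restore_coord_attrs(monthly, tas)
print("lat attrs after restore: ", monthly["lat"].attrs)
```

With the attributes back in place, a lookup such as `monthly.cf["latitude"]` should have the `standard_name` and `axis` entries it needs, although that was not verified against the environment listed below.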

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:06:46) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.19.6-200.fc36.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.3.5
numpy: 1.22.4
scipy: 1.8.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.6.1
distributed: 2022.6.1
matplotlib: 3.5.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.10.dev8+gfbc2af8
numpy_groupies: 0.9.19
setuptools: 59.8.0
pip: 22.2.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 5.1.1
