
Time-based resampling drops lat/lon coordinate metadata #7012


Closed
2 of 4 tasks
Zeitsperre opened this issue Sep 8, 2022 · 5 comments · Fixed by #7022

Comments

@Zeitsperre
Contributor

Zeitsperre commented Sep 8, 2022

What happened?

When resampling a DataArray along its time dimension, the attributes of coordinate variables that the resampling does not touch are dropped. This breaks compatibility with cf_xarray, which relies on that coordinate metadata to identify the X, Y and Z coordinates.
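To see why missing attributes turn into a KeyError, here is a plain-Python sketch of attribute-based coordinate lookup. The matching rules in `CRITERIA` and the `find_coords` helper are hypothetical simplifications loosely modeled on CF conventions, not cf_xarray's actual criteria. Once `standard_name` and `units` are gone, nothing matches "latitude", so there is nothing for `.cf["latitude"]` to return.

```python
# Hypothetical, simplified matching rules loosely modeled on CF conventions;
# cf_xarray's real criteria are more extensive.
CRITERIA = {
    "latitude": {"standard_name": {"latitude"}, "units": {"degrees_north"}},
    "longitude": {"standard_name": {"longitude"}, "units": {"degrees_east"}},
}


def find_coords(coord_attrs, key):
    """Return the sorted names of coordinates whose attrs match any rule for key."""
    rules = CRITERIA[key]
    return sorted(
        name
        for name, attrs in coord_attrs.items()
        if any(attrs.get(attr) in allowed for attr, allowed in rules.items())
    )


# Trimmed lat/lon attributes before and after resampling in this report
# (the _FillValue encoding is left out).
before = {
    "lat": {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"},
    "lon": {"standard_name": "longitude", "units": "degrees_east", "axis": "X"},
}
after = {"lat": {}, "lon": {}}

print(find_coords(before, "latitude"))  # ['lat']
print(find_coords(after, "latitude"))   # [] -> nothing for a "latitude" lookup to find
```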

What did you expect to happen?

The metadata of unaffected coordinates (lat, lon, height) should be preserved.

Minimal Complete Verifiable Example

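import xarray as xr
import cf_xarray  # noqa: F401  (registers the .cf accessor)


# Placeholder path: any file whose lat/lon coordinates carry CF attributes
ds = xr.open_dataset("my_dataset_that_has_lat_and_lon_coordinates.nc")
tas = ds.tas.resample(time="MS").mean(dim="time")

# Raises KeyError because lat/lon lost their CF attributes during resampling
tas.cf["latitude"]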
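The example above needs a local file. A self-contained variant with synthetic data might look like the sketch below: the grid values and random temperatures are made up, while the coordinate attributes mirror the lat/lon entries of the ncdump listing further down. It does not need cf_xarray; it simply prints the attributes left on lat and lon after resampling. Whether they come back empty depends on the installed xarray and flox versions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the issue's NetCDF file: 90 daily steps
# (Jan 1 to Mar 31, 2007) on a 3 x 3 lat/lon grid with CF-style attributes.
time = pd.date_range("2007-01-01", periods=90, freq="D")
tas = xr.DataArray(
    np.random.default_rng(0).normal(280.0, 5.0, (90, 3, 3)),
    dims=("time", "lat", "lon"),
    coords={
        "time": time,
        "lat": ("lat", [-45.0, 0.0, 45.0],
                {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"}),
        "lon": ("lon", [0.0, 120.0, 240.0],
                {"standard_name": "longitude", "units": "degrees_east", "axis": "X"}),
    },
    name="tas",
    attrs={"standard_name": "air_temperature", "units": "K"},
)

# Monthly means: one step per month start ("MS").
monthly = tas.resample(time="MS").mean(dim="time")

# On an affected xarray/flox combination these are expected to be empty.
print("lat attrs:", monthly.lat.attrs)
print("lon attrs:", monthly.lon.attrs)
```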

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

KeyError                                  Traceback (most recent call last)
File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/xarray/core/dataarray.py:760, in DataArray._getitem_coord(self, key)
    759 try:
--> 760     var = self._coords[key]
    761 except KeyError:

KeyError: 'latitude'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/cf_xarray/accessor.py:706, in _getitem(accessor, key, skip)
    705 for name in allnames:
--> 706     extravars = accessor.get_associated_variable_names(
    707         name, skip_bounds=scalar_key, error=False
    708     )
    709     coords.extend(itertools.chain(*extravars.values()))

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/cf_xarray/accessor.py:1597, in CFAccessor.get_associated_variable_names(self, name, skip_bounds, error)
   1596 coords: dict[str, list[str]] = {k: [] for k in keys}
-> 1597 attrs_or_encoding = ChainMap(self._obj[name].attrs, self._obj[name].encoding)
   1599 if "coordinates" in attrs_or_encoding:

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/xarray/core/dataarray.py:769, in DataArray.__getitem__(self, key)
    768 if isinstance(key, str):
--> 769     return self._getitem_coord(key)
    770 else:
    771     # xarray-style array indexing

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/xarray/core/dataarray.py:763, in DataArray._getitem_coord(self, key)
    762     dim_sizes = dict(zip(self.dims, self.shape))
--> 763     _, key, var = _get_virtual_variable(self._coords, key, dim_sizes)
    765 return self._replace_maybe_drop_dims(var, name=key)

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/xarray/core/dataset.py:175, in _get_virtual_variable(variables, key, dim_sizes)
    174 if len(split_key) != 2:
--> 175     raise KeyError(key)
    177 ref_name, var_name = split_key

KeyError: 'latitude'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 tas.cf["latitude"]

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/cf_xarray/accessor.py:2526, in CFDataArrayAccessor.__getitem__(self, key)
   2521 if not isinstance(key, str):
   2522     raise KeyError(
   2523         f"Cannot use a list of keys with DataArrays. Expected a single string. Received {key!r} instead."
   2524     )
-> 2526 return _getitem(self, key)

File ~/mambaforge/envs/xclim310/lib/python3.10/site-packages/cf_xarray/accessor.py:749, in _getitem(accessor, key, skip)
    746     return ds.set_coords(coords)
    748 except KeyError:
--> 749     raise KeyError(
    750         f"{kind}.cf does not understand the key {k!r}. "
    751         f"Use 'repr({kind}.cf)' (or '{kind}.cf' in a Jupyter environment) to see a list of key names that can be interpreted."
    752     )

KeyError: "DataArray.cf does not understand the key 'latitude'. Use 'repr(DataArray.cf)' (or 'DataArray.cf' in a Jupyter environment) to see a list of key names that can be interpreted."

Anything else we need to know?

Before

netcdf tas_Amon_CanESM2_rcp85_r1i1p1_200701-200712 {
dimensions:
        time = UNLIMITED ; // (12 currently)
        bnds = 2 ;
        lat = 64 ;
        lon = 128 ;
variables:
        double time(time) ;
                time:_FillValue = NaN ;
                time:bounds = "time_bnds" ;
                time:axis = "T" ;
                time:long_name = "time" ;
                time:standard_name = "time" ;
                time:units = "days since 1850-01-01" ;
                time:calendar = "365_day" ;
        double time_bnds(time, bnds) ;
                time_bnds:_FillValue = NaN ;
                time_bnds:coordinates = "height" ;
        double lat(lat) ;
                lat:_FillValue = NaN ;
                lat:bounds = "lat_bnds" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:long_name = "latitude" ;
                lat:standard_name = "latitude" ;
        double lat_bnds(lat, bnds) ;
                lat_bnds:_FillValue = NaN ;
                lat_bnds:coordinates = "height" ;
        double lon(lon) ;
                lon:_FillValue = NaN ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "longitude" ;
                lon:standard_name = "longitude" ;
        double lon_bnds(lon, bnds) ;
                lon_bnds:_FillValue = NaN ;
                lon_bnds:coordinates = "height" ;
        double height ;
                height:_FillValue = NaN ;
                height:units = "m" ;
                height:axis = "Z" ;
                height:positive = "up" ;
                height:long_name = "height" ;
                height:standard_name = "height" ;
        float tas(time, lat, lon) ;
                tas:_FillValue = 1.e+20f ;
                tas:standard_name = "air_temperature" ;
                tas:long_name = "Near-Surface Air Temperature" ;
                tas:units = "K" ;
                tas:original_name = "ST" ;
                tas:cell_methods = "time: mean (interval: 15 minutes)" ;
                tas:cell_measures = "area: areacella" ;
                tas:history = "2011-03-10T05:13:26Z altered by CMOR: Treated scalar dimension: \'height\'. 2011-03-10T05:13:26Z altered by CMOR: replaced missing value flag (1e+38) with standard missing value (1e+20)." ;
                tas:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_CanESM2_rcp85_r0i0p0.nc areacella: areacella_fx_CanESM2_rcp85_r0i0p0.nc" ;
                tas:coordinates = "height" ;
                tas:missing_value = 1.e+20f ;

After

netcdf test_cf_lat_new {
dimensions:
        lat = 64 ;
        lon = 128 ;
        time = 11 ;
variables:
        double lat(lat) ;
                lat:_FillValue = NaN ;
        double lon(lon) ;
                lon:_FillValue = NaN ;
        double height ;
                height:_FillValue = NaN ;
                height:units = "m" ;
                height:axis = "Z" ;
                height:positive = "up" ;
                height:long_name = "height" ;
                height:standard_name = "height" ;
        int64 time(time) ;
                time:units = "days since 2007-01-01 00:00:00.000000" ;
                time:calendar = "noleap" ;
        float tas(time, lat, lon) ;
                tas:_FillValue = NaNf ;
                tas:coordinates = "height" ;
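Comparing the two listings by hand is error-prone, so the attribute loss can be summarised with a small plain-Python diff. The dictionaries below are transcribed from the Before and After ncdump output (values kept as strings), and `dropped_attrs` is a hypothetical helper. It shows that the scalar `height` coordinate keeps every attribute, while `lat` and `lon` lose everything except `_FillValue`.

```python
# Coordinate attributes transcribed from the Before/After ncdump listings.
BEFORE = {
    "lat": {"_FillValue": "NaN", "bounds": "lat_bnds", "units": "degrees_north",
            "axis": "Y", "long_name": "latitude", "standard_name": "latitude"},
    "lon": {"_FillValue": "NaN", "bounds": "lon_bnds", "units": "degrees_east",
            "axis": "X", "long_name": "longitude", "standard_name": "longitude"},
    "height": {"_FillValue": "NaN", "units": "m", "axis": "Z", "positive": "up",
               "long_name": "height", "standard_name": "height"},
}
AFTER = {
    "lat": {"_FillValue": "NaN"},
    "lon": {"_FillValue": "NaN"},
    "height": {"_FillValue": "NaN", "units": "m", "axis": "Z", "positive": "up",
               "long_name": "height", "standard_name": "height"},
}


def dropped_attrs(before, after):
    """Map each variable to the sorted attribute names it lost (omitted if none)."""
    lost = {}
    for name, attrs in before.items():
        missing = sorted(set(attrs) - set(after.get(name, {})))
        if missing:
            lost[name] = missing
    return lost


for name, keys in dropped_attrs(BEFORE, AFTER).items():
    print(f"{name}: {', '.join(keys)}")
```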

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:06:46) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.19.6-200.fc36.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.3.5
numpy: 1.22.4
scipy: 1.8.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.6.1
distributed: 2022.6.1
matplotlib: 3.5.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.10.dev8+gfbc2af8
numpy_groupies: 0.9.19
setuptools: 59.8.0
pip: 22.2.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 5.1.1

Zeitsperre added the bug and needs triage labels Sep 8, 2022
@mathause
Collaborator

mathause commented Sep 9, 2022

I cannot reproduce this on master (but I don't think we've touched this since 2022.06):

import xarray as xr
import cf_xarray
air = xr.tutorial.open_dataset("air_temperature")
air.resample(time="MS").mean(dim="time").lat

Returns

<xarray.DataArray 'lat' (lat: 25)>
array([75. , 72.5, 70. , 67.5, 65. , 62.5, 60. , 57.5, 55. , 52.5, 50. , 47.5,
       45. , 42.5, 40. , 37.5, 35. , 32.5, 30. , 27.5, 25. , 22.5, 20. , 17.5,
       15. ], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
Attributes:
    standard_name:  latitude
    long_name:      Latitude
    units:          degrees_north
    axis:           Y

Would be interesting to get to the bottom of this.

(Funnily enough, the attrs are not even dropped for r.mean(keep_attrs=False).lat, where r is the resample object.)
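Since the thread goes on to suspect flox, one way to narrow this down is to run the same resampling with and without flox and with each `keep_attrs` setting. The sketch below uses a tiny made-up array; `coord_attrs_after_resample` is a hypothetical helper. As far as we understand, `use_flox=True` only takes effect when flox is installed, so without flox both halves of the table show the non-flox behaviour.

```python
import numpy as np
import pandas as pd
import xarray as xr


def coord_attrs_after_resample(use_flox, keep_attrs=None):
    """Resample a tiny CF-annotated array and return the attrs left on 'lat'."""
    da = xr.DataArray(
        np.arange(60.0).reshape(60, 1),
        dims=("time", "lat"),
        coords={
            "time": pd.date_range("2022-01-01", periods=60, freq="D"),
            "lat": ("lat", [45.0],
                    {"standard_name": "latitude", "units": "degrees_north"}),
        },
    )
    # Select the reduction path: flox (if installed) or xarray's own groupby.
    with xr.set_options(use_flox=use_flox):
        out = da.resample(time="MS").mean(dim="time", keep_attrs=keep_attrs)
    return dict(out.lat.attrs)


for use_flox in (False, True):
    for keep_attrs in (None, False, True):
        print(f"use_flox={use_flox!s:5} keep_attrs={keep_attrs!s:5} ->",
              coord_attrs_after_resample(use_flox, keep_attrs))
```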

@mathause
Collaborator

Ah, I probably don't have flox installed; that could be the reason.

@Zeitsperre
Contributor Author

@mathause You're right! I noticed this first in my builds using "upstream" dependencies (xarray@main, flox@main, cftime@master, bottleneck@master). It might indeed be flox-related!

@mathause
Collaborator

cc @dcherian

dcherian removed the needs triage label Sep 12, 2022
dcherian added a commit to dcherian/xarray that referenced this issue Sep 12, 2022
Closes pydata#7012

When `use_flox=False`, we pass `keep_attrs=None` to `Data*.reduce`.
This ends up deleting Dataset- and DataArray-level attrs but preserving
coordinate variable attrs. We now always set the default to True to
match the default behaviour of `apply_ufunc` used by
`flox.xarray.xarray_reduce`.
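The commit message turns on how an unset `keep_attrs=None` gets resolved. The plain-Python sketch below illustrates that general pattern with hypothetical names (`KEEP_ATTRS_OPTION`, `resolve_keep_attrs`, `reduce_coord_attrs`); it is a simplification, not xarray's code. An explicit argument wins, otherwise a global option is consulted, and only when that is also unset does the call site's default apply. The same `None` can therefore keep or drop attributes depending on which default a given code path chooses.

```python
# Hypothetical stand-in for a global keep_attrs option; the sentinel
# "default" means "defer to the calling operation's own default".
KEEP_ATTRS_OPTION = "default"


def resolve_keep_attrs(keep_attrs, default):
    """Resolve keep_attrs: explicit argument > global option > call-site default."""
    if keep_attrs is not None:
        return keep_attrs
    if KEEP_ATTRS_OPTION != "default":
        return KEEP_ATTRS_OPTION
    return default


def reduce_coord_attrs(attrs, keep_attrs=None, default=False):
    """Return the attrs a coordinate would carry after a reduction."""
    return dict(attrs) if resolve_keep_attrs(keep_attrs, default) else {}


lat_attrs = {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"}
print(reduce_coord_attrs(lat_attrs, default=False))  # {}: a False default drops them
print(reduce_coord_attrs(lat_attrs, default=True))   # the attrs are kept
```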
@dcherian
Contributor

Thanks @Zeitsperre and @mathause. I made a bad choice for the default in #6667. See #7022 for a solution.
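Upgrading to a release that includes #7022 is the proper fix. Until then, one possible workaround on an affected version is to copy the attributes back from the input after resampling. The sketch below is a hypothetical helper (`restore_coord_attrs`, with an assumed `skip` parameter), not part of xarray or cf_xarray. It only fills in keys the result is missing and skips the resampled `time` coordinate, whose attributes (for example `bounds`) may no longer apply.

```python
import xarray as xr


def restore_coord_attrs(result, source, skip=("time",)):
    """Return a copy of `result` whose coordinates regain attrs from `source`.

    Only attribute keys missing on `result` are filled in; coordinates named
    in `skip` (by default the resampled 'time' axis) are left untouched.
    """
    updates = {}
    for name, src_coord in source.coords.items():
        if name in skip or name not in result.coords:
            continue
        var = result.coords[name].variable.copy(deep=False)
        # Attrs already present on the result take precedence over the source.
        var.attrs = {**src_coord.attrs, **var.attrs}
        updates[name] = var
    return result.assign_coords(updates)
```

Usage would be along the lines of `tas = restore_coord_attrs(ds.tas.resample(time="MS").mean(dim="time"), ds.tas)`, after which `tas.cf["latitude"]` should be able to find `lat` again.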
