
xarray v 2023.9.0: ValueError: unable to infer dtype on variable 'time'; xarray cannot serialize arbitrary Python objects #8653


Open
jerabaul29 opened this issue Jan 24, 2024 · 1 comment

Comments

@jerabaul29
Contributor

What happened?

I tried to save an xarray Dataset whose time dimension holds timezone-aware datetime.datetime objects to a netCDF (.nc) file with to_netcdf, and got the error ValueError: unable to infer dtype on variable 'time'; xarray cannot serialize arbitrary Python objects.

What did you expect to happen?

I expected xarray to detect automatically that these are datetimes and convert them to whatever format it uses internally, so that they can be written to a CF-compatible file, as described in #2512.

Minimal Complete Verifiable Example

import xarray as xr
import datetime

times = [datetime.datetime(2024, 1, 1, 1, 1, 1, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 1, 1, 1, 2, tzinfo=datetime.timezone.utc)]

data = [1, 2]

xr_result = xr.Dataset(
    {
        'time':
        xr.DataArray(dims=["time"],
                     data=times,
                     attrs={
                         "standard_name": "time",
                     }),
        #
        'data':
        xr.DataArray(dims=["time"],
                     data=data,
                     attrs={
                         "_FillValue": "NaN",
                         "standard_name": "some_data",
                     }),
    }
)

xr_result.to_netcdf("test.nc")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

The example is available as a notebook viewable at:

https://github.com/jerabaul29/public_bug_reports/blob/main/xarray/2024_01_24/xarray_and_datetimes.ipynb

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 6.5.0-14-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2023.9.0
pandas: 2.0.3
numpy: 1.25.2
scipy: 1.11.3
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.5
dask: 2023.9.2
distributed: 2023.9.2
matplotlib: 3.7.2
cartopy: 0.21.1
seaborn: 0.13.0
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.15.0
sphinx: None

jerabaul29 added the bug and needs triage labels Jan 24, 2024
jerabaul29 changed the title from "xarray v 2023.9.0 ValueError: unable to infer dtype on variable 'time'; xarray cannot serialize arbitrary Python objects" to "xarray v 2023.9.0: ValueError: unable to infer dtype on variable 'time'; xarray cannot serialize arbitrary Python objects" Jan 24, 2024
TomNicholas added the topic-cftime and io labels and removed the needs triage label Jan 24, 2024
@kmuehlbauer
Contributor

This seems to be a tricky issue with timezones.

The issue already manifests in the DataArray (Variable) creation. The given list of timezone aware datetimes is converted into pandas._libs.tslibs.timestamps.Timestamp (wrapped as numpy 'O'). This happens in

as_series = pd.Series(values.ravel(), copy=False)

Further down the line, there is no code path that converts these objects to numpy datetime64[ns] (or a similar type) that could be serialized correctly.
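A minimal sketch of the conversion step described above, assuming pandas 2.x and mirroring only the quoted `pd.Series(values.ravel(), copy=False)` call (the surrounding xarray code is not reproduced). pandas infers a timezone-aware datetime dtype for the Series, but turning the Series back into a plain NumPy array yields an object array of `Timestamp` values, because NumPy's `datetime64` has no way to carry a timezone.

```python
import datetime

import numpy as np
import pandas as pd

utc = datetime.timezone.utc
times = [
    datetime.datetime(2024, 1, 1, 1, 1, 1, tzinfo=utc),
    datetime.datetime(2024, 1, 1, 1, 1, 2, tzinfo=utc),
]

# Object array of tz-aware datetimes, roughly what xarray sees for the list input.
values = np.array(times, dtype=object)

# Same call as quoted above: pandas infers a tz-aware datetime dtype here.
as_series = pd.Series(values.ravel(), copy=False)
print(as_series.dtype)  # a DatetimeTZDtype, e.g. datetime64[ns, UTC] on pandas 2.0

# Converting back to NumPy keeps the timezone by falling back to object dtype,
# so each element is a pandas Timestamp rather than a datetime64 value.
back = np.asarray(as_series).reshape(values.shape)
print(back.dtype, type(back[0]))  # object dtype; elements are pandas Timestamps
```

Since `back` is still an object array, a later encoder looking for a datetime64 dtype finds none, which is consistent with the "unable to infer dtype" error.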

The same happens if you wrap your data in a numpy array. It only works when the tzinfo is stripped from the array (either by not adding tzinfo in the first place or by casting to a proper type):

import numpy as np
times = np.array(times).astype("<M8[ns]")
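The same tzinfo-stripping workaround can be sketched without relying on NumPy's implicit handling of timezone-aware objects: convert each datetime to UTC explicitly, drop the tzinfo, then build a tz-naive `datetime64[ns]` array. The helper name `to_naive_utc` is hypothetical, and the sketch assumes that storing the times as naive UTC values is acceptable for the output file.

```python
import datetime

import numpy as np


def to_naive_utc(times):
    """Hypothetical helper: tz-aware datetimes -> tz-naive datetime64[ns] in UTC."""
    naive = [
        # Shift to UTC first so the wall-clock value is correct, then drop tzinfo.
        t.astimezone(datetime.timezone.utc).replace(tzinfo=None)
        for t in times
    ]
    # Naive datetime objects cast cleanly to a fixed-unit datetime64 array.
    return np.array(naive, dtype="datetime64[ns]")


utc = datetime.timezone.utc
times = [
    datetime.datetime(2024, 1, 1, 1, 1, 1, tzinfo=utc),
    datetime.datetime(2024, 1, 1, 1, 1, 2, tzinfo=utc),
]
times = to_naive_utc(times)
print(times.dtype)  # datetime64[ns]
```

The resulting array could then be passed as `data=` for the 'time' variable in the example above. The explicit `astimezone` step matters for non-UTC inputs: a time of 02:01:01+01:00 becomes 01:01:01 rather than keeping its local wall-clock value. Note that the original timezone itself is not recorded anywhere in the array.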

I'm not familiar with this particular part of DataArray/Variable creation for timezone-aware datetimes, or with how to solve the issue properly. Hopefully others have more insight here, @spencerkclark?
