
[Bug]: reading NaT/NaN on M1 ARM chip #6191

@philippemiron


What happened?

I have NaN values in a date vector stored in a netCDF file. When I read the file on my ARM Apple computer with xr.open_dataset(), the missing value is not properly recognized.

For example, the following data is stored in a NetCDF:

date = pd.date_range(...)
date[4] = nan

Then, when I read the file back, date[4] is set to date[0] (the first date of the range) instead of 'NaT'.

I understand that this issue is quite odd, and it does not seem to happen on other platforms. I tried on macOS (with an Intel processor) and on two different Linux computers, and in those configurations date[4] is properly set to 'NaT' after opening the netCDF file with xr.open_dataset(). I tried both the same version of xarray and different versions, and I cannot reproduce this issue on any machine except the one with the M1 ARM chip.
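
One thing that may be worth checking is what was actually written to disk for the missing entry. The sketch below uses only NumPy and is an illustration, not a confirmed diagnosis: it shows that NaT is represented internally as the minimum int64, and that if the missing slot were stored as an offset of 0 from a reference date equal to the first timestamp (an assumption about the encoding, with hypothetical `stored_offsets` values), decoding would give back date[0], which matches the symptom described above.

```python
import numpy as np

# NaT in a datetime64 array is stored as the smallest int64 value.
nat_as_int = np.array(["NaT"], dtype="datetime64[ns]").view("int64")[0]
print(nat_as_int == np.iinfo(np.int64).min)

# Hypothetical on-disk offsets in days since the first date.
# Slot 4 is assumed to have collapsed to 0 instead of a fill value.
reference = np.datetime64("2022-01-01", "ns")
stored_offsets = np.array([0, 1, 2, 3, 0, 5], dtype="int64")

# Decoding offset 0 yields the reference date, i.e. the same value as slot 0.
decoded = reference + stored_offsets.astype("timedelta64[D]")
print(decoded[4] == decoded[0])
```

Opening the file with xr.open_dataset('test.nc', decode_times=False) should show the raw stored values and the units attribute, which would confirm or rule out this guess.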

What did you expect to happen?

I expect the following result after running the minimal example:

array(['2022-01-01T00:00:00.000000000', '2022-01-02T00:00:00.000000000',
       '2022-01-03T00:00:00.000000000', '2022-01-04T00:00:00.000000000',
                                 'NaT', '2022-01-06T00:00:00.000000000',
       '2022-01-07T00:00:00.000000000', '2022-01-08T00:00:00.000000000',
       '2022-01-09T00:00:00.000000000', '2022-01-10T00:00:00.000000000'],
      dtype='datetime64[ns]')

Minimal Complete Verifiable Example

import xarray as xr
import pandas as pd
import numpy as np

# Build ten daily timestamps, then mark the fifth one as missing.
time = pd.date_range(start="2022-01-01", end="2022-01-10").to_pydatetime()
time[4] = np.datetime64("NaT")

ds = xr.Dataset(
    data_vars=dict(
        time=(["nt"], time),
    ),
)
ds.to_netcdf("test.nc")

# Read the file back and inspect the decoded values.
ds_r = xr.open_dataset("test.nc")
print(ds_r.time.values)

Relevant log output

array(['2022-01-01T00:00:00.000000000', '2022-01-02T00:00:00.000000000',
       '2022-01-03T00:00:00.000000000', '2022-01-04T00:00:00.000000000',
       '2022-01-01T00:00:00.000000000', '2022-01-06T00:00:00.000000000',
       '2022-01-07T00:00:00.000000000', '2022-01-08T00:00:00.000000000',
       '2022-01-09T00:00:00.000000000', '2022-01-10T00:00:00.000000000'],
      dtype='datetime64[ns]')
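
The difference between the two outputs can also be checked programmatically instead of by eye. This is a minimal sketch using only NumPy; the `expected` and `observed` arrays are reconstructed by hand from the outputs shown in this report (they are not read from the file), and the variable names are illustrative.

```python
import numpy as np

# Ten daily timestamps, 2022-01-01 through 2022-01-10, at nanosecond precision.
days = np.arange("2022-01-01", "2022-01-11", dtype="datetime64[D]").astype("datetime64[ns]")

# Expected result: the fifth entry is NaT.
expected = days.copy()
expected[4] = np.datetime64("NaT")

# Result reported on the M1 machine: the fifth entry equals the first date.
observed = days.copy()
observed[4] = days[0]

# Compare where each array holds NaT and report the positions that disagree.
mismatch = np.flatnonzero(np.isnat(expected) != np.isnat(observed))
print(mismatch)            # [4]
print(observed[mismatch])
```

With real data, `observed` would be replaced by `ds_r.time.values` from the example above.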

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.1 | packaged by conda-forge | (main, Dec 22 2021, 01:38:36) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.2.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.2
pandas: 1.3.5
numpy: 1.21.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.12.0
distributed: 2021.12.0
matplotlib: 3.5.1
cartopy: 0.20.1
seaborn: None
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
setuptools: 60.0.4
pip: 21.3.1
conda: None
pytest: None
IPython: 8.0.0
sphinx: None
