
nanosecond precision lost when reading time data #7817

@kmuehlbauer

Description


What happened?

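When reading nanosecond-precision time data from netCDF, the precision is lost. This happens because CFMaskCoder converts the int64 variable to floating point (float64) so that it can insert NaN for the fill value. CFDatetimeCoder then casts the floating-point values back to int64 to convert them to datetime64. A float64 has only a 53-bit significand, so int64 values of large magnitude cannot be represented exactly. The cast is also undefined for some values (e.g. NaN), hence #7098.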
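A minimal NumPy-only sketch of the int64 → float64 → int64 round trip described above, using the same 2000 values as the MVCE below. It does not call xarray; it assumes only that, for this data, the masking and datetime-decoding steps amount to these two casts.

```python
import numpy as np

# Same values as the MVCE: 2000 consecutive int64 nanosecond offsets
# starting at the int64 minimum (which numpy also uses as NaT).
min_ns = np.iinfo(np.int64).min
raw = np.arange(min_ns, min_ns + 2000, dtype=np.int64)

# Step 1 (masking): promote to float64 so NaN can mark missing values.
# With a 53-bit significand, representable float64 values just below
# 2**63 in magnitude are 1024 apart, so neighbouring nanoseconds merge.
as_float = raw.astype(np.float64)

# Step 2 (datetime decoding): cast back to int64.
roundtrip = as_float.astype(np.int64)

n_unique = np.unique(roundtrip).size
max_error_ns = np.abs(roundtrip - raw).max()
print(n_unique, max_error_ns)
```

With round-to-nearest-even conversion this reports 3 distinct values and a maximum error of 512 ns. The last value, `min_ns + 2048`, is the `'1677-09-21T00:12:43.145226240'` seen in the normal-decoding output.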

What did you expect to happen?

Precision should be preserved. The transformation to floating point should be omitted.
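One way to skip the float conversion, sketched under the assumption that the units are exactly "nanoseconds since 1970-01-01" (so no scaling or epoch offset is needed): cast the integers straight to `datetime64[ns]` and mask the fill value in the datetime domain. `decode_ns_since_epoch` is a hypothetical helper for illustration, not part of xarray.

```python
import numpy as np


def decode_ns_since_epoch(raw, fill_value):
    """Hypothetical helper (not an xarray API): decode int64 values with
    units "nanoseconds since 1970-01-01" to datetime64[ns], masking fill
    values as NaT without a float64 round trip."""
    raw = np.asarray(raw, dtype=np.int64)
    # Units and epoch match numpy's datetime64[ns], so this cast is exact.
    # astype returns a copy, so the caller's array is not modified.
    times = raw.astype("M8[ns]")
    # Mask fill values directly as NaT instead of inserting float NaN.
    times[raw == fill_value] = np.datetime64("NaT")
    return times


# Usage with the MVCE's values, plus one explicit fill value.
min_ns = np.iinfo(np.int64).min
fill_value = np.datetime64("1900-01-01", "ns").astype(np.int64)
raw = np.arange(min_ns, min_ns + 2000, dtype=np.int64)
raw[5] = fill_value
decoded = decode_ns_since_epoch(raw, fill_value)
print(decoded[-1])
```

Every non-fill value keeps its exact nanosecond, matching the `mask_and_scale=False` output below.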

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import netCDF4 as nc
import matplotlib.pyplot as plt

# create time array and fillvalue
min_ns = -9223372036854775808
max_ns = 9223372036854775807
cnt = 2000
time_arr = np.arange(min_ns, min_ns + cnt, dtype=np.int64).astype("M8[ns]")
fill_value = np.datetime64("1900-01-01", "ns")

# create ncfile with time with attached _FillValue
with nc.Dataset("test.nc", mode="w") as ds:
    ds.createDimension("x", cnt)
    time = ds.createVariable("time", "<i8", ("x",), fill_value=fill_value)
    time[:] = time_arr
    time.units = "nanoseconds since 1970-01-01"

# normal decoding
with xr.open_dataset("test.nc").load() as xr_ds:
    print("--- normal decoding ----------------------")
    print(xr_ds["time"])
    plt.plot(xr_ds["time"].values.astype(np.int64) + max_ns, color="g", label="normal")

# no decoding
with xr.open_dataset("test.nc", decode_cf=False).load() as xr_ds:
    print("--- no decoding ----------------------")
    print(xr_ds["time"])
    plt.plot(xr_ds["time"].values + max_ns, lw=5, color="b", label="raw")
    
# Do not decode times: this shows that CFMaskCoder has already converted
# the array to floating point before CFDatetimeCoder would run.
with xr.open_dataset("test.nc", decode_times=False).load() as xr_ds:
    print("--- no time decoding ----------------------")
    print(xr_ds["time"])
    
# Do not run CFMaskCoder: this shows that CFDatetimeCoder alone decodes
# the times without losing precision.
with xr.open_dataset("test.nc", mask_and_scale=False).load() as xr_ds:
    print("--- no masking ------------------------------")
    print(xr_ds["time"])
    plt.plot(xr_ds["time"].values.astype(np.int64) + max_ns, lw=2, color="r", label="nomask")

plt.legend()

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

--- normal decoding ----------------------
<xarray.DataArray 'time' (x: 2000)>
array([                          'NaT',                           'NaT',
                                 'NaT', ...,
       '1677-09-21T00:12:43.145226240', '1677-09-21T00:12:43.145226240',
       '1677-09-21T00:12:43.145226240'], dtype='datetime64[ns]')
Dimensions without coordinates: x
--- no decoding ----------------------
<xarray.DataArray 'time' (x: 2000)>
array([-9223372036854775808, -9223372036854775807, -9223372036854775806,
       ..., -9223372036854773811, -9223372036854773810,
       -9223372036854773809])
Dimensions without coordinates: x
Attributes:
    _FillValue:  -2208988800000000000
    units:       nanoseconds since 1970-01-01
--- no time decoding ----------------------
<xarray.DataArray 'time' (x: 2000)>
array([-9.22337204e+18, -9.22337204e+18, -9.22337204e+18, ...,
       -9.22337204e+18, -9.22337204e+18, -9.22337204e+18])
Dimensions without coordinates: x
Attributes:
    units:    nanoseconds since 1970-01-01
--- no masking ------------------------------
<xarray.DataArray 'time' (x: 2000)>
array([                          'NaT', '1677-09-21T00:12:43.145224193',
       '1677-09-21T00:12:43.145224194', ...,
       '1677-09-21T00:12:43.145226189', '1677-09-21T00:12:43.145226190',
       '1677-09-21T00:12:43.145226191'], dtype='datetime64[ns]')
Dimensions without coordinates: x
Attributes:
    _FillValue:  -2208988800000000000
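The `_FillValue` in the raw output is the `fill_value` from the example encoded as int64 nanoseconds since 1970-01-01. A small NumPy check of that arithmetic, assuming the proleptic Gregorian calendar that numpy's `datetime64` uses:

```python
import numpy as np

# The fill_value used in the MVCE, as a datetime64[ns] scalar.
fill_value = np.datetime64("1900-01-01", "ns")

# Its integer encoding: nanoseconds since 1970-01-01.
fill_as_int = int(fill_value.astype(np.int64))

# Cross-check by hand: 70 years between 1900-01-01 and 1970-01-01,
# with 17 leap days (1904 through 1968; 1900 is not a leap year).
days = 70 * 365 + 17
assert fill_as_int == -days * 86_400 * 10**9

# The fill value is exactly representable in float64, unlike most of the
# data values near the int64 minimum.
print(fill_as_int, float(fill_as_int) == fill_as_int)
```

So the float conversion damages the data near the int64 limits, not the fill value itself.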

Anything else we need to know?

Plot produced by the code above (image: time-fillval).

Xref: #7098, #7790 (comment)

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.60-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: 11.6.0
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: None
IPython: 8.11.0
sphinx: None
