Dataset.interp drops boolean variables #4761

Closed
Illviljan opened this issue Jan 4, 2021 · 0 comments · Fixed by #5008

What happened:
Dataset.interp silently drops boolean variables.

What you expected to happen:
If I'm interpolating a group of variables, I expect to get all of them back in the correct shape with relevant values in them.

If the variables are boolean or object arrays, I don't expect linear interpolation, because it doesn't make sense for those dtypes. Stepwise interpolation, such as nearest or zero-order interpolation, should still be fine to expect.
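
To make "stepwise" concrete, here is a minimal sketch that uses `reindex` on a single boolean DataArray instead of `interp`. The variable names and sample values are made up for illustration, and `method="ffill"` is used here as a stand-in for zero-order hold; this shows the behaviour I would expect, not how `interp` is implemented.

```python
import numpy as np
import xarray as xr

# Hypothetical boolean signal sampled at two time steps.
flag = xr.DataArray(
    np.array([False, True]),
    dims=["time"],
    coords={"time": [0.0, 1.0]},
    name="flag",
)

# New time axis that stays inside the original range and avoids exact ties.
new_time = [0.0, 0.25, 0.75, 1.0]

# Nearest-neighbour lookup: each new point takes the value of the closest sample.
nearest = flag.reindex(time=new_time, method="nearest")

# Forward fill: each new point takes the last sample at or before it.
zero_order = flag.reindex(time=new_time, method="ffill")

print(nearest.values)
print(zero_order.values)
```

Both results should keep the boolean dtype, since no new point falls outside the original samples and nothing has to be filled with a missing value.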

Minimal Complete Verifiable Example:

import dask.array as da  # assumed: the snippet uses `da` without importing it
import numpy as np
import xarray as xr

a = np.arange(0, 5)
b = np.char.add("long_variable_name", a.astype(str))
coords = dict(time=da.array([0, 1]))
data_vars = dict()
for v in b:
    data_vars[v] = xr.DataArray(
        name=v,
        data=np.array([0, 1]).astype(bool),
        dims=["time"],
        coords=coords,
    )
ds1 = xr.Dataset(data_vars)

# Print raw data:
print(ds1)
Out[3]: 
<xarray.Dataset>
Dimensions:              (time: 2)
Coordinates:
  * time                 (time) int32 0 1
Data variables:
    long_variable_name0  (time) bool False True
    long_variable_name1  (time) bool False True
    long_variable_name2  (time) bool False True
    long_variable_name3  (time) bool False True
    long_variable_name4  (time) bool False True

# Interpolate:
ds1 = ds1.interp(
    time=da.array([0, 0.5, 1, 2]),
    assume_sorted=True,
    method="nearest",
    kwargs=dict(fill_value="extrapolate"),
)

# Print interpolated data:
print(ds1)
<xarray.Dataset>
Dimensions:  (time: 4)
Coordinates:
  * time     (time) float64 0.0 0.5 1.0 2.0
Data variables:
    *empty*

Anything else we need to know?:
ds.interp_like uses ds.reindex in these cases, which seems like a good choice for ds.interp as well. However, I think both ds.interp and ds.interp_like should fill with the nearest value by default instead of np.nan, because interpolation is still being requested.
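
As a rough illustration of the difference between the two fill strategies, the sketch below reindexes a small boolean Dataset once with the default fill and once with `method="nearest"`. The Dataset contents and variable names are hypothetical, and this only demonstrates `Dataset.reindex`; it is not a claim about what `interp` or `interp_like` do internally.

```python
import numpy as np
import xarray as xr

# Hypothetical Dataset holding only boolean variables.
ds = xr.Dataset(
    {
        "a": ("time", np.array([False, True])),
        "b": ("time", np.array([True, False])),
    },
    coords={"time": [0.0, 1.0]},
)

# 0.25 and 2.0 are not present in the original time axis.
new_time = [0.0, 0.25, 1.0, 2.0]

# Default reindex: missing labels are filled with NaN, so the
# variables can no longer be stored as plain booleans.
filled_nan = ds.reindex(time=new_time)

# Nearest fill: every new label maps to an existing sample,
# so all variables are kept and stay boolean.
filled_nearest = ds.reindex(time=new_time, method="nearest")

print(filled_nan)
print(filled_nearest)
```

With the nearest fill, both variables come back on the new time axis as booleans, including the point at 2.0 that lies outside the original range.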

Environment:

Output of xr.show_versions()

xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.16.2
pandas: 1.1.5
numpy: 1.17.5
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 51.0.0.post20201207
pip: 20.3.3
conda: 4.9.2
pytest: 6.2.1
IPython: 7.19.0
sphinx: 3.4.0
