.sel return errors when using floats for no apparent reason #7108

AlxLhrNc · 2022-09-30T01:45:44Z

What happened?

Calling .sel() with float slice bounds works on one dataset but raises an error on another dataset from the same provider, even though the dimensions concerned are float32 in both (see log).

Attempts with default float, numpy.float32() and numpy.float64() gave the same output.

What did you expect to happen?

Normal behavior of .sel().

Minimal Complete Verifiable Example

import xarray as xr

# Dataset 1: slicing on 'longitude' works.
nc_ok = xr.open_dataset('H08_20220929_0000_1H_ROC010_FLDK.02401_02401.nc').load()
sub = nc_ok.sel(longitude=slice(161.001, 162.001))

# Dataset 2: the same slice on 'lon' raises KeyError.
nc_bug = xr.open_dataset('20220925000000-JAXA-L3C_GHRSST-SSTskin-H08_AHI-v2.0_daily-v02.0-fv01.0.nc').load()
sub = nc_bug.sel(lon=slice(161.001, 162.001))

MVCE confirmation

Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

nc_ok = xr.open_dataset('H08_20220929_0000_1H_ROC010_FLDK.02401_02401.nc').load()

nc_ok.longitude
Out[12]: 
<xarray.DataArray 'longitude' (longitude: 2401)>
array([ 80.     ,  80.05   ,  80.1    , ..., 199.9    , 199.95001, 200.     ],
      dtype=float32)
Coordinates:
  * longitude  (longitude) float32 80.0 80.05 80.1 80.15 ... 199.9 200.0 200.0
Attributes:
    long_name:  longitude
    units:      degrees_east

nc_bug = xr.open_dataset('20220925000000-JAXA-L3C_GHRSST-SSTskin-H08_AHI-v2.0_daily-v02.0-fv01.0.nc').load()

nc_bug.lon
Out[14]: 
<xarray.DataArray 'lon' (lon: 6001)>
array([  80.     ,   80.02   ,   80.04   , ..., -160.04001, -160.02   ,
       -160.     ], dtype=float32)
Coordinates:
  * lon      (lon) float32 80.0 80.02 80.04 80.06 ... -160.0 -160.0 -160.0
Attributes:
    long_name:      longitude
    standard_name:  longitude
    axis:           X
    units:          degrees_east
    valid_min:      -180.0
    valid_max:      180.0
    grid_mapping:   Equirectangular
    comment:        geographical coordinates, WGS84 projection

sub = nc_ok.sel(longitude = slice(161.001, 162.001))

sub = nc_bug.sel(lon = slice(161.001, 162.001))
Traceback (most recent call last):

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:3800 in get_loc
    return self._engine.get_loc(casted_key)

  File pandas\_libs\index.pyx:138 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\index.pyx:165 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\hashtable_class_helper.pxi:1577 in pandas._libs.hashtable.Float64HashTable.get_item

  File pandas\_libs\hashtable_class_helper.pxi:1587 in pandas._libs.hashtable.Float64HashTable.get_item

KeyError: 161.001


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  Cell In [16], line 1
    sub = nc_bug.sel(lon = slice(161.001, 162.001))

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\dataset.py:2533 in sel
    query_results = map_index_queries(

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexing.py:183 in map_index_queries
    results.append(index.sel(labels, **options))  # type: ignore[call-arg]

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexes.py:377 in sel
    indexer = _query_slice(self.index, label, coord_name, method, tolerance)

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexes.py:150 in _query_slice
    indexer = index.slice_indexer(

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:6597 in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step)

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:6805 in slice_locs
    start_slice = self.get_slice_bound(start, "left")

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:6724 in get_slice_bound
    raise err

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:6718 in get_slice_bound
    slc = self.get_loc(label)

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:3802 in get_loc
    raise KeyError(key) from err

KeyError: 161.001


sub = nc_bug.sel(lon = slice(np.float64(161.001), 162.001))
Traceback (most recent call last):

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:3800 in get_loc
    return self._engine.get_loc(casted_key)

  File pandas\_libs\index.pyx:138 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\index.pyx:165 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\hashtable_class_helper.pxi:1577 in pandas._libs.hashtable.Float64HashTable.get_item

  File pandas\_libs\hashtable_class_helper.pxi:1587 in pandas._libs.hashtable.Float64HashTable.get_item

KeyError: 161.001

Anything else we need to know?

The data are provided by JAXA P-Tree.

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:50:36) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('English_New Zealand', '1252')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.9.1
distributed: None
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.3.0
pip: 22.2.2
conda: None
pytest: None
IPython: 8.5.0
sphinx: 5.2.1


max-sixty · 2022-09-30T02:47:13Z

Generally this is because the floats aren't exactly the same value — does passing tolerance=0.1 help?

AlxLhrNc · 2022-09-30T02:58:34Z

Returned the following:
NotImplementedError: cannot use ``method`` argument if any indexers are slice objects

Traceback (most recent call last):

  Cell In [3], line 1
    sub = nc_bug.sel(lon = slice(161.001, 162.001), tolerance=0.1)

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\dataset.py:2533 in sel
    query_results = map_index_queries(

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexing.py:183 in map_index_queries
    results.append(index.sel(labels, **options))  # type: ignore[call-arg]

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexes.py:377 in sel
    indexer = _query_slice(self.index, label, coord_name, method, tolerance)

max-sixty · 2022-09-30T03:49:51Z

Ah, right. Does it select a value with just nc_bug.sel(lon = 161.001, tolerance=0.1)? The lon value it selects is probably the one you need to use in the slice.

Float indexes are often a source of pain, unfortunately!
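A minimal sketch of that suggestion, on synthetic data rather than the JAXA file (the `lon` axis and values below are assumptions made for illustration). `method="nearest"` is spelled out because, as reported later in the thread, `tolerance` on its own was not enough. Note that this lookup relies on a monotonic index, so it illustrates the idea but would not rescue a coordinate that wraps around.

```python
import numpy as np
import xarray as xr

# Synthetic, monotonic longitude axis standing in for a real file (assumption).
lon = np.linspace(160.0, 162.0, 101)  # 0.02-degree spacing
da = xr.DataArray(np.arange(lon.size), coords={"lon": lon}, dims="lon")

# Scalar label with method="nearest": tolerance caps how far away the match may be.
nearest = da.sel(lon=161.001, method="nearest", tolerance=0.1)
nearest_lon = nearest.lon.item()

# Use a label that really exists in the index as the lower slice bound.
sub = da.sel(lon=slice(nearest_lon, nearest_lon + 1.0))
```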

benbovy · 2022-09-30T13:03:26Z

It looks like the error is because of the non-monotonic coordinate labels for the "lon" coordinate in nc_bug rather than a float precision issue. The "lon" coordinate seems monotonic for nc_ok so it works.

When a slice is given as indexer, Xarray internally calls pandas.Index.slice_indexer(), which requires the index to be ordered and unique (docs). Unfortunately, pandas does not mention this requirement when it raises the KeyError. Should we first check the index in Xarray and raise a nicer error message if it is not unique / ordered?
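The behaviour described above can be reproduced with pandas alone. This is a small sketch on made-up longitudes (a 0.5-degree grid that wraps at the antimeridian, loosely mimicking nc_bug.lon), not the actual file contents.

```python
import numpy as np
import pandas as pd

# Monotonic axis, like nc_ok.longitude: 80 ... 200.
ordered = pd.Index(np.arange(80.0, 200.5, 0.5))

# Axis that wraps at the antimeridian, like nc_bug.lon: 80 ... 179.5, -180 ... -160.
wrapped = pd.Index(
    np.concatenate([np.arange(80.0, 180.0, 0.5), np.arange(-180.0, -159.5, 0.5)])
)

ordered_is_sorted = ordered.is_monotonic_increasing
wrapped_is_sorted = wrapped.is_monotonic_increasing

# Sorted index: bounds that are not labels are located by binary search.
ok = ordered.slice_indexer(161.001, 162.001)
selected = ordered[ok].tolist()

# Unsorted index: a bound that is not an existing label surfaces as a bare KeyError.
try:
    wrapped.slice_indexer(161.001, 162.001)
    error = None
except KeyError as err:
    error = err
```

The KeyError carries only the missing label (161.001), which is why the traceback in the report looks like a float precision problem.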

rhkleijn · 2022-09-30T15:01:42Z

The pandas docs seem stricter than the implementation. Judging from the pandas source code, monotonicity is only required after get_loc fails.

My concern with checking first is that code like the example below will stop working (if I understand correctly). It has unique but (alphabetically) unsorted coords, although their order may have meaning for the user. I regularly select a slice by specifying the labels corresponding to the first and last elements I want to extract.

In this case I would suggest just trying the slice, catching any KeyError, and re-raising with a nicer message, instead of always checking first.

import xarray as xr
da = xr.DataArray([0, 1, 2, 3], coords={'x': ['zero', 'one', 'two', 'three']})
da.sel(x=slice('zero', 'two'))

Out[1]: 
<xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
  * x        (x) <U5 'zero' 'one' 'two'
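The "try first, explain afterwards" idea could look roughly like the sketch below. `slice_indexer_with_hint` is a hypothetical helper written for illustration, not xarray's actual implementation, and the error wording is an assumption.

```python
import pandas as pd


def slice_indexer_with_hint(index: pd.Index, start, stop, coord_name: str) -> slice:
    """Hypothetical helper: attempt the slice, add context only if it fails."""
    try:
        return index.slice_indexer(start, stop)
    except KeyError as err:
        if index.is_monotonic_increasing or index.is_monotonic_decreasing:
            raise  # sorted index: the original error is already the right one
        raise KeyError(
            f"cannot slice coordinate {coord_name!r} with bounds {start!r} and "
            f"{stop!r}: the index is not sorted, so slice bounds must be "
            "existing, unique labels"
        ) from err


labels = pd.Index(["zero", "one", "two", "three"])

# Unsorted but unique labels that exist: still works, as in the example above.
found = labels[slice_indexer_with_hint(labels, "zero", "two", "x")].tolist()

# A bound that is not a label: now fails with an explanatory message.
try:
    slice_indexer_with_hint(labels, "zero", "four", "x")
    message = ""
except KeyError as err:
    message = str(err)
```

Because the check only runs on failure, slicing unsorted indexes by existing labels keeps its current cost and behaviour.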

max-sixty · 2022-09-30T16:59:06Z

> Float indexes are often a source of pain, unfortunately!

...also for my ability to know what's going on, apparently :). Thanks a lot @benbovy .

Yes, it would be great to raise a more informative error. We could also open an issue upstream if pandas itself has the same problem.

benbovy · 2022-10-03T06:57:17Z

TBH, I had to do some research before figuring out what was going on :).

AlxLhrNc · 2022-10-03T20:49:24Z

The values in nc_bug.lon are technically ordered 'as they would be on a mercator projected map with origin at 0 N-0 E', since I am dealing with data around 180 lon. Not that it would matter for pandas/xarray in that case. I suppose re-projecting it on a 0-360 would be the only way around this specific issue.

And to answer earlier comments, sub = nc_bug.sel(lon = 161.001, tolerance=.1) raised the following: KeyError: "not all values found in index 'lon'. Try setting the method keyword argument (example: method='nearest')."
Trying method='nearest' as suggested then raised ValueError: index must be monotonic increasing or decreasing. It is indeed a problem with the order of the index.

Thanks for your help and your time.

benbovy · 2022-10-03T21:28:43Z

> I suppose re-projecting it on a 0-360 would be the only way around this specific issue.

A custom Xarray index would help, e.g., PeriodicBoundaryIndex (#7031) or a GeographicIndex leveraging libraries like S2Geometry or H3.
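Short of a custom index, the 0-360 re-projection mentioned in the thread can be sketched as follows. The dataset is a synthetic stand-in for nc_bug (the variable name `sst`, the 0.5-degree grid, and the values are assumptions); the thread does not say which fix the reporter finally used.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for nc_bug: longitudes wrap from 179.5 to -180 (assumption).
lon = np.concatenate([np.arange(80.0, 180.0, 0.5), np.arange(-180.0, -159.5, 0.5)])
ds = xr.Dataset(
    {"sst": ("lon", np.arange(lon.size, dtype=float))},
    coords={"lon": lon},
)

# Map [-180, 180) onto [0, 360), then sort so the index is monotonic again.
ds_360 = ds.assign_coords(lon=ds.lon % 360).sortby("lon")

# Slice bounds no longer need to be existing labels.
sub = ds_360.sel(lon=slice(161.001, 162.001))
selected_lon = sub.lon.values.tolist()
```

The `sortby` call is a no-op for data that is already ordered after the modulo, but it keeps the sketch valid for files whose longitudes start at a negative value.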

AlxLhrNc · 2022-10-05T02:21:15Z

Thanks, it finally worked.

AlxLhrNc added the "bug" and "needs triage" labels Sep 30, 2022

max-sixty removed the "bug" and "needs triage" labels Sep 30, 2022

AlxLhrNc closed this as completed Oct 5, 2022
