.sel return errors when using floats for no apparent reason #7108

Closed
1 of 4 tasks
AlxLhrNc opened this issue Sep 30, 2022 · 10 comments

@AlxLhrNc

What happened?

Calling .sel() with float slice bounds on two datasets from the same provider triggers an error on one of them, even though the dimensions concerned are all of type float32 (see log).

Attempts with default float, numpy.float32() and numpy.float64() gave the same output.

What did you expect to happen?

Normal behavior of .sel().

Minimal Complete Verifiable Example

import xarray as xr
nc_ok = xr.open_dataset('H08_20220929_0000_1H_ROC010_FLDK.02401_02401.nc').load()
sub = nc_ok.sel(longitude = slice(161.001, 162.001))

nc_bug = xr.open_dataset('20220925000000-JAXA-L3C_GHRSST-SSTskin-H08_AHI-v2.0_daily-v02.0-fv01.0.nc').load()
sub = nc_bug.sel(lon = slice(161.001, 162.001))

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

nc_ok = xr.open_dataset('H08_20220929_0000_1H_ROC010_FLDK.02401_02401.nc').load()

nc_ok.longitude
Out[12]: 
<xarray.DataArray 'longitude' (longitude: 2401)>
array([ 80.     ,  80.05   ,  80.1    , ..., 199.9    , 199.95001, 200.     ],
      dtype=float32)
Coordinates:
  * longitude  (longitude) float32 80.0 80.05 80.1 80.15 ... 199.9 200.0 200.0
Attributes:
    long_name:  longitude
    units:      degrees_east

nc_bug = xr.open_dataset('20220925000000-JAXA-L3C_GHRSST-SSTskin-H08_AHI-v2.0_daily-v02.0-fv01.0.nc').load()

nc_bug.lon
Out[14]: 
<xarray.DataArray 'lon' (lon: 6001)>
array([  80.     ,   80.02   ,   80.04   , ..., -160.04001, -160.02   ,
       -160.     ], dtype=float32)
Coordinates:
  * lon      (lon) float32 80.0 80.02 80.04 80.06 ... -160.0 -160.0 -160.0
Attributes:
    long_name:      longitude
    standard_name:  longitude
    axis:           X
    units:          degrees_east
    valid_min:      -180.0
    valid_max:      180.0
    grid_mapping:   Equirectangular
    comment:        geographical coordinates, WGS84 projection

sub = nc_ok.sel(longitude = slice(161.001, 162.001))

sub = nc_bug.sel(lon = slice(161.001, 162.001))
Traceback (most recent call last):

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:3800 in get_loc
    return self._engine.get_loc(casted_key)

  File pandas\_libs\index.pyx:138 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\index.pyx:165 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\hashtable_class_helper.pxi:1577 in pandas._libs.hashtable.Float64HashTable.get_item

  File pandas\_libs\hashtable_class_helper.pxi:1587 in pandas._libs.hashtable.Float64HashTable.get_item

KeyError: 161.001


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  Cell In [16], line 1
    sub = nc_bug.sel(lon = slice(161.001, 162.001))

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\dataset.py:2533 in sel
    query_results = map_index_queries(

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexing.py:183 in map_index_queries
    results.append(index.sel(labels, **options))  # type: ignore[call-arg]

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexes.py:377 in sel
    indexer = _query_slice(self.index, label, coord_name, method, tolerance)

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexes.py:150 in _query_slice
    indexer = index.slice_indexer(

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:6597 in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step)

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:6805 in slice_locs
    start_slice = self.get_slice_bound(start, "left")

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:6724 in get_slice_bound
    raise err

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:6718 in get_slice_bound
    slc = self.get_loc(label)

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:3802 in get_loc
    raise KeyError(key) from err

KeyError: 161.001


sub = nc_bug.sel(lon = slice(np.float64(161.001), 162.001))
Traceback (most recent call last):

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\pandas\core\indexes\base.py:3800 in get_loc
    return self._engine.get_loc(casted_key)

  File pandas\_libs\index.pyx:138 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\index.pyx:165 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\hashtable_class_helper.pxi:1577 in pandas._libs.hashtable.Float64HashTable.get_item

  File pandas\_libs\hashtable_class_helper.pxi:1587 in pandas._libs.hashtable.Float64HashTable.get_item

KeyError: 161.001

Anything else we need to know?

The data are provided by JAXA P-Tree.

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:50:36) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('English_New Zealand', '1252')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.9.1
distributed: None
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.3.0
pip: 22.2.2
conda: None
pytest: None
IPython: 8.5.0
sphinx: 5.2.1

@AlxLhrNc added the "bug" and "needs triage" (issue that has not been reviewed by an xarray team member) labels on Sep 30, 2022
@max-sixty
Collaborator

Generally this is because the floats aren't exactly the same value — does passing tolerance=0.1 help?

@max-sixty removed the "bug" and "needs triage" (issue that has not been reviewed by an xarray team member) labels on Sep 30, 2022
@AlxLhrNc
Author

Returned the following:
NotImplementedError: cannot use ``method`` argument if any indexers are slice objects

Traceback (most recent call last):

  Cell In [3], line 1
    sub = nc_bug.sel(lon = slice(161.001, 162.001), tolerance=0.1)

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\dataset.py:2533 in sel
    query_results = map_index_queries(

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexing.py:183 in map_index_queries
    results.append(index.sel(labels, **options))  # type: ignore[call-arg]

  File ~\Installed_Programs\Anaconda3\envs\phd\lib\site-packages\xarray\core\indexes.py:377 in sel
    indexer = _query_slice(self.index, label, coord_name, method, tolerance)

@max-sixty
Collaborator

Ah, right. Does it select a value with just nc_bug.sel(lon = 161.001, tolerance=0.1)? Because the lon value which that selects is probably the value you need to use in the slice.

Float indexes are often a source of pain, unfortunately!
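As a minimal sketch of the nearest-label lookup suggested above: the coordinate below is a made-up, monotonic stand-in (not the JAXA data), and `da`, `nearest`, and `label` are illustrative names. It assumes that `tolerance` is combined with `method="nearest"`, which is the combination xarray documents for inexact matches.

```python
import numpy as np
import xarray as xr

# Hypothetical monotonic float32 coordinate: 160.0, 160.5, ..., 162.5
lon = np.arange(160.0, 163.0, 0.5, dtype="float32")
da = xr.DataArray(np.arange(lon.size), coords={"lon": lon}, dims="lon")

# 161.001 is not an exact label, so ask for the nearest one within 0.1 degrees
nearest = da.sel(lon=161.001, method="nearest", tolerance=0.1)

# The label that was actually matched; this is the value to reuse as a slice bound
label = nearest["lon"].item()
print(label, nearest.item())
```

Nearest-neighbour lookup itself needs a monotonic index, so this only helps when the coordinate is ordered.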

@benbovy
Member

benbovy commented Sep 30, 2022

It looks like the error is because of the non-monotonic coordinate labels for the "lon" coordinate in nc_bug rather than a float precision issue. The "lon" coordinate seems monotonic for nc_ok so it works.

When a slice is given as an indexer, Xarray internally calls pandas.Index.slice_indexer(), which requires the index to be ordered and unique (docs). Unfortunately, the KeyError that pandas raises does not mention this requirement. Should we first check the index in Xarray and raise a nicer error message if it is not unique / ordered?
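The behaviour described above can be reproduced with pandas alone. This is a sketch on small made-up indexes (`wrapped` and `ordered` are stand-ins for the two longitude coordinates, not the real data): slicing with a bound that is not an exact label works on the ordered index and raises a bare KeyError on the one that wraps at 180.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: one index wraps from 180 to -170, the other keeps increasing
wrapped = pd.Index(np.array([160.0, 170.0, 180.0, -170.0, -160.0], dtype="float32"))
ordered = pd.Index(np.array([160.0, 170.0, 180.0, 190.0, 200.0], dtype="float32"))

print(wrapped.is_monotonic_increasing, ordered.is_monotonic_increasing)

# Ordered index: bounds that are not labels are located by searching the sorted values
ordered_slice = ordered.slice_indexer(161.001, 185.0)
print(ordered[ordered_slice])

# Wrapped index: the exact lookup fails and there is no sorted order to fall back on
try:
    wrapped.slice_indexer(161.001, 175.0)
    raised = None
except KeyError as err:
    raised = err
print(repr(raised))
```

Checking `is_monotonic_increasing` (or `is_monotonic_decreasing`) on `ds.indexes["lon"]` is a quick way to tell whether a coordinate is affected.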

@rhkleijn
Contributor

The pandas docs seem stricter than the implementation. According to this snippet from the pandas source code, monotonicity is only required after get_loc fails.

My concern with checking first is that code like the example below will stop working (if I understand correctly). It has unique but (alphabetically) unsorted coords (although their order may have meaning for the user). I regularly select a slice by specifying the labels corresponding to the first and last elements I want to extract.

I would suggest in this case to just try while catching any KeyError and raising with a nicer message instead of always checking first.

import xarray as xr
da = xr.DataArray([0, 1, 2, 3], coords={'x': ['zero', 'one', 'two', 'three']})
da.sel(x=slice('zero', 'two'))
Out[1]: 
<xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
  * x        (x) <U5 'zero' 'one' 'two'
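One way the "try first, explain on failure" idea could look from user code is sketched below. `sel_slice` is a hypothetical helper, not an xarray function, and the message wording is invented; it only re-raises with more context when the index turns out to be non-monotonic.

```python
import numpy as np
import xarray as xr


def sel_slice(obj, dim, start, stop):
    """Hypothetical helper: label-slice along `dim`, explaining KeyErrors."""
    try:
        return obj.sel({dim: slice(start, stop)})
    except KeyError as err:
        index = obj.indexes[dim]
        if not (index.is_monotonic_increasing or index.is_monotonic_decreasing):
            raise KeyError(
                f"cannot slice {dim!r} from {start!r} to {stop!r}: the index is not "
                "monotonic, so both bounds must be existing labels"
            ) from err
        raise  # some other KeyError: leave it untouched


# Unsorted but unique labels still work, because both bounds exist in the index
da = xr.DataArray([0, 1, 2, 3], coords={"x": ["zero", "one", "two", "three"]}, dims="x")
ok = sel_slice(da, "x", "zero", "two")

# A wrapped longitude with a bound that is not a label gets the longer message
wrapped = xr.DataArray(
    np.arange(5), coords={"lon": [160.0, 170.0, 180.0, -170.0, -160.0]}, dims="lon"
)
try:
    sel_slice(wrapped, "lon", 161.001, 175.0)
    message = None
except KeyError as err:
    message = str(err)
print(message)
```

Because nothing is checked up front, selections that already work pay no extra cost.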

@max-sixty
Collaborator

Float indexes are often a source of pain, unfortunately!

...also for my ability to know what's going on, apparently :). Thanks a lot @benbovy .


Yes, that would be great to raise a more informative error. We could also put an issue in upstream if pandas itself has the same issue.

@benbovy
Member

benbovy commented Oct 3, 2022

TBH, I had to do some research before figuring out what was going on :).

@AlxLhrNc
Author

AlxLhrNc commented Oct 3, 2022

The values in nc.lon are technically ordered 'as they would be on a Mercator-projected map with origin at 0 N-0 E', considering I am dealing with data around 180 lon. Not that it would matter for pandas/xarray in that case. I suppose re-projecting it on a 0-360 would be the only way around this specific issue.

And to answer earlier comments, sub = nc_bug.sel(lon = 161.001, tolerance=.1) raised the following: KeyError: "not all values found in index 'lon'. Try setting the method keyword argument (example: method='nearest')."
Trying that suggestion then raised ValueError: index must be monotonic increasing or decreasing. It is indeed a problem with the order of the index.
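A sketch of the 0-360 workaround mentioned above, on a small made-up dataset (`ds` and the `sst` variable are placeholders for the real file): taking the longitudes modulo 360 makes the coordinate monotonic again, after which a slice with arbitrary float bounds works.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for nc_bug: longitudes wrap from 180 to negative values
lon = np.array([160.0, 170.0, 180.0, -170.0, -160.0], dtype="float32")
ds = xr.Dataset({"sst": ("lon", np.arange(lon.size))}, coords={"lon": lon})

# Map the -180..180 convention onto 0..360, so -170 becomes 190 and -160 becomes 200
ds_360 = ds.assign_coords(lon=ds["lon"] % 360)

# Already ordered here; sorting guards against files laid out differently
ds_360 = ds_360.sortby("lon")

sub = ds_360.sel(lon=slice(161.001, 195.0))
print(sub["lon"].values)
```

The slice bounds have to be expressed in the same 0-360 convention as the converted coordinate.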

Thanks for your help and your time.

@benbovy
Member

benbovy commented Oct 3, 2022

I suppose re-projecting it on a 0-360 would be the only way around this specific issue.

A custom Xarray index would help, e.g., PeriodicBoundaryIndex (#7031) or a GeographicIndex leveraging libraries like S2Geometry or H3.

@AlxLhrNc
Author

AlxLhrNc commented Oct 5, 2022

Thanks, it finally worked.

@AlxLhrNc AlxLhrNc closed this as completed Oct 5, 2022