execute_local_udf doesn't work with latest NetCDF files #314

Closed
soxofaan opened this issue Jun 16, 2022 · 3 comments

@soxofaan
Member

I downloaded a NetCDF file and tried to use execute_local_udf on it:

execute_local_udf(udf_code, "copernicus-raw.nc")
...
ValueError: could not convert string to float: ''

Reason: NetCDF files from the VITO backend now contain a dimensionless "crs" variable:

<xarray.Dataset>
Dimensions:  (t: 1, x: 215, y: 144)
Coordinates:
  * t        (t) datetime64[ns] 2013-01-10
  * x        (x) float64 4.001 4.004 4.007 4.01 4.013 ... 4.588 4.59 4.593 4.596
  * y        (y) float64 51.4 51.4 51.39 51.39 51.39 ... 51.01 51.01 51.0 51.0
Data variables:
    crs      |S1 b''
    DEM      (t, y, x) float32 nan 1.219 1.458 1.349 ... 6.339 11.01 14.49 14.49
Attributes:
    Conventions:  CF-1.8
    institution:  openEO platform

That (string) variable is blindly pulled in as a "band", and its conversion to float fails.
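One possible way to avoid this is to keep only data variables that have at least one dimension and a numeric dtype before stacking them into bands. The sketch below is an illustration only, not necessarily how the client handles it: `select_band_variables` is a hypothetical helper, and the dataset is a small in-memory stand-in for the file shown above.

```python
import numpy as np
import xarray as xr


def select_band_variables(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: keep only variables that look like bands.

    A variable qualifies when it has at least one dimension and a numeric
    dtype, which excludes a scalar byte-string variable such as "crs".
    """
    band_names = [
        name
        for name, var in ds.data_vars.items()
        if var.ndim > 0 and np.issubdtype(var.dtype, np.number)
    ]
    return ds[band_names]


# In-memory stand-in for the downloaded file (values are made up).
ds = xr.Dataset(
    data_vars={
        "crs": xr.DataArray(np.array(b"", dtype="S1")),
        "DEM": (("t", "y", "x"), np.ones((1, 2, 3), dtype="float32")),
    },
    coords={
        "t": np.array(["2013-01-10"], dtype="datetime64[ns]"),
        "y": [51.4, 51.39],
        "x": [4.001, 4.004, 4.007],
    },
)

bands = select_band_variables(ds)
# Stack the remaining variables along a new "bands" dimension.
cube = bands.to_array(dim="bands")
```

With the "crs" variable filtered out, only "DEM" remains and the stacked array keeps its float dtype.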

@soxofaan
Member Author

Tests fail on py3.6 but pass on py3.7, 3.8 and 3.9 (https://github.com/Open-EO/openeo-python-client/runs/6919448867), e.g.

__________________ TestXarrayIO.test_from_netcdf_file_simple ___________________
self = <tests.udf.test_xarraydatacube.TestXarrayIO object at 0x7f9ab8182e10>
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_from_netcdf_file_simple0')
    def test_from_netcdf_file_simple(self, tmp_path):
        ...
        res = XarrayIO.from_netcdf_file(path)
>       assert res.coords["t"].values.tolist() == ["2020", "2021", "2022"]
E       AssertionError: assert [b'2020', b'2021', b'2022'] == ['2020', '2021', '2022']

It took some digging to figure out, but the problem has to do with a change in automatic string decoding in h5netcdf and h5py version 3.

The py3.6 run uses h5netcdf-1.0.0 and h5py-3.1.0,
while the py3.7/3.8/3.9 runs use h5netcdf-1.0.0 and h5py-3.7.0.

@soxofaan
Member Author

OK, so it is more complex than that.
Since h5py v3, byte data is no longer decoded to strings by default.
However, xarray since 0.17 overrides that default and (unless disabled) enforces string decoding again (pydata/xarray#4893).
For the py3.6 environment the last supported version of xarray is 0.16.2, so it's tricky to fix it there.
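On an environment where the reader leaves labels as bytes, one workaround would be to normalize them after loading. This is a minimal sketch under that assumption; `decode_labels` is a hypothetical helper, not part of the client, xarray or h5py, and UTF-8 is an assumed encoding.

```python
from typing import Iterable, List, Union


def decode_labels(
    values: Iterable[Union[bytes, str]], encoding: str = "utf-8"
) -> List[str]:
    """Hypothetical helper: return labels as str, decoding any bytes items."""
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in values]


# Labels as they came back in the failing py3.6 run.
raw = [b"2020", b"2021", b"2022"]
labels = decode_labels(raw)
```

Items that are already str pass through unchanged, so the same call works on both old and new environments.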

@soxofaan
Member Author

I guess it's just easier to avoid hardcoding the h5netcdf engine here:

ds = xarray.open_dataset(path, engine='h5netcdf')

and just let xarray's defaults do their thing
