Coordinate Variables have incorrect shapes and chunks if name matches a dimension #368

Closed
mpiannucci opened this issue Oct 4, 2023 · 0 comments

@mpiannucci
Contributor

FVCOM model data has a quirk: the siglay and siglev coordinate variables share their names with the siglay and siglev dimensions, yet they are two-dimensional (see pydata/xarray#2233). xarray can now handle this, but kerchunk wrongly assumes that a variable whose name matches a dimension must be one-dimensional along that dimension.
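
To make the failure mode concrete, here is a minimal sketch contrasting the two rules. The function names and the dict-based inputs are hypothetical and are not kerchunk's actual internals; they only illustrate the difference between deriving a shape from the variable's name and deriving it from the variable's own dimension list.

```python
# Hypothetical sketch; names and inputs are illustrative, not kerchunk internals.

def shape_buggy(name, dims, dim_sizes):
    """Assumes a variable named after a dimension is 1-D along that dimension."""
    if name in dim_sizes:
        return (dim_sizes[name],)
    return tuple(dim_sizes[d] for d in dims)


def shape_fixed(name, dims, dim_sizes):
    """Derives the shape from the variable's own dimension list, whatever its name."""
    return tuple(dim_sizes[d] for d in dims)


# Dimension sizes from the FVCOM file described in this issue.
dim_sizes = {"siglay": 40, "siglev": 41, "node": 303714}

print(shape_buggy("siglay", ["siglay", "node"], dim_sizes))  # (40,)
print(shape_fixed("siglay", ["siglay", "node"], dim_sizes))  # (40, 303714)
```

For an ordinary 1-D coordinate such as a variable node with dimension ("node",), both rules agree; they only diverge for variables like siglay and siglev.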

Here is an example FVCOM dataset: https://noaa-ofs-pds.s3.amazonaws.com/ngofs2.20231003/nos.ngofs2.fields.f042.20231003.t09z.nc

Loading this dataset gives the following for siglay and siglev:

[Screenshot (2023-10-04): the siglay and siglev variables as loaded from the dataset]

Note that the siglay dimension has length 40 and siglev has length 41, but the siglay variable has shape (40, 303714) and the siglev variable has shape (41, 303714).

When this dataset is processed with kerchunk, the dimension names are correct, but the shapes (and chunks) are reduced to [40] and [41] respectively:

        "siglay\/.zarray": "{\"chunks\":[40],\"compressor\":null,\"dtype\":\">f4\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[40],\"zarr_format\":2}",
        "siglay\/0": [
            ".\/nos.ngofs2.fields.f042.20231003.t09z.nc",
            16258148,
            48594240
        ],
        "siglay\/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"siglay\",\"node\"],\"formula_terms\":\"sigma: siglay eta: zeta depth: h\",\"long_name\":\"Sigma Layers\",\"positive\":\"up\",\"standard_name\":\"ocean_sigma\\\/general_coordinate\",\"valid_max\":\"0.0\",\"valid_min\":\"-1.0\"}",
        "siglev\/.zarray": "{\"chunks\":[41],\"compressor\":null,\"dtype\":\">f4\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[41],\"zarr_format\":2}",
        "siglev\/0": [
            ".\/nos.ngofs2.fields.f042.20231003.t09z.nc",
            64852388,
            49809096
        ],
        "siglev\/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"siglev\",\"node\"],\"formula_terms\":\"sigma:siglay eta: zeta depth: h\",\"long_name\":\"Sigma Levels\",\"positive\":\"up\",\"standard_name\":\"ocean_sigma\\\/general_coordinate\",\"valid_max\":\"0.0\",\"valid_min\":\"-1.0\"}",
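
Because the references are plain JSON, this class of bug can be caught with a quick consistency check: the rank of each array's shape should equal the length of its _ARRAY_DIMENSIONS. The sketch below is a hypothetical helper, not part of kerchunk; it assumes the reference mapping has already been loaded into a plain dict with decoded keys, and the entries are abbreviated from the output shown above.

```python
import json

# Entries abbreviated from the kerchunk output above (most attributes trimmed).
refs = {
    "siglay/.zarray": '{"chunks":[40],"compressor":null,"dtype":">f4","fill_value":null,"filters":null,"order":"C","shape":[40],"zarr_format":2}',
    "siglay/0": ["./nos.ngofs2.fields.f042.20231003.t09z.nc", 16258148, 48594240],
    "siglay/.zattrs": '{"_ARRAY_DIMENSIONS":["siglay","node"],"long_name":"Sigma Layers"}',
    "siglev/.zarray": '{"chunks":[41],"compressor":null,"dtype":">f4","fill_value":null,"filters":null,"order":"C","shape":[41],"zarr_format":2}',
    "siglev/0": ["./nos.ngofs2.fields.f042.20231003.t09z.nc", 64852388, 49809096],
    "siglev/.zattrs": '{"_ARRAY_DIMENSIONS":["siglev","node"],"long_name":"Sigma Levels"}',
}


def find_rank_mismatches(refs):
    """Return names of arrays whose .zarray shape rank differs from _ARRAY_DIMENSIONS."""
    bad = []
    for key, value in refs.items():
        if not key.endswith("/.zarray"):
            continue  # skip chunk references and attribute entries
        name = key[: -len("/.zarray")]
        zarray = json.loads(value)
        zattrs = json.loads(refs.get(f"{name}/.zattrs", "{}"))
        dims = zattrs.get("_ARRAY_DIMENSIONS")
        if dims is not None and len(dims) != len(zarray["shape"]):
            bad.append(name)
    return sorted(bad)


print(find_rank_mismatches(refs))  # ['siglay', 'siglev']
```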

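The byte ranges in the references are accurate, though: dividing each chunk's length in bytes (48594240 and 49809096) by 4, the size of one >f4 value, gives 40 × 303714 and 41 × 303714 values respectively. So the data itself is referenced correctly, and the problem is simply that kerchunk assumes the coordinate's shape matches its dimension.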
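
The arithmetic can be checked directly. This is a minimal sketch assuming each variable is stored as a single uncompressed chunk (the .zarray has "compressor":null), so its byte length is the product of its shape times the item size; expected_nbytes is a hypothetical helper name.

```python
import math
import struct

# Size in bytes of one big-endian float32, the ">f4" dtype in the .zarray above.
ITEMSIZE = struct.calcsize(">f")


def expected_nbytes(shape, itemsize=ITEMSIZE):
    """Bytes occupied by an uncompressed array of the given shape."""
    return math.prod(shape) * itemsize


# Byte lengths taken from the "siglay/0" and "siglev/0" references above.
referenced = {"siglay": 48594240, "siglev": 49809096}

for name, true_shape, kerchunk_shape in [
    ("siglay", (40, 303714), (40,)),
    ("siglev", (41, 303714), (41,)),
]:
    # True for the real 2-D shape, False for the 1-D shape kerchunk wrote.
    print(name,
          expected_nbytes(true_shape) == referenced[name],
          expected_nbytes(kerchunk_shape) == referenced[name])
```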

PR to follow
