
Coordinate Variables have incorrect shapes and chunks if name matches a dimension #368

Closed
@mpiannucci

FVCOM model data has a quirk: the siglay and siglev coordinate variables share their names with the siglay and siglev dimensions, even though each variable is two-dimensional (see pydata/xarray#2233). xarray can now handle this, but kerchunk wrongly assumes that a variable whose name matches a dimension must have exactly one dimension, the one matching its name.

Here is an example FVCOM dataset: https://noaa-ofs-pds.s3.amazonaws.com/ngofs2.20231003/nos.ngofs2.fields.f042.20231003.t09z.nc

Loading this in gives us the following for siglay and siglev:

[Image: Screenshot 2023-10-04 at 10 00 55 AM]

Note that the siglay dimension has length 40 and the siglev dimension has length 41, but the siglay variable has a shape of (40, 303714) and the siglev variable has a shape of (41, 303714).

When we process this dataset with kerchunk, the _ARRAY_DIMENSIONS attributes still list both dimensions, but the shapes and chunks in .zarray have become [40] and [41] respectively:

        "siglay\/.zarray": "{\"chunks\":[40],\"compressor\":null,\"dtype\":\">f4\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[40],\"zarr_format\":2}",
        "siglay\/0": [
            ".\/nos.ngofs2.fields.f042.20231003.t09z.nc",
            16258148,
            48594240
        ],
        "siglay\/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"siglay\",\"node\"],\"formula_terms\":\"sigma: siglay eta: zeta depth: h\",\"long_name\":\"Sigma Layers\",\"positive\":\"up\",\"standard_name\":\"ocean_sigma\\\/general_coordinate\",\"valid_max\":\"0.0\",\"valid_min\":\"-1.0\"}",
        "siglev\/.zarray": "{\"chunks\":[41],\"compressor\":null,\"dtype\":\">f4\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[41],\"zarr_format\":2}",
        "siglev\/0": [
            ".\/nos.ngofs2.fields.f042.20231003.t09z.nc",
            64852388,
            49809096
        ],
        "siglev\/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"siglev\",\"node\"],\"formula_terms\":\"sigma:siglay eta: zeta depth: h\",\"long_name\":\"Sigma Levels\",\"positive\":\"up\",\"standard_name\":\"ocean_sigma\\\/general_coordinate\",\"valid_max\":\"0.0\",\"valid_min\":\"-1.0\"}",

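The inconsistency in the references above can be detected mechanically by comparing the rank of each `.zarray` shape with the length of `_ARRAY_DIMENSIONS` in the matching `.zattrs`. The following is a minimal sketch, not part of kerchunk: `find_rank_mismatches` is a hypothetical helper, and it assumes it is given the flat mapping of reference keys (the entries shown above), trimmed here to the fields the check needs.

```python
import json


def find_rank_mismatches(refs):
    """Map each variable name to (shape, dims) where the two disagree in rank.

    Hypothetical helper for illustration; not a kerchunk API.
    """
    mismatches = {}
    for key, raw_zarray in refs.items():
        if not key.endswith("/.zarray"):
            continue  # skip chunk entries and attribute entries
        var = key[: -len("/.zarray")]
        raw_zattrs = refs.get(f"{var}/.zattrs")
        if raw_zattrs is None:
            continue
        # Reference files store metadata as JSON strings; accept dicts too.
        zarray = json.loads(raw_zarray) if isinstance(raw_zarray, str) else raw_zarray
        zattrs = json.loads(raw_zattrs) if isinstance(raw_zattrs, str) else raw_zattrs
        dims = zattrs.get("_ARRAY_DIMENSIONS")
        if dims is not None and len(dims) != len(zarray["shape"]):
            mismatches[var] = (zarray["shape"], dims)
    return mismatches


# Values copied from the references in this issue, reduced to the relevant fields.
refs = {
    "siglay/.zarray": json.dumps({"chunks": [40], "dtype": ">f4", "shape": [40]}),
    "siglay/0": ["./nos.ngofs2.fields.f042.20231003.t09z.nc", 16258148, 48594240],
    "siglay/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["siglay", "node"]}),
    "siglev/.zarray": json.dumps({"chunks": [41], "dtype": ">f4", "shape": [41]}),
    "siglev/0": ["./nos.ngofs2.fields.f042.20231003.t09z.nc", 64852388, 49809096],
    "siglev/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["siglev", "node"]}),
}

mismatches = find_rank_mismatches(refs)
for name, (shape, dims) in mismatches.items():
    print(f"{name}: shape {shape} has rank {len(shape)}, but dims are {dims}")
```

Both siglay and siglev are flagged, since each has a one-dimensional shape but two entries in `_ARRAY_DIMENSIONS`.
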
We can also note that the chunk byte lengths in the references are accurate: dividing each byte count by the 4-byte item size of ">f4" gives 40 × 303714 and 41 × 303714 values. So the data references are fine; it's simply kerchunk assuming the coordinate's shape matches its dimension.
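As a quick check of that arithmetic, the sketch below recomputes the number of values in each chunk from the byte lengths in the references and compares it with the two-dimensional shapes reported by xarray. All numbers are taken from this issue; `values_in_chunk` is a hypothetical helper, and the calculation assumes uncompressed chunks, which matches `"compressor": null` in the `.zarray` entries.

```python
import struct

# ">f4" is a big-endian 32-bit float, i.e. 4 bytes per value.
ITEMSIZE = struct.calcsize(">f")

# Length of the node dimension, from the shapes (40, 303714) and (41, 303714).
N_NODE = 303714


def values_in_chunk(nbytes, itemsize=ITEMSIZE):
    """Number of values in an uncompressed chunk of `nbytes` bytes."""
    if nbytes % itemsize:
        raise ValueError("byte length is not a multiple of the item size")
    return nbytes // itemsize


# (byte length from the reference entry, length of the leading dimension)
chunks = {
    "siglay": (48594240, 40),
    "siglev": (49809096, 41),
}

results = {}
for name, (nbytes, n_levels) in chunks.items():
    n_values = values_in_chunk(nbytes)
    results[name] = n_values == n_levels * N_NODE
    print(f"{name}: {n_values} values, matches ({n_levels}, {N_NODE}): {results[name]}")
```

Both chunks hold exactly enough values for the full two-dimensional arrays, which is consistent with only the `shape` and `chunks` metadata being wrong.
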

PR to follow


          Coordinate Variables have incorrect shapes and chunks if name matches a dimension · Issue #368 · fsspec/kerchunk