FVCOM model data has a quirk where the `siglay` and `siglev` coordinate names match the `siglay` and `siglev` dimensions (see pydata/xarray#2233). Xarray can now handle this, but kerchunk wrongly assumes that if a variable has the same name as a dimension, it must have a single dimension matching that name.
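To make the failure mode concrete, here is a hypothetical sketch (not kerchunk's actual code) contrasting the faulty rule with the safe one. The function names are illustrative; the dimension sizes are the ones from the NGOFS2 file discussed in this issue.

```python
# Hypothetical illustration only; this is not kerchunk's implementation.
DIMS = {"siglay": 40, "siglev": 41, "node": 303714}
VARS = {"siglay": ("siglay", "node"), "siglev": ("siglev", "node")}


def shape_assuming_dimension_coordinate(name, var_dims, dims):
    """Faulty rule: a variable named after a dimension is 1-D along it."""
    if name in dims:
        return (dims[name],)
    return tuple(dims[d] for d in var_dims)


def shape_from_variable_dims(var_dims, dims):
    """Safe rule: always derive the shape from the variable's own dimensions."""
    return tuple(dims[d] for d in var_dims)


for name, var_dims in VARS.items():
    print(name,
          shape_assuming_dimension_coordinate(name, var_dims, DIMS),
          shape_from_variable_dims(var_dims, DIMS))
```

The two rules only disagree for variables like these, whose name matches a dimension but which carry extra dimensions.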
Here is an example FVCOM dataset: https://noaa-ofs-pds.s3.amazonaws.com/ngofs2.20231003/nos.ngofs2.fields.f042.20231003.t09z.nc
Loading this in gives us the following for `siglay` and `siglev`:
Note that the `siglay` dimension has length 40 and `siglev` has length 41, but the `siglay` variable has shape `(40, 303714)` and `siglev` has shape `(41, 303714)`.
When we process this dataset with kerchunk, the `_ARRAY_DIMENSIONS` attributes still list both dimensions, but the shapes have become `[40]` and `[41]`, respectively:

```json
"siglay\/.zarray": "{\"chunks\":[40],\"compressor\":null,\"dtype\":\">f4\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[40],\"zarr_format\":2}",
"siglay\/0": [
".\/nos.ngofs2.fields.f042.20231003.t09z.nc",
16258148,
48594240
],
"siglay\/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"siglay\",\"node\"],\"formula_terms\":\"sigma: siglay eta: zeta depth: h\",\"long_name\":\"Sigma Layers\",\"positive\":\"up\",\"standard_name\":\"ocean_sigma\\\/general_coordinate\",\"valid_max\":\"0.0\",\"valid_min\":\"-1.0\"}",
"siglev\/.zarray": "{\"chunks\":[41],\"compressor\":null,\"dtype\":\">f4\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[41],\"zarr_format\":2}",
"siglev\/0": [
".\/nos.ngofs2.fields.f042.20231003.t09z.nc",
64852388,
49809096
],
"siglev\/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"siglev\",\"node\"],\"formula_terms\":\"sigma:siglay eta: zeta depth: h\",\"long_name\":\"Sigma Levels\",\"positive\":\"up\",\"standard_name\":\"ocean_sigma\\\/general_coordinate\",\"valid_max\":\"0.0\",\"valid_min\":\"-1.0\"}",
```
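We can also note that the chunk byte ranges are correct: dividing each byte length by the 4 bytes of a `>f4` value gives exactly 40 × 303714 and 41 × 303714 values. So the references themselves are fine; the problem is only that kerchunk assumes the coordinate is one-dimensional because it shares its name with a dimension.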
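The byte-count check above can be reproduced with a few lines of Python. This sketch assumes, as the `.zarray` metadata states, that the data is uncompressed (`"compressor": null`) big-endian float32; the byte lengths are copied from the references excerpt.

```python
import struct

ITEMSIZE = struct.calcsize(">f")  # 4 bytes per big-endian float32 (">f4")
N_NODE = 303714                   # size of the "node" dimension


def n_values(nbytes, itemsize=ITEMSIZE):
    """Number of values in a byte range of uncompressed, fixed-size data."""
    if nbytes % itemsize:
        raise ValueError("byte length is not a whole number of values")
    return nbytes // itemsize


# (variable, length of its leading dimension, byte length from the reference)
for name, n_levels, nbytes in [("siglay", 40, 48594240), ("siglev", 41, 49809096)]:
    count = n_values(nbytes)
    print(name, count, count == n_levels * N_NODE)
```

Both checks come out true, so each referenced byte range covers the full two-dimensional variable rather than a 40- or 41-element vector.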
PR to follow
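While that PR is pending, a references dictionary produced by the current behaviour could in principle be checked and patched by hand. The sketch below is a hypothetical workaround, not part of kerchunk: it assumes version-1 style references where `.zarray`/`.zattrs` values are JSON strings, that each affected variable is a single uncompressed contiguous chunk referenced as `<var>/0` (as in the excerpt above), and that zarr v2's default `.` chunk-key separator is in use.

```python
import json


def rank_mismatches(refs):
    """Map each variable whose .zarray rank differs from _ARRAY_DIMENSIONS to its dims."""
    found = {}
    for key, value in refs.items():
        if not key.endswith("/.zarray"):
            continue
        var = key[: -len("/.zarray")]
        attrs = refs.get(f"{var}/.zattrs")
        if attrs is None:
            continue
        shape = json.loads(value)["shape"]
        dims = json.loads(attrs).get("_ARRAY_DIMENSIONS", [])
        if len(shape) != len(dims):
            found[var] = dims
    return found


def patch_single_chunk(refs, var, shape):
    """Hypothetical helper: give `var` its full shape, stored as one chunk."""
    zarray = json.loads(refs[f"{var}/.zarray"])
    zarray["shape"] = list(shape)
    zarray["chunks"] = list(shape)  # one chunk spanning the whole array
    refs[f"{var}/.zarray"] = json.dumps(zarray)
    # A 2-D single-chunk array uses the chunk key "<var>/0.0" in zarr v2.
    new_key = f"{var}/" + ".".join("0" * len(shape))
    if new_key != f"{var}/0":
        refs[new_key] = refs.pop(f"{var}/0")
    return refs


# Trimmed copy of the siglay entries from the excerpt above.
refs = {
    "siglay/.zarray": json.dumps({"chunks": [40], "compressor": None, "dtype": ">f4",
                                  "fill_value": None, "filters": None, "order": "C",
                                  "shape": [40], "zarr_format": 2}),
    "siglay/0": ["./nos.ngofs2.fields.f042.20231003.t09z.nc", 16258148, 48594240],
    "siglay/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["siglay", "node"]}),
}
sizes = {"siglay": 40, "node": 303714}  # dimension sizes from the source file
for var, var_dims in rank_mismatches(refs).items():
    patch_single_chunk(refs, var, [sizes[d] for d in var_dims])
```

Treating the whole variable as one chunk matches how the existing reference already points at a single contiguous byte range, so only the metadata and chunk key need to change.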