What happened?

When opening MUR SST NetCDF files from S3, xarray.open_dataset(file, engine="h5netcdf", chunks={}) returns a single chunk, whereas the h5netcdf library reports a chunk shape of (1, 1023, 2047) for the same variable.

A notebook version of the code below, including its output, is available at https://gist.github.com/abarciauskas-bgse/9366e04d2af09b79c9de466f6c1d3b90

What did you expect to happen?

I expected the chunks={} option to return the same chunks, (1, 1023, 2047), that the h5netcdf library exposes.
Minimal Complete Verifiable Example
#!/usr/bin/env python
# coding: utf-8

# This notebook looks at how xarray and h5netcdf return different chunks.

import pandas as pd
import h5netcdf
import s3fs
import xarray as xr

dates = [
    d.to_pydatetime().strftime('%Y%m%d')
    for d in pd.date_range('2023-02-01', '2023-03-01', freq='D')
]

SHORT_NAME = 'MUR-JPL-L4-GLOB-v4.1'
s3_fs = s3fs.S3FileSystem(anon=False)
var = 'analysed_sst'


def make_filename(time):
    base_url = f's3://podaac-ops-cumulus-protected/{SHORT_NAME}/'
    # example file: "/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
    return f'{base_url}{time}090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc'


s3_urls = [make_filename(d) for d in dates]


def print_chunk_shape(s3_url):
    try:
        # Open the dataset using xarray
        file = s3_fs.open(s3_url)
        dataset = xr.open_dataset(file, engine='h5netcdf', chunks={})

        # Print chunk shapes for the variable of interest
        print(f"\nChunk shapes for {s3_url}:")
        if dataset[var].chunks is not None:
            print(f"xarray open_dataset chunks for {var}: {dataset[var].chunks}")
        else:
            print(f"xarray open_dataset chunks for {var}: Not chunked")

        # Open the same file with h5netcdf directly and report its chunking
        with h5netcdf.File(file, 'r') as h5file:
            variable = h5file[var]
            if variable.chunks:
                print(f"h5netcdf chunks for {var}:", variable.chunks)
            else:
                print("h5netcdf dataset is not chunked.")
    except Exception as e:
        print(f"Failed to process {s3_url}: {e}")


[print_chunk_shape(s3_url) for s3_url in s3_urls]
MVCE confirmation
Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.
Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.198-187.748.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2
xarray: 2023.6.0
pandas: 2.0.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: installed
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.15.0
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.6.1
distributed: 2023.6.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.0.0
pip: 23.1.2
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.14.0
sphinx: None
Returning the on-disk chunk sizes with chunks={} has been implemented for the h5netcdf backend in #7948, but you're using an old version of xarray (2023.6.0). You might need to upgrade to a newer version of xarray (at least v2023.09.0).
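For completeness, here is a small follow-up check that is not part of the original report. It is a sketch only, assuming xarray >= v2023.09.0, valid s3fs credentials for the PO.DAAC bucket, and a granule URL built with the same pattern as make_filename above. It compares the on-disk chunk sizes recorded in the variable's encoding with the dask chunks produced by chunks={}; after upgrading, the two should line up.

# Follow-up sketch (not from the original report): assumes xarray >= v2023.09.0,
# s3fs credentials for the PO.DAAC bucket, and a URL following the pattern used
# by make_filename above.
import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=False)
url = (
    's3://podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/'
    '20230201090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc'
)

with fs.open(url) as f:
    ds = xr.open_dataset(f, engine='h5netcdf', chunks={})
    print('xarray version:', xr.__version__)
    # On-disk HDF5 chunk sizes, as recorded by the backend in the encoding
    print('encoding chunksizes:', ds['analysed_sst'].encoding.get('chunksizes'))
    # With the fix from #7948, chunks={} should produce matching dask chunks
    print('dask chunks:', ds['analysed_sst'].chunks)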