Problem opening Med-CORDEX OpenDAP files #3754

Open · juanfcocontreras opened this issue Feb 5, 2020 · 5 comments

Comments

@juanfcocontreras

juanfcocontreras commented Feb 5, 2020

MCVE Code Sample

To access the files you need a username and password, which are freely available for research purposes after registering at https://www.medcordex.eu/medcordex_register.php. The credentials must be inserted into the list of URLs below.

import xarray as xr

urls = ['https://user:[email protected]:8290/medcordex/dodsC/MEDCORDEX/MED-11/CNRM/CNRM-CM5/historical/r8i1p1/CNRM-ALADIN52/v1/day/rsds/rsds_MED-11_CNRM-CM5_historical_r8i1p1_CNRM-ALADIN52_v1_day_20010101-20051231.nc',
 'https://user:[email protected]:8290/medcordex/dodsC/MEDCORDEX/MED-11/CNRM/CNRM-CM5/historical/r8i1p1/CNRM-ALADIN52/v1/day/rsds/rsds_MED-11_CNRM-CM5_historical_r8i1p1_CNRM-ALADIN52_v1_day_19960101-20001231.nc']

ds = xr.open_mfdataset(urls, combine="by_coords", decode_cf=False, drop_variables="Lambert_Conformal")

ds_subset = ds.where(
    (ds.lon > -3.74)
    & (ds.lon < -3.14)
    & (ds.lat > 36.73)
    & (ds.lat < 37.09),
    drop=True,
)

ds_subset.to_netcdf("test.nc")

Expected Output

A file called "test.nc" is created with the selected data. This works if I first download the netCDF files locally, but not when using OpenDAP.

Problem Description

The data variable rsds cannot be converted to a NumPy array, so the file cannot be saved. The error occurs when running:

np.asarray(ds_subset.rsds)

Error message:

RuntimeError: NetCDF: Access failure
Full traceback
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/Desktop/medcordex/download.py in 
----> 1 ds_subset.to_netcdf(filename)

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1543             unlimited_dims=unlimited_dims,
   1544             compute=compute,
-> 1545             invalid_netcdf=invalid_netcdf,
   1546         )
   1547 

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1095             return writer, store
   1096 
-> 1097         writes = writer.sync(compute=compute)
   1098 
   1099         if path_or_file is None:

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/backends/common.py in sync(self, compute)
    202                 compute=compute,
    203                 flush=True,
--> 204                 regions=self.regions,
    205             )
    206             self.sources = []

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/dask/array/core.py in store(sources, targets, lock, regions, compute, return_stored, **kwargs)
    921 
    922         if compute:
--> 923             result.compute(**kwargs)
    924             return None
    925         else:

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
    163         dask.base.compute
    164         """
--> 165         (result,) = compute(self, traverse=False, **kwargs)
    166         return result
    167 

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
    434     keys = [x.__dask_keys__() for x in collections]
    435     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 436     results = schedule(dsk, keys, **kwargs)
    437     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    438 

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     79         get_id=_thread_get_id,
     80         pack_exception=pack_exception,
---> 81         **kwargs
     82     )
     83 

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
    484                         _execute_task(task, data)  # Re-execute locally
    485                     else:
--> 486                         raise_exception(exc, tb)
    487                 res, worker_id = loads(res_info)
    488                 state["cache"][key] = res

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/dask/local.py in reraise(exc, tb)
    314     if exc.__traceback__ is not tb:
    315         raise exc.with_traceback(tb)
--> 316     raise exc
    317 
    318 

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    220     try:
    221         task, data = loads(task_info)
--> 222         result = _execute_task(task, data)
    223         id = get_id()
    224         result = dumps((result, id))

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    117         func, args = arg[0], arg[1:]
    118         args2 = [_execute_task(a, cache) for a in args]
--> 119         return func(*args2)
    120     elif not ishashable(arg):
    121         return arg

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/dask/array/core.py in getter(a, b, asarray, lock)
    104         c = a[b]
    105         if asarray:
--> 106             c = np.asarray(c)
    107     finally:
    108         if lock:

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    489 
    490     def __array__(self, dtype=None):
--> 491         return np.asarray(self.array, dtype=dtype)
    492 
    493     def __getitem__(self, key):

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    651 
    652     def __array__(self, dtype=None):
--> 653         return np.asarray(self.array, dtype=dtype)
    654 
    655     def __getitem__(self, key):

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    555     def __array__(self, dtype=None):
    556         array = as_indexable(self.array)
--> 557         return np.asarray(array[self.key], dtype=None)
    558 
    559     def transpose(self, order):

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __getitem__(self, key)
     71     def __getitem__(self, key):
     72         return indexing.explicit_indexing_adapter(
---> 73             key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
     74         )
     75 

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    835     """
    836     raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 837     result = raw_indexing_method(raw_key.tuple)
    838     if numpy_indices.tuple:
    839         # index the loaded np.ndarray

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
     83             with self.datastore.lock:
     84                 original_array = self.get_array(needs_lock=False)
---> 85                 array = getitem(original_array, key)
     86         except IndexError:
     87             # Catch IndexError in netCDF4 and return a more informative

/usr/local/Caskroom/miniconda/base/envs/conda-forge/lib/python3.7/site-packages/xarray/backends/common.py in robust_getitem(array, key, catch, max_retries, initial_delay)
     52     for n in range(max_retries + 1):
     53         try:
---> 54             return array[key]
     55         except catch:
     56             if n == max_retries:

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Access failure

Output of xr.show_versions()


INSTALLED VERSIONS

commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:05:27)
[Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 19.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: es_ES.UTF-8
LANG: es_ES.UTF-8
LOCALE: es_ES.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3

xarray: 0.15.0
pandas: 1.0.0
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: installed
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: 0.17.0
seaborn: 0.10.0
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: 4.8.2
pytest: 5.3.5
IPython: 7.12.0
sphinx: None

@dcherian
Contributor

dcherian commented Feb 5, 2020

Can you try the steps in the authentication section under https://xarray.pydata.org/en/stable/io.html#opendap?
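For reference, a minimal sketch of that approach (placeholder credentials and URL; it assumes the server accepts HTTP basic auth through a requests session):

import requests
import xarray as xr

# Placeholder credentials and URL; assumes the server accepts basic auth
session = requests.Session()
session.auth = ("user", "password")

store = xr.backends.PydapDataStore.open("https://server.example/dodsC/path/to/file.nc",
                                        session=session)
ds = xr.open_dataset(store)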

@juanfcocontreras
Author

juanfcocontreras commented Feb 5, 2020

Hello @dcherian,

Thanks for your quick reply!

I don't have any authentication issue accessing the dataset; the problem is when I try to access the rsds data.

print(ds_subset)
<xarray.Dataset>
Dimensions:  (time: 1826, x: 5, y: 4)
Coordinates:
  * time     (time) float64 4.478e+05 4.478e+05 ... 4.916e+05 4.916e+05
Dimensions without coordinates: x, y
Data variables:
    lon      (y, x) float64 dask.array<chunksize=(4, 5), meta=np.ndarray>
    lat      (y, x) float64 dask.array<chunksize=(4, 5), meta=np.ndarray>
    rsds     (time, y, x) float32 dask.array<chunksize=(1826, 4, 5), meta=np.ndarray>
Attributes:
    Conventions:                     CF-1.6
    contact:                         [email protected]
    comment:                         CORDEX Mediterranean ALADIN5.2 0.11 deg ...
    experiment_id:                   historical
    experiment:                      Historical run with GCM forcing
    driving_experiment:              CNRM-CM5, historical, r8i1p1
    driving_model_id:                CNRM-CM5
    driving_model_ensemble_member:   r8i1p1
    driving_experiment_name:         historical
    frequency:                       month
    institute_id:                    CNRM
    institution:                     Centre National de Recherches Meteorolog...
    model_id:                        CNRM-ALADIN52
    project_id:                      CORDEX
    CORDEX_domain:                   MED-11
    RCM_version_id:                  v1
    product:                         output
    references:                      http://www.cnrm-game.fr/spip.php?rubriqu...
    filename:                        rsds_MED-11_CNRM-CM5_historical_r8i1p1_C...
    creation_date:                   2012-12-06 15:18:55
    history:                         Mon Dec 10 11:11:54 2012: ncrcat rsds_ME...
    nco_openmp_thread_number:        1
    DODS.strlen:                     80
    DODS.dimName:                    string80
    DODS_EXTRA.Unlimited_Dimension:  time

print(ds_subset.lat)
<xarray.DataArray 'lat' (y: 4, x: 5)>
dask.array<where, shape=(4, 5), dtype=float64, chunksize=(4, 5), chunktype=numpy.ndarray>
Dimensions without coordinates: y, x
Attributes:
    standard_name:        latitude
    long_name:            latitude
    units:                degrees_north
    _CoordinateAxisType:  Lat

print(ds_subset.rsds)
<xarray.DataArray 'rsds' (time: 1826, y: 4, x: 5)>
dask.array<where, shape=(1826, 4, 5), dtype=float32, chunksize=(1826, 4, 5), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) float64 4.478e+05 4.478e+05 ... 4.916e+05 4.916e+05
Dimensions without coordinates: y, x
Attributes:
    standard_name:  surface_downwelling_shortwave_flux_in_air
    long_name:      Surface Downwelling Shortwave Radiation
    units:          W m-2
    cell_methods:   time: mean
    original_name:  SURFRAYT SOLA DE
    coordinates:    lon lat
    missing_value:  1e+20
    _FillValue:     1e+20

print(ds_subset.lat.values)
[[36.73249817 36.75699997 36.78150177 36.80569839 36.82979965]
 [36.84149933 36.86610031 36.8905983  36.91479874 36.93889999]
 [        nan 36.97520065 36.99969864 37.02399826 37.04809952]
 [        nan 37.08420181         nan         nan         nan]]

print(ds_subset.rsds.values)
[...]

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Access failure

@juanfcocontreras
Author

Hello @dcherian again,

Although I didn't seem to have any authentication problems, I tried the requests-based authentication you recommended in the documentation link.

It seems that this way I can access the data without any problem.

My question now is how to load multiple files into a single dataset without using a loop:

# "session" is an authenticated requests.Session (as in the sketch above)
store = xr.backends.PydapDataStore.open('http://example.com/data',
                                        session=session)
ds = xr.open_dataset(store)

That is, is there any equivalent to open_mfdataset with xr.backends.PydapDataStore?

@juanfcocontreras
Author

juanfcocontreras commented Feb 6, 2020

Answering my own question, in case it is helpful to other people:

stores = []
for url in urls:
    store = xr.backends.PydapDataStore.open(url, session=session)
    stores.append(store)

ds = xr.open_mfdataset(stores, combine="by_coords")
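For completeness, the full workflow from my original example would then look something like this (untested sketch; the credentials and URLs are placeholders, and it assumes the server accepts HTTP basic auth through the requests session):

import requests
import xarray as xr

# Placeholder Med-CORDEX credentials (assumption: basic auth via the session)
session = requests.Session()
session.auth = ("user", "password")

# The same OpenDAP endpoints as above, without credentials embedded in the URLs
urls = ["https://.../dodsC/.../rsds_..._19960101-20001231.nc",
        "https://.../dodsC/.../rsds_..._20010101-20051231.nc"]

stores = [xr.backends.PydapDataStore.open(url, session=session) for url in urls]
ds = xr.open_mfdataset(stores, combine="by_coords", decode_cf=False,
                       drop_variables="Lambert_Conformal")

# Same spatial selection as in the original report
ds_subset = ds.where(
    (ds.lon > -3.74) & (ds.lon < -3.14) & (ds.lat > 36.73) & (ds.lat < 37.09),
    drop=True,
)
ds_subset.to_netcdf("test.nc")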

I'm closing this issue. Thanks @dcherian for your help!

@dcherian
Contributor

dcherian commented Feb 6, 2020

@juanfcocontreras That open_mfdataset example would make a nice addition to the documentation in the OpenDAP section. Do you have time to send in a PR? We have documentation on contributing here: https://xarray.pydata.org/en/stable/contributing.html#contributing-to-the-documentation
