Unexpected NaNs in broadcast #7385

Open
4 tasks done
dopplershift opened this issue Dec 16, 2022 · 4 comments
Comments

@dopplershift
Contributor

What happened?

When running the broadcast in the sample code, I end up with NaNs in the output even though there are none in the original source array. While I know the construction is really odd (this came from user-submitted code), I'm shocked that it resulted in NaNs in the resulting broadcast data, and I honestly assumed MetPy's code was doing something dumb for quite a while. I would have expected (regardless of the nature of the coordinates) the result for broad_a to be [[1, 2], [1, 2]].

What did you expect to happen?

No response

Minimal Complete Verifiable Example

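import numpy as np
import xarray as xr

# Two single-variable Datasets that share the same 'lev' coordinate
levs = np.array([100000, 85000])
a = xr.Dataset({'a': (('lev',), [1, 2])}, coords={'lev': levs}).to_array()
b = xr.Dataset({'b': (('lev',), [3, 4])}, coords={'lev': levs}).to_array()

# Broadcast the two arrays against each other
broad_a, broad_b = xr.broadcast(a, b)
print(broad_a)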

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

<xarray.DataArray (variable: 2, lev: 2)>
array([[ 1.,  2.],
       [nan, nan]])
Coordinates:
  * lev       (lev) int64 100000 85000
  * variable  (variable) object 'a' 'b'

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:31:57) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.12.0
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.10.3
iris: None
bottleneck: 1.3.5
dask: 2022.6.1
distributed: 2022.6.1
matplotlib: 3.6.2
cartopy: 0.21.0
seaborn: None
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: 7.2.0
mypy: 0.991
IPython: 8.7.0
sphinx: 5.3.0

@dopplershift added the bug and needs triage labels Dec 16, 2022
@dcherian
Contributor

to_array adds a new dimension, variable, whose coordinate value is 'a' for the first array and 'b' for the second.

When you align these, the variable coordinates do not match, so NaNs are inserted. I would add a squeeze after to_array().
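A minimal sketch of that suggestion, reusing the arrays from the report. The names a_1d and b_1d are introduced here for illustration only; passing the dimension name to squeeze is one way to apply it, not necessarily the only one.

```python
import numpy as np
import xarray as xr

levs = np.array([100000, 85000])
a = xr.Dataset({'a': (('lev',), [1, 2])}, coords={'lev': levs}).to_array()
b = xr.Dataset({'b': (('lev',), [3, 4])}, coords={'lev': levs}).to_array()

# to_array() has added a length-1 'variable' dimension to each array,
# labelled 'a' in one and 'b' in the other.
print(a.dims, a['variable'].values)
print(b.dims, b['variable'].values)

# Squeezing drops that length-1 dimension, so 'variable' is no longer
# an indexed dimension that broadcast has to align.
a_1d = a.squeeze('variable')
b_1d = b.squeeze('variable')

broad_a, broad_b = xr.broadcast(a_1d, b_1d)
print(broad_a)
```

With the variable dimension gone, the only shared dimension is lev, whose labels are identical in both arrays, so there is nothing left to fill.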

@dcherian added the usage question label and removed the bug and needs triage labels Dec 20, 2022
@headtr1ck
Collaborator

@dopplershift does this answer fix your problem?

@dopplershift
Contributor Author

@dcherian Is this behavior (filling with fill_value -> inserting NaNs) because they share a dimension name but have different coordinate values? My expectation was something that operated more like NumPy broadcasting (repeating values, not filling with anything else).

I can understand how xarray's data model yields this behavior, but in that case it might be good to improve the docs for xarray.broadcast, because they say nothing about the behavior that (it seems to me) mimics xarray.align.
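To make the two behaviors concrete, here is a small sketch. The arrays x, y, p, and q are made up for illustration and are not part of the original report. The first pair has different dimension names, so values are repeated as in NumPy; the second pair shares a dimension name but carries different coordinate labels, so the labels are combined and the gaps are filled.

```python
import numpy as np
import xarray as xr

# Different dimension names: values are repeated, as in NumPy broadcasting.
x = xr.DataArray([1, 2], dims='x')
y = xr.DataArray([10, 20, 30], dims='y')
bx, by = xr.broadcast(x, y)
print(bx.values)  # [[1 1 1], [2 2 2]]
print(by.values)  # [[10 20 30], [10 20 30]]

# Same dimension name, different coordinate labels: the labels are
# combined and positions missing from an input are filled with NaN.
p = xr.DataArray([1, 2], dims='x', coords={'x': [0, 1]})
q = xr.DataArray([3, 4], dims='x', coords={'x': [1, 2]})
bp, bq = xr.broadcast(p, q)
print(bp['x'].values)  # [0 1 2]
print(bp.values)       # [ 1.  2. nan]
print(bq.values)       # [nan  3.  4.]
```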

@dcherian
Contributor

Is this behavior (filling with fill_value -> inserting NaNs) because they share common dimensionality in terms of name, but have different coordinate values?

Yes, broadcasting aligns its arguments with an outer join by default: #6304. This is conceptually pretty confusing.

I agree we should document this.
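As a rough check of that explanation, the sketch below runs xr.align with join="outer" on the arrays from the report and compares the result with what xr.broadcast returns. The names aligned_a and aligned_b are illustrative, and the comparison is only meant to show that the NaN pattern already appears at the alignment step, not to describe xarray's internals.

```python
import numpy as np
import xarray as xr

levs = np.array([100000, 85000])
a = xr.Dataset({'a': (('lev',), [1, 2])}, coords={'lev': levs}).to_array()
b = xr.Dataset({'b': (('lev',), [3, 4])}, coords={'lev': levs}).to_array()

# An outer join takes the union of the 'variable' labels ('a' and 'b')
# and fills the positions each input lacks with NaN.
aligned_a, aligned_b = xr.align(a, b, join='outer')
print(aligned_a.values)
print(aligned_b.values)

# broadcast produces the same values for these inputs.
broad_a, broad_b = xr.broadcast(a, b)
print(np.array_equal(aligned_a.values, broad_a.values, equal_nan=True))
```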
