
Commit d385e20

Illviljan and dcherian authored
Avoid in-memory broadcasting when converting to_dask_dataframe (#7472)
* Avoid in-memory broadcasting when converting to dask_dataframe
* Update dataset.py
* Update xarray/core/dataset.py
  Co-authored-by: Deepak Cherian <[email protected]>
* Update whats-new.rst
* remove ravel_chunks
* Update dataset.py
  Co-authored-by: Deepak Cherian <[email protected]>
1 parent 97d7e76 commit d385e20

2 files changed: +8 -0 lines changed


doc/whats-new.rst

Lines changed: 3 additions & 0 deletions
@@ -85,6 +85,9 @@ Breaking changes
 Bug fixes
 ~~~~~~~~~
 
+- Avoid in-memory broadcasting when converting to a dask dataframe
+  using ``.to_dask_dataframe.`` (:issue:`6811`, :pull:`7472`).
+  By `Jimmy Westling <https://github.com/illviljan>`_.
 - Accessing the property ``.nbytes`` of a DataArray, or Variable no longer
   accidentally triggers loading the variable into memory.
 - Allow numpy-only objects in :py:func:`where` when ``keep_attrs=True`` (:issue:`7362`, :pull:`7364`).

xarray/core/dataset.py

Lines changed: 5 additions & 0 deletions
@@ -6403,6 +6403,11 @@ def to_dask_dataframe(
             if isinstance(var, IndexVariable):
                 var = var.to_base_variable()
 
+            # Make sure var is a dask array, otherwise the array can become too large
+            # when it is broadcasted to several dimensions:
+            if not is_duck_dask_array(var._data):
+                var = var.chunk()
+
             dask_array = var.set_dims(ordered_dims).chunk(self.chunks).data
             series = dd.from_array(dask_array.reshape(-1), columns=[name])
             series_list.append(series)
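
The comment added in this hunk warns that a non-dask array "can become too large" once it is broadcast to several dimensions. As a rough illustration of that general effect, the NumPy-only sketch below shows that a broadcast view costs nothing until it is flattened, at which point every broadcast element has to be allocated. It does not reproduce xarray's code path, and the diff does not state where exactly the old code materialized the array. The array sizes and variable names are made up for the example.

```python
import numpy as np

# A small 1-D, coordinate-like variable (hypothetical size, for illustration only).
x = np.arange(4, dtype=np.int64)

# Broadcasting to a larger shape is cheap in NumPy: the result is a
# read-only view that reuses the original 4 elements through a zero stride.
view = np.broadcast_to(x[:, None], (4, 1000))
assert view.strides[1] == 0
assert np.shares_memory(view, x)

# Flattening cannot be expressed as a view of the same buffer, so NumPy
# copies and allocates one element per broadcast position.
flat = view.reshape(-1)
assert not np.shares_memory(flat, x)

print(x.nbytes, flat.nbytes)  # 32 32000
```

Here the flattened copy is 1000 times larger than the source data. Converting the variable to a dask array before `set_dims`, as the hunk does with `var.chunk()`, keeps the broadcast and the later `reshape(-1)` as lazy, chunked operations rather than a single in-memory array.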
