BinGrouper: Support setting labels when provided with IntervalIndex #10259

Merged Apr 28, 2025 (2 commits)
11 changes: 8 additions & 3 deletions xarray/groupers.py
@@ -319,7 +319,7 @@ class BinGrouper(Grouper):
         the resulting bins. If False, returns only integer indicators of the
         bins. This affects the type of the output container (see below).
         This argument is ignored when `bins` is an IntervalIndex. If True,
-        raises an error. When `ordered=False`, labels must be provided.
+        raises an error.
     retbins : bool, default False
         Whether to return the bins or not. Useful when bins is provided
         as a scalar.
@@ -394,8 +394,13 @@ def factorize(self, group: T_Group) -> EncodedGroups:
 
         # This seems silly, but it lets us have Pandas handle the complexity
         # of `labels`, `precision`, and `include_lowest`, even when group is a chunked array
-        dummy, _ = self._cut(np.array([0]).astype(group.dtype))
-        full_index = dummy.categories
+        # Pandas ignores labels when IntervalIndex is passed
+        if not isinstance(self.bins, pd.IntervalIndex):
+            dummy, _ = self._cut(np.array([0]).astype(group.dtype))
+            full_index = dummy.categories
+        else:
+            full_index = pd.Index(self.labels)
+
         if not by_is_chunked:
             uniques = np.sort(pd.unique(codes.data.ravel()))
             unique_values = full_index[uniques[uniques != -1]]
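The branch above exists because `pd.cut` documents that `labels` is ignored when `bins` is an `IntervalIndex`. The following is a minimal sketch of that behaviour using plain pandas. The sample values, the bin edges, and the helper `resolve_full_index` are hypothetical and are not part of xarray; the `labels is not None` guard in the helper is an added assumption of this sketch, not something the diff shows.

```python
import numpy as np
import pandas as pd

values = np.array([0.5, 1.5, 2.5])
labels = ["one", "two", "three"]
bins_index = pd.IntervalIndex.from_breaks([0, 1, 2, 3])

# With an IntervalIndex, pd.cut keeps the intervals as categories
# and silently ignores `labels`.
binned = pd.cut(values, bins=bins_index, labels=labels)

# With plain bin edges, the same labels become the categories.
binned_edges = pd.cut(values, bins=[0, 1, 2, 3], labels=labels)


def resolve_full_index(bins, labels=None):
    """Hypothetical helper mirroring the branch in the diff above."""
    # Assumption of this sketch: fall back to pandas when no labels are given.
    if labels is not None and isinstance(bins, pd.IntervalIndex):
        # pandas would drop the labels here, so build the index directly.
        return pd.Index(labels)
    # Otherwise let pandas resolve the categories from a dummy value.
    dummy = pd.cut(np.array([0]), bins=bins, labels=labels)
    return dummy.categories


full_index = resolve_full_index(bins_index, labels)

# The integer codes from pd.cut still index correctly into the label index.
relabelled = full_index[binned.codes]
print(type(binned.categories).__name__)
print(list(relabelled))
```

The point of the sketch is that the integer codes produced by `pd.cut` remain valid positions into a label index of the same length, so only the index lookup needs to change, not the binning itself.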
6 changes: 6 additions & 0 deletions xarray/tests/test_groupby.py
@@ -1062,6 +1062,12 @@ def test_groupby_bins_cut_kwargs(use_flox: bool) -> None:
         ).mean()
     assert_identical(expected, actual)
 
+    with xr.set_options(use_flox=use_flox):
+        bins_index = pd.IntervalIndex.from_breaks(x_bins)
+        labels = ["one", "two", "three"]
+        actual = da.groupby(x=BinGrouper(bins=bins_index, labels=labels)).sum()
+        assert actual.xindexes["x_bins"].index.equals(pd.Index(labels))  # type: ignore[attr-defined]
+
 
 @pytest.mark.parametrize("indexed_coord", [True, False])
 @pytest.mark.parametrize(
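A standalone usage sketch of what the new test exercises, for readers who want to try the change outside the test suite. The data array, bin edges, and label names below are hypothetical. It assumes an xarray release that includes this PR; on such a release the `x_bins` coordinate should carry the string labels, while older releases would be expected to show the intervals instead.

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.groupers import BinGrouper

# Hypothetical data: six values along "x", two per bin.
da = xr.DataArray(
    np.arange(6.0),
    dims="x",
    coords={"x": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]},
)

# Bins given as an IntervalIndex rather than as a list of edges.
bins_index = pd.IntervalIndex.from_breaks([0, 2, 4, 6])
labels = ["low", "mid", "high"]

summed = da.groupby(x=BinGrouper(bins=bins_index, labels=labels)).sum()

# Inspect the resulting group coordinate and the per-bin sums.
print(summed["x_bins"].values)
print(summed.values)
```

Passing an `IntervalIndex` is useful when the bins are already defined elsewhere (for example, reused across datasets), and the labels then only control how the resulting coordinate is named.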