
Fix aligned index variable metadata side effect #6857

Merged on Aug 31, 2022 (6 commits)
xarray/core/alignment.py: 4 changes (2 additions, 2 deletions)

@@ -467,7 +467,7 @@ def override_indexes(self) -> None:
 if obj_idx is not None:
 for name, var in self.aligned_index_vars[key].items():
 new_indexes[name] = aligned_idx
-new_variables[name] = var
+new_variables[name] = var.copy()

 objects[i + 1] = obj._overwrite_indexes(new_indexes, new_variables)

@@ -507,7 +507,7 @@ def _get_indexes_and_vars(
 if obj_idx is not None:
 for name, var in index_vars.items():
 new_indexes[name] = aligned_idx
-new_variables[name] = var
+new_variables[name] = var.copy()

 return new_indexes, new_variables

xarray/tests/test_dataset.py: 12 changes (12 additions, 0 deletions)

@@ -2333,6 +2333,18 @@ def test_align_str_dtype(self) -> None:
 assert_identical(expected_b, actual_b)
 assert expected_b.x.dtype == actual_b.x.dtype

+@pytest.mark.parametrize("join", ["left", "override"])
+def test_align_index_var_attrs(self, join) -> None:
+# regression test https://github.com/pydata/xarray/issues/6852
+
+ds = Dataset(coords={"x": ("x", [1, 2, 3], {"units": "m"})})
+ds_noattr = Dataset(coords={"x": ("x", [1, 2, 3])})
+
+actual_noattr, actual = xr.align(ds_noattr, ds, join=join)

keewis (Collaborator), Aug 1, 2022:

would it make sense to also try this the other way round? As far as I understand #6852, the order mattered?

Edit: changing the order makes actual_noattr.x.attrs equal to actual.x.attrs.

This is probably for a separate issue, but I'm not sure what to do with differing attrs on aligned coordinates: should they stay the same as before the align, or should we use merge_attrs? At the moment, the result is hard-coded to "override", which might not be desirable.
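To compare the policies raised here, the function below is a simplified, hypothetical re-implementation of three attribute-combination policies named after xarray's combine_attrs options ("override", "drop", "drop_conflicts"). It is not xarray's merge_attrs: it omits the other options and compares values with plain !=. It does show why the operand order decides the result under "override".

```python
from __future__ import annotations


def combine_attrs_sketch(attrs_list: list[dict], policy: str) -> dict:
    """Combine the attrs of several aligned coordinates under one policy.

    Hypothetical, simplified sketch named after xarray's combine_attrs
    options; it is not xarray's merge_attrs and omits its other options.
    """
    if not attrs_list:
        return {}
    if policy == "override":
        # Take the first object's attrs as-is and ignore all others.
        return dict(attrs_list[0])
    if policy == "drop":
        # Discard attrs entirely.
        return {}
    if policy == "drop_conflicts":
        # Union of all attrs, but drop any key whose values disagree.
        result: dict = {}
        dropped: set = set()
        for attrs in attrs_list:
            for key, value in attrs.items():
                if key in dropped:
                    continue
                if key in result and result[key] != value:
                    del result[key]
                    dropped.add(key)
                else:
                    result[key] = value
        return result
    raise ValueError(f"unsupported policy: {policy!r}")


# Under "override", the operand order decides the result.
print(combine_attrs_sketch([{"units": "m"}, {}], "override"))  # {'units': 'm'}
print(combine_attrs_sketch([{}, {"units": "m"}], "override"))  # {}
# "drop_conflicts" keeps "units" because no operand disagrees on it.
print(combine_attrs_sketch([{}, {"units": "m"}], "drop_conflicts"))  # {'units': 'm'}
```

An order-independent policy such as "drop_conflicts" would give the same attrs for align(ds, ds_noattr) and align(ds_noattr, ds). Whether align should combine attrs at all is the open question here.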

keewis (Collaborator), Aug 1, 2022:

Actually, I might be misunderstanding something, but doesn't this check a different issue? The point of #6852 was not that the result was wrong, but that one of the operands was modified. To check that, we'd need different asserts:

        assert ds.x.attrs == {"units": "m"}
        assert ds_noattr.x.attrs == {}

(this PR still fixes that issue, though)
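The suggestion to assert on the operands rather than the results fits a general pattern: snapshot every input, run the operation, then compare. The sketch below is a hypothetical pure-Python helper, call_and_check_inputs_unchanged, exercised on two toy functions that model only the attrs handling. With real xarray objects, the more likely approach is to keep ds.copy(deep=True) copies and compare them with xr.testing.assert_identical, which also checks attrs.

```python
from __future__ import annotations

import copy
from typing import Any, Callable


def call_and_check_inputs_unchanged(func: Callable[..., Any], *args: Any) -> Any:
    """Call func(*args) and fail if any argument was mutated by the call.

    Hypothetical test helper: each argument must support copy.deepcopy and ==.
    """
    snapshots = [copy.deepcopy(arg) for arg in args]
    result = func(*args)
    for i, (before, after) in enumerate(zip(snapshots, args)):
        assert before == after, f"argument {i} was modified: {before!r} -> {after!r}"
    return result


def align_attrs_buggy(first: dict, second: dict) -> tuple[dict, dict]:
    """Toy model of the bug: the result for `second` reuses `first`'s attrs
    dict and then writes `second`'s attrs into it in place."""
    reused = first
    reused.clear()
    reused.update(second)
    return first, reused


def align_attrs_fixed(first: dict, second: dict) -> tuple[dict, dict]:
    """Toy model of the fix: copy before anything is written."""
    reused = dict(first)
    reused.clear()
    reused.update(second)
    return dict(first), reused


call_and_check_inputs_unchanged(align_attrs_fixed, {"units": "m"}, {})  # passes
try:
    call_and_check_inputs_unchanged(align_attrs_buggy, {"units": "m"}, {})
except AssertionError as err:
    print(err)  # argument 0 was modified: {'units': 'm'} -> {}
```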

Member Author:

> The point of #6852 was not that the result was wrong, but that one of the operands was modified.

Yes, you're right! The asserts you suggest are the right ones. I'll update it.

> would it make sense to also try this the other way round?

No, I think it would be redundant. With xr.align(ds, ds_noattr), ds.x loses its attributes, and with xr.align(ds_noattr, ds), ds_noattr.x gains a new "units" attribute. The outcome differs, but the side effect is the same: an unwanted update of the attributes of the "x" coordinate of the first object, which is picked up here as the aligned "x" coordinate variable.
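The side effect described here is object aliasing. The same variable object ends up in two results, so an in-place attrs update made for one object leaks into the other. Below is a minimal self-contained model. ToyVar and reuse_aligned_var are hypothetical names, and the in-place attrs update simplifies what happens inside alignment. The copy_first=True path plays the role of the var.copy() added in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVar:
    """Hypothetical stand-in for an index coordinate variable (not xarray's Variable)."""

    values: list
    attrs: dict = field(default_factory=dict)

    def copy(self) -> "ToyVar":
        # New list and new attrs dict, so later in-place edits don't leak back.
        return ToyVar(list(self.values), dict(self.attrs))


def reuse_aligned_var(aligned: ToyVar, own_attrs: dict, copy_first: bool) -> ToyVar:
    """Hypothetical helper: give another object the aligned index variable,
    then overwrite its attrs *in place* with that object's own attrs."""
    var = aligned.copy() if copy_first else aligned
    var.attrs.clear()
    var.attrs.update(own_attrs)
    return var


# The first object's "x" coordinate is picked as the aligned index variable.
x_first = ToyVar([1, 2, 3], {"units": "m"})

# Without a copy, the second object's var *is* x_first, so x_first loses "units".
shared = reuse_aligned_var(x_first, own_attrs={}, copy_first=False)
print(shared is x_first, x_first.attrs)  # True {}

# With a copy (analogous to var.copy() in this PR), x_first is untouched.
x_first = ToyVar([1, 2, 3], {"units": "m"})
independent = reuse_aligned_var(x_first, own_attrs={}, copy_first=True)
print(independent is x_first, x_first.attrs)  # False {'units': 'm'}
```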

keewis (Collaborator), Aug 1, 2022:

> No I think it would be redundant

that was referring to the attrs of the resulting datasets (i.e., unrelated to #6852), so I agree that it would be redundant. I'll open a new issue to discuss my question.

Member Author:

What is the issue with the attrs in the resulting datasets? The asserts on the results all pass with and without this fix:

assert actual.x.attrs == {"units": "m"}
assert actual_noattr.x.attrs == {}

benbovy (Member Author), Aug 1, 2022:

Hmm, that's weird: the example in your last comment fails on both main and this PR for me.

keewis (Collaborator), Aug 1, 2022:

not sure if that's an environment issue, but this is what I've been using to test:

import pytest
import xarray as xr


@pytest.mark.parametrize("join", ["left", "override"])
def test_align_index_var_attrs(join) -> None:
    ds = xr.Dataset(coords={"x": ("x", [1, 2, 3], {"units": "m"})})
    ds_noattr = xr.Dataset(coords={"x": ("x", [1, 2, 3])})
    actual, actual_noattr = xr.align(ds, ds_noattr, join=join)
    assert ds.x.attrs == {"units": "m"}
    assert ds_noattr.x.attrs == {}
    assert actual.x.attrs == {"units": "m"}
    assert actual_noattr.x.attrs == {}

join="left" fails on main because ds.x.attrs is cleared, but join="override" fails with:

________________________ test_align_index_var_attrs[override] ________________________

join = 'override'

    @pytest.mark.parametrize("join", ["left", "override"])
    def test_align_index_var_attrs(join) -> None:
        ds = xr.Dataset(coords={"x": ("x", [1, 2, 3], {"units": "m"})})
        ds_noattr = xr.Dataset(coords={"x": ("x", [1, 2, 3])})
        actual, actual_noattr = xr.align(ds, ds_noattr, join=join)
        assert ds.x.attrs == {"units": "m"}
        assert ds_noattr.x.attrs == {}
        assert actual.x.attrs == {"units": "m"}
>       assert actual_noattr.x.attrs == {}
E       AssertionError: assert {'units': 'm'} == {}
E         Left contains 1 more item:
E         {'units': 'm'}
E         Use -v to get more diff

.../test_issue6852.py:13: AssertionError

Collaborator:

I'm probably misunderstanding what align with join="override" is doing, though.

benbovy (Member Author), Aug 1, 2022:

Ah yes, that makes sense: we need to clarify what join="override" means. Does it override only the index, or the index variables (i.e., their metadata) too? Currently it does the latter, but I'm not sure that is what we want. So indeed it is probably worth opening an issue.

Member Author:

After thinking more about it, I think it is a bug: align shouldn't update any metadata, even with join="override". I opened #6860.
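Here is one possible reading of the behaviour argued for in this comment (and tracked in #6860). Under it, join="override" replaces only the index values with those of the first object. It never touches each object's own metadata or mutates the inputs. The sketch below models that on a hypothetical ToyCoord class and is not xarray's implementation. The size check mirrors the documented requirement that, for "override", indexes along the same dimension have the same size in all objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass
class ToyCoord:
    """Hypothetical stand-in for an index coordinate (not xarray's class)."""

    values: tuple
    attrs: dict = field(default_factory=dict)


def override_index_only(coords: list[ToyCoord]) -> list[ToyCoord]:
    """Toy join="override": results take the first coordinate's index values
    but keep (a copy of) their own attrs; no input object is mutated."""
    first = coords[0]
    if any(len(c.values) != len(first.values) for c in coords):
        raise ValueError("override requires indexes of the same size")
    # replace() builds new objects; dict() gives each result its own attrs dict.
    return [replace(c, values=first.values, attrs=dict(c.attrs)) for c in coords]


a = ToyCoord((1, 2, 3), {"units": "m"})
b = ToyCoord((10, 20, 30))
ra, rb = override_index_only([a, b])
print(rb.values, rb.attrs)  # (1, 2, 3) {}
print(a.attrs, b.attrs)  # {'units': 'm'} {}
```

Under this reading, the failing assert in keewis's join="override" example (actual_noattr.x.attrs == {}) would pass.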


+assert actual.x.attrs == {"units": "m"}
+assert actual_noattr.x.attrs == {}
+
 def test_broadcast(self) -> None:
 ds = Dataset(
 {"foo": 0, "bar": ("x", [1]), "baz": ("y", [2, 3])}, {"c": ("x", [4])}