Fix aligned index variable metadata side effect #6857

Merged
merged 6 commits into pydata:main from fix-align-index-coord-attrs
Aug 31, 2022

benbovy
Member

@benbovy benbovy commented Aug 1, 2022

Collaborator

@keewis keewis left a comment

thanks for the quick fix, @benbovy.

ds = Dataset(coords={"x": ("x", [1, 2, 3], {"units": "m"})})
ds_noattr = Dataset(coords={"x": ("x", [1, 2, 3])})

actual_noattr, actual = xr.align(ds_noattr, ds, join=join)
Collaborator

@keewis keewis Aug 1, 2022

would it make sense to also try this the other way round? As far as I understand #6852, the order mattered?

Edit: changing the order results in actual_noattr.x.attrs being equal to actual.x.attrs.

This is probably for a separate issue, but I'm not sure what to do with differing attrs on aligned coordinates: should they stay the same as before the align, or should we use merge_attrs? At the moment, the result is hard-coded to "override", which might not be desirable.

Collaborator

@keewis keewis Aug 1, 2022

Actually, I might be misunderstanding something, but doesn't this check a different issue? The point of #6852 was not that the result was wrong, but that one of the operands was modified. To check that, we'd need different asserts:

        assert ds.x.attrs == {"units": "m"}
        assert ds_noattr.x.attrs == {}

(this PR still fixes that issue, though)
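A minimal sketch of the "operand was modified" check described above, written without xarray so it runs anywhere. The helper name `call_and_check_operands`, its `state` parameter, and the two stand-in merge functions are hypothetical and exist only for illustration:

```python
import copy


def call_and_check_operands(func, *operands, state=lambda obj: obj, **kwargs):
    """Call func(*operands, **kwargs) and fail if any operand's state changed.

    `state` extracts the part of each operand to watch (hypothetical helper).
    """
    # Snapshot the watched state before the call.
    before = [copy.deepcopy(state(op)) for op in operands]
    result = func(*operands, **kwargs)
    # Compare each operand against its snapshot after the call.
    for position, (op, snapshot) in enumerate(zip(operands, before)):
        if state(op) != snapshot:
            raise AssertionError(
                f"operand {position} was modified: {snapshot!r} -> {state(op)!r}"
            )
    return result


def leaky_merge(first, second):
    """Stand-in with a side effect: updates the caller's first dict in place."""
    first.update(second)
    return dict(first)


def clean_merge(first, second):
    """Stand-in without side effects: builds a new dict."""
    return {**first, **second}
```

With xarray, the same helper could presumably be called as `call_and_check_operands(xr.align, ds, ds_noattr, state=lambda d: dict(d.x.attrs), join="left")`; that call is an untested assumption about how it would be wired up, not part of this PR.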

Member Author

> The point of #6852 was not that the result was wrong, but that one of the operands was modified.

Yes, you're right! The asserts you suggest are the right ones. I'll update it.

> would it make sense to also try this the other way round?

No, I think it would be redundant. With xr.align(ds, ds_noattr), ds.x loses its attributes, and with xr.align(ds_noattr, ds), ds_noattr.x gains a new attribute "units". The two cases differ, but the side effect is the same: an unwanted update of the attributes of the "x" coordinate of the first object, which is picked up here as the aligned "x" coordinate variable.
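The side effect described here can be illustrated with a toy model. This is a sketch under assumptions, not xarray's actual code: `Var`, `align_buggy`, and `align_fixed` are hypothetical names, and the only thing modeled is that the first operand's variable is reused as the shared aligned variable and has its attrs reassigned in place.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    """Toy stand-in for a coordinate variable: values plus an attrs dict."""

    data: list
    attrs: dict = field(default_factory=dict)


def align_buggy(*variables):
    """Reuse the first operand's variable and reassign its attrs in place."""
    shared = variables[0]  # alias of operand 0, not a copy
    results = []
    for var in variables:
        shared.attrs = dict(var.attrs)  # side effect: operand 0 is updated
        results.append(Var(list(shared.data), dict(shared.attrs)))
    return results


def align_fixed(*variables):
    """Build a fresh variable per operand; no operand is touched."""
    reference = variables[0]
    return [Var(list(reference.data), dict(var.attrs)) for var in variables]


with_units = Var([1, 2, 3], {"units": "m"})
no_attrs = Var([1, 2, 3])
align_buggy(with_units, no_attrs)
print(with_units.attrs)  # {} : the first operand lost its attrs
```

In this model the results carry the expected attrs in both versions, which is consistent with the observation that asserts on the results alone do not catch the problem; only asserts on the operands do.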

Collaborator

@keewis keewis Aug 1, 2022

> No I think it would be redundant

that was referring to the attrs of the resulting datasets (i.e. unrelated to #6852) so I agree that that would be redundant. I'll open a new issue to discuss my question.

Member Author

What is the issue with the attrs in the resulting datasets? The asserts on the results all pass with and without this fix:

assert actual.x.attrs == {"units": "m"}
assert actual_noattr.x.attrs == {}

Member Author

@benbovy benbovy Aug 1, 2022

Hmm, that's weird: the example in your last comment fails on both main and this PR for me.

Collaborator

@keewis keewis Aug 1, 2022

not sure if that's an environment issue, but this is what I've been using to test:

import pytest
import xarray as xr


@pytest.mark.parametrize("join", ["left", "override"])
def test_align_index_var_attrs(join) -> None:
    ds = xr.Dataset(coords={"x": ("x", [1, 2, 3], {"units": "m"})})
    ds_noattr = xr.Dataset(coords={"x": ("x", [1, 2, 3])})
    actual, actual_noattr = xr.align(ds, ds_noattr, join=join)
    assert ds.x.attrs == {"units": "m"}
    assert ds_noattr.x.attrs == {}
    assert actual.x.attrs == {"units": "m"}
    assert actual_noattr.x.attrs == {}

join="left" fails on main because ds.x.attrs is cleared, but join="override" fails with:

________________________ test_align_index_var_attrs[override] ________________________

join = 'override'

    @pytest.mark.parametrize("join", ["left", "override"])
    def test_align_index_var_attrs(join) -> None:
        ds = xr.Dataset(coords={"x": ("x", [1, 2, 3], {"units": "m"})})
        ds_noattr = xr.Dataset(coords={"x": ("x", [1, 2, 3])})
        actual, actual_noattr = xr.align(ds, ds_noattr, join=join)
        assert ds.x.attrs == {"units": "m"}
        assert ds_noattr.x.attrs == {}
        assert actual.x.attrs == {"units": "m"}
>       assert actual_noattr.x.attrs == {}
E       AssertionError: assert {'units': 'm'} == {}
E         Left contains 1 more item:
E         {'units': 'm'}
E         Use -v to get more diff

.../test_issue6852.py:13: AssertionError

Collaborator

I'm probably misunderstanding what align with join="override" is doing, though

Member Author

@benbovy benbovy Aug 1, 2022

Ah yes that makes sense: we need to clarify what join="override" means. Does it mean only override the index or does it mean override the index variables (i.e., their metadata) too? Currently it's the latter but I'm not sure this is what we want. Then indeed it is probably worth opening an issue.
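The two readings of join="override" can be contrasted with a small toy model. This is only an illustration of the question being asked, not a description of how xarray implements it; the dict layout and both function names are hypothetical.

```python
def override_index_and_metadata(first, other):
    """Reading 1: `other` takes both the index values and the attrs of `first`."""
    return {"values": list(first["values"]), "attrs": dict(first["attrs"])}


def override_index_only(first, other):
    """Reading 2: `other` takes the index values of `first` but keeps its own attrs."""
    return {"values": list(first["values"]), "attrs": dict(other["attrs"])}


x_with_units = {"values": [1, 2, 3], "attrs": {"units": "m"}}
x_no_attrs = {"values": [1, 2, 3], "attrs": {}}

# Under reading 1 the second operand's result gains "units";
# under reading 2 its attrs stay empty.
print(override_index_and_metadata(x_with_units, x_no_attrs)["attrs"])
print(override_index_only(x_with_units, x_no_attrs)["attrs"])
```

Under the first reading, the failing assert on actual_noattr.x.attrs in the test above is expected behavior; under the second, it is a bug.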

Member Author

After thinking more about it, I think it is a bug. align shouldn't update any metadata, even with join="override". I opened #6860.

@benbovy benbovy merged commit 4880012 into pydata:main Aug 31, 2022
@benbovy benbovy deleted the fix-align-index-coord-attrs branch December 8, 2022 09:36
Successfully merging this pull request may close these issues.

Testing DataArray equality using built-in '==' operator leads to mutilated DataArray.attrs dictionary
3 participants