Fix a bug when setting complete chunks #2851

Merged
16 commits merged on Feb 23, 2025

Conversation

dcherian
Contributor

@dcherian dcherian commented Feb 19, 2025

Closes #2849

cc @ilan-gold

[Description of PR]

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Feb 19, 2025
@@ -320,6 +309,20 @@ def _merge_chunk_array(
for idx in range(chunk_spec.ndim)
)
chunk_value = chunk_value[item]
if is_complete_chunk and chunk_value.shape == chunk_spec.shape:
Contributor Author

this will fail for scalars, need a test for that too.

Contributor Author

Well, it works for scalars because chunk_value is now an NDBuffer, which does have a shape. I added a test.
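
A minimal sketch of why the shape check in the diff above matters, using plain Python lists in place of NDBuffer. The helper name `merge_chunk` and its parameters are hypothetical and are not zarr-python's API. The assumption illustrated is that the incoming value may replace the chunk wholesale only when it covers the chunk and already has the full chunk shape; otherwise it is copied into a full-size buffer.

```python
def merge_chunk(existing, value, chunk_len, chunk_selection,
                is_complete_chunk, fill_value=0):
    """Return the full-size chunk to store after writing `value` into it.

    Hypothetical 1-D illustration; lists stand in for chunk buffers.
    """
    # Fast path: the value covers the chunk AND has the full chunk shape.
    if is_complete_chunk and len(value) == chunk_len:
        return list(value)
    # Otherwise start from the existing chunk, or from a fill-value chunk.
    chunk = list(existing) if existing is not None else [fill_value] * chunk_len
    chunk[chunk_selection] = value
    return chunk


# Short last chunk of shape=(10,), chunks=(3,): one valid element only.
edge = merge_chunk(None, [9], chunk_len=3,
                   chunk_selection=slice(0, 1), is_complete_chunk=True)
# Interior chunk written in full: the fast path applies.
full = merge_chunk(None, [1, 2, 3], chunk_len=3,
                   chunk_selection=slice(0, 3), is_complete_chunk=True)
```

Without the length comparison, the first call would return a one-element chunk where a three-element chunk is expected.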

@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Feb 19, 2025
@d-v-b
Contributor

d-v-b commented Feb 20, 2025

could you explain what the bug was, and how this PR fixes it?

@dcherian
Contributor Author

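import numpy as np
import zarr

g = zarr.open_group("foo.zarr", zarr_format=3, mode="w")
# 10 elements in chunks of 3, so the last chunk holds a single element
a = g.create_array("bar", shape=(10,), chunks=(3,), dtype=int)
data = np.array([7, 8, 9])
a[slice(7, 10)] = data  # partial write to one chunk, complete write to the short last chunk
np.testing.assert_array_equal(a[slice(7, 10)], data)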

I think it's broken when overwriting a complete last chunk that is smaller than the chunk size, AND the setitem value is larger than that last chunk. This also suggests that somewhere we can do

chunk[size=2] = array[size=3]

and it will assign array[:2]
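
To make the reproducer concrete, here is a rough sketch of how a 1-D slice maps onto chunks. The function `chunk_projections` is hypothetical and is not zarr-python's indexer. It assumes that "complete" is judged against the valid extent of each chunk, so the short last chunk counts as complete when all of its valid elements are written.

```python
def chunk_projections(array_len, chunk_len, start, stop):
    """List (chunk_index, chunk_selection, out_selection, is_complete)
    for every chunk touched by the slice [start, stop)."""
    out = []
    for ci in range(start // chunk_len, (stop - 1) // chunk_len + 1):
        c_start = ci * chunk_len
        # Valid extent of this chunk; the last chunk may be short.
        c_stop = min(c_start + chunk_len, array_len)
        lo, hi = max(start, c_start), min(stop, c_stop)
        chunk_sel = slice(lo - c_start, hi - c_start)  # position inside the chunk
        out_sel = slice(lo - start, hi - start)        # position inside the value
        is_complete = lo == c_start and hi == c_stop
        out.append((ci, chunk_sel, out_sel, is_complete))
    return out


# shape=(10,), chunks=(3,), a[7:10] = data
projections = chunk_projections(array_len=10, chunk_len=3, start=7, stop=10)
for p in projections:
    print(p)
```

For this input the last chunk is flagged complete even though only one element of the value belongs to it, which is the combination described above.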

@ilan-gold
Contributor

Not saying it's related but reminds me of #2469

Thanks for investigating.

1. Emphasize arrays of size > 1,
2. Emphasize indexing the last chunk for both setitem & getitem
@dcherian
Contributor Author

dcherian commented Feb 21, 2025

OK I have updated the priorities of our property tests and it caught it immediately. Now we prioritize arrays of dim size >=3, and prioritize indexing the array near its end (so that we test handling the last chunk).

We also don't test complicated attrs, array names, and long array paths in the indexing tests.
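
As an illustration of the prioritization described above, here is a small sketch using only the standard library `random` module. It is not the project's actual test strategy; `sample_slice` and the `p_near_end` parameter are made up for this example. It draws axis lengths of at least 3 and biases slices so that most of them reach the end of the axis and therefore touch the last chunk.

```python
import random


def sample_slice(rng, size, p_near_end=0.7):
    """Return a non-empty slice into an axis of length `size`,
    biased toward ending at the last element."""
    if rng.random() < p_near_end:
        stop = size  # always reaches the end, hence the last chunk
    else:
        stop = rng.randint(1, size)
    start = rng.randint(0, stop - 1)
    return slice(start, stop)


rng = random.Random(0)
sizes = [rng.randint(3, 20) for _ in range(200)]  # favour axes of length >= 3
slices = [sample_slice(rng, n) for n in sizes]
```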

I learnt a lesson. :)

@d-v-b
Contributor

d-v-b commented Feb 21, 2025

nice!

@dcherian
Contributor Author

From manual inspection, the unit test in 0e0b34f (#2851) fails on Ubuntu and Windows, with NumPy 2.1 only.

It succeeds on my macbook.

Is someone able to debug this locally?

@LDeakin
Member

LDeakin commented Feb 21, 2025

@dcherian What do you expect to happen in that repeated index test? Isn't it ambiguous which values should be taken from the input if the output indexes overlap?

It would be great for https://github.com/ilan-gold/zarrs-python if zarr-python could require that output indexes are disjoint.

@dcherian
Contributor Author

I am surprised it works, but zarr-python "sends the indexes to the chunk" so you get the last of the repeated values, numpy style.

@dcherian
Contributor Author

You know, it looks like the failing test is doing the opposite, and we get the first value:

array.oindex[([-1, -1, 0, 0],)] = [0, 1, 2, 3]
  ACTUAL: array([2., 0., 0., 0.])
  DESIRED: array([3, 0, 0, 1])
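
The two outcomes above can be reproduced with a small pure-Python model of repeated-index assignment. The `assign` helper is hypothetical, and the array length of 4 is inferred from the printed output. With "last write wins" the result matches DESIRED, and with "first write wins" it matches ACTUAL.

```python
def assign(size, indices, values, keep="last"):
    """Assign values[i] to position indices[i] of a zero-filled list.

    keep="last":  later writes to the same position overwrite earlier ones.
    keep="first": only the first write to each position is kept.
    """
    out = [0] * size
    seen = set()
    for idx, val in zip(indices, values):
        pos = idx % size  # normalise negative indices
        if keep == "first" and pos in seen:
            continue
        out[pos] = val
        seen.add(pos)
    return out


last_wins = assign(4, [-1, -1, 0, 0], [0, 1, 2, 3], keep="last")
first_wins = assign(4, [-1, -1, 0, 0], [0, 1, 2, 3], keep="first")
print(last_wins)   # [3, 0, 0, 1]
print(first_wins)  # [2, 0, 0, 0]
```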

@dcherian
Contributor Author

dcherian commented Feb 21, 2025

I did not succeed in reproducing on a Linux machine either, so I opened #2854 to revert. I've migrated the testing improvements there.

Reproduced and reported in #2855. I am xfailing here.

@dcherian dcherian marked this pull request as draft February 21, 2025 21:39
@dcherian dcherian marked this pull request as ready for review February 21, 2025 22:24
@dcherian dcherian mentioned this pull request Feb 22, 2025
@d-v-b d-v-b enabled auto-merge (squash) February 22, 2025 22:03
@d-v-b d-v-b merged commit 8b59a38 into zarr-developers:main Feb 23, 2025
29 of 30 checks passed
@dcherian dcherian deleted the fix-oindex-set branch March 7, 2025 17:37
Development

Successfully merging this pull request may close these issues.

Roundtrip fails after resize
4 participants