Skip to content

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Sep 1, 2025

Completes the work of #3299 by replacing a second invocation of _iter_chunk_coords with _iter_shard_coords.

In a separate PR, we need to refactor this code block:

if write_data:
if isinstance(data, Array):
async def _copy_array_region(
chunk_coords: tuple[int, ...] | slice, _data: Array
) -> None:
arr = await _data._async_array.getitem(chunk_coords)
await result.setitem(chunk_coords, arr)
# Stream data from the source array to the new array
await concurrent_map(
[(region, data) for region in result._iter_shard_regions()],
_copy_array_region,
zarr.core.config.config.get("async.concurrency"),
)
else:
async def _copy_arraylike_region(chunk_coords: slice, _data: NDArrayLike) -> None:
await result.setitem(chunk_coords, _data[chunk_coords])
# Stream data from the source array to the new array
await concurrent_map(
[(region, data) for region in result._iter_chunk_regions()],
_copy_arraylike_region,
zarr.core.config.config.get("async.concurrency"),
)
return result
. It's massive code smell to have special "write one array to another" logic defined in the tail end of an array creation function.

I add a test that checks how many get requests we make when calling create_array(data=..). In main, it's 1 get per chunk (bad). In this PR, it's 1 get per shard (better). But we can also get to 0 gets per shard by introducing some special logic for full shard writes. expect this in a later PR.

edit: closes #3169 and #3421

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Sep 1, 2025
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Sep 1, 2025
@d-v-b d-v-b requested a review from a team September 1, 2025 18:51
Copy link

codecov bot commented Sep 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.07%. Comparing base (ee9c182) to head (43320dc).
⚠️ Report is 1 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (ee9c182) and HEAD (43320dc). Click for more details.

HEAD has 4 uploads less than BASE
Flag BASE (ee9c182) HEAD (43320dc)
14 10
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3422       +/-   ##
===========================================
- Coverage   94.92%   61.07%   -33.86%     
===========================================
  Files          79       79               
  Lines        9500     9500               
===========================================
- Hits         9018     5802     -3216     
- Misses        482     3698     +3216     
Files with missing lines Coverage Δ
src/zarr/core/array.py 68.64% <ø> (-28.81%) ⬇️

... and 66 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we don't actually need one get per shard, but this is the current behavior

Could you open an issue to track this?

@d-v-b
Copy link
Contributor Author

d-v-b commented Sep 12, 2025

Note: we don't actually need one get per shard, but this is the current behavior

Could you open an issue to track this?

#3421 tracks this

@d-v-b d-v-b enabled auto-merge (squash) September 12, 2025 14:23
@d-v-b d-v-b merged commit 27d689c into zarr-developers:main Sep 12, 2025
30 of 31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Stored and computed checksum do not match" in case when shards large than chunks
2 participants