fix/shard iteration redux #3422

d-v-b · 2025-09-01T18:44:51Z

Completes the work of #3299 by replacing a second invocation of _iter_chunk_coords with _iter_shard_coords.

In a separate PR, we need to refactor this code block:

Lines 4402 to 4428 in b8dbf56

    
    if write_data:
        if isinstance(data, Array):

            async def _copy_array_region(
                chunk_coords: tuple[int, ...] | slice, _data: Array
            ) -> None:
                arr = await _data._async_array.getitem(chunk_coords)
                await result.setitem(chunk_coords, arr)

            # Stream data from the source array to the new array
            await concurrent_map(
                [(region, data) for region in result._iter_shard_regions()],
                _copy_array_region,
                zarr.core.config.config.get("async.concurrency"),
            )
        else:

            async def _copy_arraylike_region(chunk_coords: slice, _data: NDArrayLike) -> None:
                await result.setitem(chunk_coords, _data[chunk_coords])

            # Stream data from the source array to the new array
            await concurrent_map(
                [(region, data) for region in result._iter_chunk_regions()],
                _copy_arraylike_region,
                zarr.core.config.config.get("async.concurrency"),
            )
    return result

It's a massive code smell to have special "write one array to another" logic defined in the tail end of an array creation function.
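To make the refactoring idea concrete, here is a minimal sketch of what a standalone copy helper could look like. It uses plain `asyncio` and Python lists rather than zarr arrays, and the names `bounded_map` and `copy_regions` are hypothetical: they are stand-ins for illustration, not zarr's `concurrent_map` or any existing zarr function.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable


async def bounded_map(
    func: Callable[[slice], Awaitable[None]],
    items: Iterable[slice],
    limit: int,
) -> None:
    """Run func over items with at most `limit` calls in flight (hypothetical helper)."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item: slice) -> None:
        async with semaphore:
            await func(item)

    await asyncio.gather(*(run(item) for item in items))


async def copy_regions(
    source: list[int], dest: list[int], regions: list[slice], limit: int = 4
) -> None:
    """Hypothetical standalone helper: copy source into dest one region at a time."""

    async def copy_one(region: slice) -> None:
        await asyncio.sleep(0)  # stand-in for store I/O
        dest[region] = source[region]

    await bounded_map(copy_one, regions, limit)


source = list(range(10))
dest = [0] * 10
# Regions of length 4; the last one is shorter because 10 is not a multiple of 4.
regions = [slice(start, min(start + 4, 10)) for start in range(0, 10, 4)]
asyncio.run(copy_regions(source, dest, regions))
```

With the copy logic in its own function, the array creation function would only need to choose the regions to iterate over and call the helper.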

I added a test that checks how many get requests we make when calling create_array(data=...). On main, it's 1 get per chunk (bad). In this PR, it's 1 get per shard (better). But we can also get to 0 gets per shard by introducing some special logic for full shard writes. Expect this in a later PR.
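The get counts above can be illustrated with a toy model. This is a sketch under stated assumptions, not zarr's sharding implementation: `CountingShardStore`, `count_gets`, and the `skip_read_on_full_write` flag are hypothetical names, the data is one-dimensional, and every write is assumed to fit inside a single shard. The store reads a shard before modifying it unless the flag is set and the write covers the whole shard.

```python
class CountingShardStore:
    """Toy store that keeps each shard as one object and counts reads."""

    def __init__(self, shard_size: int, skip_read_on_full_write: bool = False) -> None:
        self.shard_size = shard_size
        self.skip_read_on_full_write = skip_read_on_full_write
        self.objects: dict[int, list[int]] = {}
        self.gets = 0

    def write(self, start: int, values: list[int]) -> None:
        """Write values at offset start; the region must fit inside one shard."""
        index, offset = divmod(start, self.shard_size)
        covers_shard = offset == 0 and len(values) == self.shard_size
        if covers_shard and self.skip_read_on_full_write:
            # Full-shard write: nothing from the old shard survives, so skip the read.
            shard = list(values)
        else:
            # Read-modify-write: fetch the existing shard (one get), then patch it.
            self.gets += 1
            shard = list(self.objects.get(index, [0] * self.shard_size))
            shard[offset : offset + len(values)] = values
        self.objects[index] = shard

    def read_all(self) -> list[int]:
        """Concatenate all shards in index order (not counted as gets)."""
        return [v for i in sorted(self.objects) for v in self.objects[i]]


def count_gets(store: CountingShardStore, data: list[int], step: int) -> int:
    """Write data in regions of length step and return the number of gets."""
    for start in range(0, len(data), step):
        store.write(start, data[start : start + step])
    return store.gets


data = list(range(16))  # 2 shards of 8 elements, each holding 4 chunks of 2
per_chunk = count_gets(CountingShardStore(shard_size=8), data, step=2)
per_shard = count_gets(CountingShardStore(shard_size=8), data, step=8)
full_store = CountingShardStore(shard_size=8, skip_read_on_full_write=True)
no_reads = count_gets(full_store, data, step=8)
print(per_chunk, per_shard, no_reads)
```

In this model, writing chunk by chunk costs 8 gets, writing shard by shard costs 2, and the full-shard shortcut costs 0, which mirrors the progression described above.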

edit: closes #3169 and #3421

codecov · 2025-09-01T20:37:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.07%. Comparing base (ee9c182) to head (43320dc).
⚠️ Report is 1 commit behind head on main.

❗ There is a different number of reports uploaded between BASE (ee9c182) and HEAD (43320dc).

HEAD has 4 uploads less than BASE

| Flag | BASE (ee9c182) | HEAD (43320dc) |
|------|----------------|----------------|
|      | 14             | 10             |

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #3422       +/-   ##
===========================================
- Coverage   94.92%   61.07%   -33.86%     
===========================================
  Files          79       79               
  Lines        9500     9500               
===========================================
- Hits         9018     5802     -3216     
- Misses        482     3698     +3216

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| src/zarr/core/array.py | `68.64% <ø> (-28.81%)` ⬇️ |

... and 66 files with indirect coverage changes


maxrjones

Note: we don't actually need one get per shard, but this is the current behavior

Could you open an issue to track this?

d-v-b · 2025-09-12T14:23:22Z

#3421 tracks this

d-v-b added 2 commits September 1, 2025 18:21

iterate over shards instead of chunks in second branch

6abd5b5

add test

85d14d1

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Sep 1, 2025

d-v-b added 2 commits September 1, 2025 20:47

parametrize over array type

39fb5e0

changelog

9924f90

github-actions bot removed the "needs release notes" label (automatically applied to PRs which haven't added release notes) Sep 1, 2025

d-v-b requested a review from a team September 1, 2025 18:51

d-v-b added 2 commits September 1, 2025 21:06

appease mypy

edb1492

Merge branch 'main' into fix/shard-iteration-redux

c859429

d-v-b added 3 commits September 2, 2025 16:44

Merge branch 'main' into fix/shard-iteration-redux

9d5ffb3

Merge branch 'main' into fix/shard-iteration-redux

f4b5888

Merge branch 'main' into fix/shard-iteration-redux

f2c4fac

maxrjones approved these changes Sep 5, 2025

Merge branch 'main' into fix/shard-iteration-redux

43320dc

d-v-b enabled auto-merge (squash) September 12, 2025 14:23

d-v-b merged commit 27d689c into zarr-developers:main Sep 12, 2025
30 of 31 checks passed

d-v-b mentioned this pull request Sep 22, 2025

create_array() with shards and data parameter only partially writes data #3479

Closed
