
Conversation


@fm3 fm3 commented Jun 24, 2025

  • Introduces attachment lookup for segment index files
  • Enables reading Zarr segment index files
  • Fixes a bug where segment index file positions were wrongly interpreted as being in the target mag when they are really in mag1
  • Adds cache clearing (including closing HDF5 file handles) for segment index files
  • Note to reviewer: git didn’t detect that a lot of code was moved (from the old segmentIndexFileService to the new one, and partially to the hdf5 one).

Steps to test:

  • For a dataset with a segment index file (try both HDF5 and Zarr; talk to me for sample data), right-click a segment; the context menu should show a plausible segment volume + bounding box (not zero)
  • A layer reload in the sidebar should log that cached segment index file handles were cleared.

TODOs:

  • attachment lookup
  • make sure hdf5 still works
  • implement for zarr
  • clear caches

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

fm3 added 30 commits May 19, 2025 10:20
Base automatically changed from meshfile-zarr to master June 30, 2025 07:21

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/Hdf5SegmentIndexFileService.scala (1)

40-63: Consider improving the error message for unsupported data types.

The implementation is correct, but the error message could be more informative about which data types are supported.

Consider this improvement:

-          .fromBool(bucketEntriesDtype == "uint16") ?~> "value for dtype_bucket_entries in segment index file is not supported, only uint16 is supported"
+          .fromBool(bucketEntriesDtype == "uint16") ?~> s"Unsupported dtype_bucket_entries '$bucketEntriesDtype' in segment index file. Only 'uint16' is supported"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96b7158 and 78fefbc.

📒 Files selected for processing (17)
  • unreleased_changes/8711.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (11 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ArrayArtifactHashing.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala (0 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/Hdf5MeshFileService.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/Hdf5SegmentIndexFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (4 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (0 hunks)
💤 Files with no reviewable changes (2)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala
🧰 Additional context used
🧠 Learnings (10)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (6)
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/proofread_saga.ts:1039-1039
Timestamp: 2024-11-22T17:18:04.217Z
Learning: In `frontend/javascripts/oxalis/model/sagas/proofread_saga.ts`, when calling `getMagInfo`, the use of `volumeTracingLayer.resolutions` is intentional and should not be changed to `volumeTracingLayer.mags`.
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
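A minimal sketch of the pattern this learning describes (Fox, Fox.fromBool and ?~> are from the codebase's util library; dtype and readEntries() are hypothetical stand-ins):

// Sketch only, not actual codebase code.
for {
  // Binding with `<-` means a failed check short-circuits the whole comprehension
  // and propagates the error message attached via ?~>.
  _ <- Fox.fromBool(dtype == "uint16") ?~> s"Unsupported dtype '$dtype'"
  entries <- readEntries()
} yield entries

// By contrast, writing `_ = Fox.fromBool(...)` would only create a failed Fox value
// and discard it, so the validation failure would be silently ignored.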
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/Hdf5MeshFileService.scala (2)
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
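A rough sketch of the extension-based scanning this describes (a hypothetical helper; the actual services use PathUtils.listFiles, see the code graph analysis below):

import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Sketch only: two scanners can share the "agglomerates" directory without
// interfering because each one filters by its own file extension.
def scanByExtension(layerDir: Path, extension: String): Seq[Path] = {
  val dir = layerDir.resolve("agglomerates")
  if (!Files.isDirectory(dir)) Seq.empty
  else {
    val stream = Files.list(dir)
    try stream.iterator().asScala.toSeq.filter(_.getFileName.toString.endsWith(s".$extension"))
    finally stream.close()
  }
}

// e.g. scanByExtension(layerDir, "hdf5") for agglomerate files,
//      scanByExtension(layerDir, "json") for cumsum files.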
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (4)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerMeshHelper.scala:143-152
Timestamp: 2025-04-30T08:54:11.252Z
Learning: In Neuroglancer's sharded format, mesh fragment data is located immediately before the manifest file in the same shard data file. The correct way to calculate the mesh data starting offset is to subtract the total mesh size from the manifest starting offset (neuroglancerOffsetStart - totalMeshSize), not by adding the manifest size.
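Expressed as code, the described calculation is simply (variable names are illustrative):

// The fragment data ends where the manifest begins, so its start offset is
// the manifest offset minus the total mesh size (not plus the manifest size).
def meshDataStartOffset(neuroglancerOffsetStart: Long, totalMeshSize: Long): Long =
  neuroglancerOffsetStart - totalMeshSize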
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerMeshHelper.scala:143-152
Timestamp: 2025-04-30T08:54:11.252Z
Learning: In Neuroglancer's sharded format, mesh fragment data is located immediately before the manifest file in the same shard data file. The correct way to calculate the mesh data starting offset is to subtract the total mesh size from the manifest starting offset (neuroglancerOffsetStart - totalMeshSize), not by adding the manifest size.
🧬 Code Graph Analysis (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (2)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (1)
  • listFiles (101-102)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/VaultPath.scala (1)
  • toUri (79-82)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1)
  • localPath (43-49)
🪛 LanguageTool
unreleased_changes/8711.md

[duplication] ~4-~4: Possible typo: you repeated a word.
Context: ...t, and from remote object storage. ### Fixed - Fixed a bug where segment index files for seg...

(ENGLISH_WORD_REPEAT_RULE)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (29)
unreleased_changes/8711.md (1)

1-6: LGTM! Changelog entry is clear and comprehensive.

The changelog accurately describes the new Zarr3 format support and the mag1 bug fix. The static analysis hint about word repetition is a false positive - this follows standard changelog format where section headers naturally repeat words used in descriptions.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (1)

38-41: LGTM! Variable renaming improves clarity.

The renaming from bucketPositionsInMag to bucketPositionsInRequestedMag makes the code more explicit about which magnification the bucket positions represent. This change aligns well with the PR's goal of fixing mag1 interpretation issues.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2)

14-18: LGTM! Proper imports for new segment index services.

The imports correctly bring in the three new segment index file service classes that support the new architecture.


46-48: LGTM! Service bindings follow established patterns.

The three segment index file services are properly bound as eager singletons, consistent with other datastore services like mesh file services. This ensures they're available when needed for segment index operations.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/Hdf5MeshFileService.scala (2)

19-19: LGTM! Cache variable renamed for better clarity.

The renaming from meshFileCache to fileHandleCache makes the variable name more specific about what it caches and aligns with naming conventions in other services.


22-156: LGTM! All cache references updated consistently.

All references to the renamed cache variable have been updated consistently throughout the file, ensuring no broken references.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ArrayArtifactHashing.scala (1)

7-16: LGTM! Trait renaming improves format-agnostic design.

The renaming from Hdf5HashedArrayUtils to ArrayArtifactHashing makes the trait more generic and suitable for use with both HDF5 and Zarr services. The simplified interface focusing only on hash function provision is clean and extensible.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)

36-36: LGTM: Consistent directory naming convention.

The renaming from agglomerateDir to localAgglomeratesDir improves clarity and aligns with the naming conventions used in other services like MeshFileService. The change is applied consistently throughout the file.

Also applies to: 49-49, 101-101

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (1)

11-11: LGTM: Trait refactoring improves naming clarity.

The replacement of Hdf5HashedArrayUtils with ArrayArtifactHashing is a sensible rename since the hashing functionality isn't specific to HDF5 files. The getHashFunction method usage remains unchanged, indicating the refactoring maintains backward compatibility.

Also applies to: 28-28

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (2)

41-41: LGTM: Added credential support for remote attachments.

The addition of credentialId parameter with a default value maintains backward compatibility while enabling support for remote attachments that require authentication.


42-50: Good centralization of path validation logic.

The localPath method effectively centralizes URI-to-Path conversion and validation that was previously scattered across services. The warning comment correctly indicates that this method throws exceptions and should be used within tryo blocks for safe error handling.

The validation logic ensures only local file system paths are supported for HDF5 files, which is a reasonable constraint given the current implementation.
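A minimal sketch of the pattern, assuming the method throws on non-local URIs (localPathOf is a hypothetical stand-in for LayerAttachment.localPath, and scala.util.Try stands in for the codebase's tryo):

import java.net.URI
import java.nio.file.{Path, Paths}
import scala.util.Try

// Sketch only: centralize URI-to-Path conversion and scheme validation.
def localPathOf(uri: URI): Path = {
  val scheme = Option(uri.getScheme).getOrElse("file")
  require(scheme == "file", s"Only local file paths are supported, got scheme '$scheme'")
  Paths.get(uri.getPath)
}

// Caller side: wrap defensively, analogous to tryo(attachment.localPath).
val maybePath: Try[Path] = Try(localPathOf(new URI("file:///data/segment_index.hdf5")))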

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1)

46-47: LGTM: Simplified file opening using centralized validation.

The openHdf5 method is now cleaner by leveraging the localPath method from LayerAttachment, which centralizes the URI scheme validation. This eliminates code duplication while maintaining the same validation behavior.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (2)

14-14: LGTM: Consistent trait refactoring.

The replacement of Hdf5HashedArrayUtils with ArrayArtifactHashing aligns with the same refactoring applied across other services, improving naming clarity since the hashing functionality isn't HDF5-specific.

Also applies to: 73-73


76-77: LGTM: Improved maintainability with named constants.

The introduction of localMeshesDir and hdf5MeshFileExtension constants eliminates magic strings and improves maintainability. This follows the same pattern established in AgglomerateService and ensures consistency across the codebase.

Also applies to: 103-103, 123-123

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (3)

13-17: LGTM! Good error handling improvements.

The addition of Box utilities and the trait rename to ArrayArtifactHashing are consistent with the codebase-wide refactoring for unified artifact hashing.

Also applies to: 26-26


103-107: Good error handling pattern!

The implementation safely extracts the local path and maintains consistent Box-based error handling.


62-65: Fix return type inconsistency.

The method signature indicates it returns Box[CachedHdf5File], but the implementation returns just CachedHdf5File from the yield expression. The existing getCachedHdf5File(Path) method returns CachedHdf5File directly, not wrapped in a Box.

Apply this fix:

-  def getCachedHdf5File(attachment: LayerAttachment)(loadFn: Path => CachedHdf5File): Box[CachedHdf5File] =
-    for {
-      localPath <- tryo(attachment.localPath)
-    } yield getCachedHdf5File(localPath)(loadFn)
+  def getCachedHdf5File(attachment: LayerAttachment)(loadFn: Path => CachedHdf5File): Box[CachedHdf5File] =
+    tryo(attachment.localPath).map(localPath => getCachedHdf5File(localPath)(loadFn))

Likely an incorrect or invalid review comment.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/Hdf5SegmentIndexFileService.scala (2)

20-38: Well-structured segment index reading implementation!

The method correctly handles bucket lookup and empty bucket cases. The use of runIf for conditional execution is appropriate.


65-69: LGTM! Efficient cache clearing implementation.

The method correctly constructs paths and uses prefix matching to clear relevant cache entries.
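A rough sketch of prefix-based clearing with a simplified in-memory cache (names are illustrative, not the actual Hdf5FileCache API):

import scala.collection.concurrent.TrieMap

// Sketch only: evict all handles whose cache key starts with a given prefix,
// closing each HDF5 file handle before removing it, and report the count.
class HandleCache[H <: AutoCloseable] {
  private val entries = TrieMap.empty[String, H]

  def put(key: String, handle: H): Unit = entries.update(key, handle)

  def clearWithPrefix(prefix: String): Int = {
    val matching = entries.keys.filter(_.startsWith(prefix)).toSeq
    matching.foreach(key => entries.remove(key).foreach(_.close()))
    matching.size // e.g. logged when a layer is reloaded
  }
}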

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala (3)

19-54: Well-structured attributes handling!

The JSON parsing correctly handles the nested structure and the lazy hash function initialization is appropriate.


84-95: Consistent implementation with HDF5 service!

The method correctly handles Zarr array reading and maintains consistency with the HDF5 implementation pattern.


147-156: Comprehensive cache clearing!

The method correctly clears both caches and properly filters by data source and layer.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (4)

56-79: Well-designed attachment lookup strategy!

The method correctly prioritizes registered attachments and handles URI normalization appropriately.


101-109: Clean format delegation pattern!

The pattern matching on data format provides clear delegation to the appropriate service implementation.


111-116: Correct coordinate transformation logic!

The method properly handles the conversion from mag1 bucket positions to target magnification bucket indices.


222-233: Comprehensive cache clearing across all services!

The method correctly clears the local cache and delegates to both format-specific services.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)

48-49: Good refactoring - direct service injection!

Direct injection of segmentIndexFileService and agglomerateService improves code clarity and follows dependency injection best practices.


447-462: Comprehensive cache clearing implementation!

The cache clearing now properly includes segment index file handles, and the logging provides good visibility into what was cleared.


593-736: Excellent migration to key-based segment index access!

All segment index related endpoints have been properly updated to use the new service architecture. The consistent use of segmentIndexFileKey throughout provides better abstraction and supports multiple file formats.

@fm3 fm3 requested a review from MichaelBuessemeyer June 30, 2025 07:57

@MichaelBuessemeyer MichaelBuessemeyer left a comment


Here is my code review, will proceed with testing.

Some comments didn't show up in my IDE after I created them during the review, but I think all of them are present. At least one of these missing comments appears in the GitHub review view.

.map(_ / Vec3Int.full(DataLayer.bucketLength)) // map positions to cube indices
.distinct

def getSegmentVolume(dataSourceId: DataSourceId,
Contributor


The dataSourceId shouldn't be needed here as it is already a member of segmentIndexFileKey. So I'd remove it from the param list and use segmentIndexFileKey.dataSourceId instead.

Contributor


Same for the other functions below

Member Author


I’m not sure about this one. It’s true that the segmentIndexFileKey contains the correct dataSourceId, but since the segmentIndexFileKey is designed to be used to look up segment index files, it feels a little odd to repurpose it to also carry the dataSourceId that is later used to load voxel data. My feeling is that the additional parameter makes the intent of this function clearer, even if it is somewhat redundant. What do you think?

@MichaelBuessemeyer
Contributor

Oh and thanks for being so fast in making all these attachments s3 / zarr compatible 👯


@MichaelBuessemeyer MichaelBuessemeyer left a comment


Awesome, works great 🎉

I think this can be directly merged after the first iteration 🔥


fm3 commented Jul 2, 2025

Thank you for the detailed review! Good catch about the tryos. I also did some refactoring of the common string constants; please have a quick look at the last commit :) Fortunately, I found out that the leading / is not required when passing array names to the hdf5 readers, so I can reuse the strings between the hdf5 and zarr implementations without modifications.
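To illustrate the idea, a sketch of such shared constants (the names here are hypothetical, not necessarily those from the commit):

// Sketch only: array names usable by both the HDF5 and the Zarr reader,
// since the HDF5 readers accept dataset names without a leading "/".
object SegmentIndexArrayNames {
  val topLefts = "top_lefts"
  val bucketRanges = "bucket_ranges"
}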


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (2)

69-69: Add missing error message to messages.conf.

The error message key "segmentIndexFile.notFound" referenced here appears to be missing from the messages.conf file, as noted in previous review comments.


115-115: Consider removing redundant dataSourceId parameter.

As discussed in previous comments, the dataSourceId parameter may be redundant since it's already available in segmentIndexFileKey.dataSourceId. While the author provided justification for keeping it for clarity, consider whether the redundancy is worth the potential confusion.

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1)

33-33: Consider adding documentation for the case class.

The SegmentIndexFileKey case class serves as a key data structure but lacks documentation. Adding a brief comment explaining its purpose would improve code maintainability.

+/** Key for identifying and caching segment index files by data source, layer, and attachment. */
 case class SegmentIndexFileKey(dataSourceId: DataSourceId, layerName: String, attachment: LayerAttachment)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 661b258 and 313838c.

📒 Files selected for processing (5)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/BinaryDataController.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (11 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mapping/AgglomerateService.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/BinaryDataController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mapping/AgglomerateService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (4)

108-113: Mathematical operations for coordinate transformation look correct.

The coordinate transformation logic properly:

  1. Converts bucket indices to positions by scaling with bucket length
  2. Scales positions to target magnification
  3. Converts back to bucket indices at target magnification
  4. Removes duplicates with .distinct

This implementation correctly handles the coordinate space transformations needed for multi-resolution data.
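A self-contained sketch of that transformation (Vec3 and bucketLength = 32 stand in for the codebase's Vec3Int and DataLayer.bucketLength):

// Sketch only: mag1 bucket indices -> distinct bucket indices at the requested mag.
final case class Vec3(x: Int, y: Int, z: Int) {
  def *(s: Int): Vec3 = Vec3(x * s, y * s, z * s)
  def /(o: Vec3): Vec3 = Vec3(x / o.x, y / o.y, z / o.z)
}

def toTargetMagBucketIndices(mag1BucketIndices: Seq[Vec3], mag: Vec3): Seq[Vec3] = {
  val bucketLength = 32 // stands in for DataLayer.bucketLength
  mag1BucketIndices
    .map(_ * bucketLength)                                   // bucket index -> mag1 position
    .map(_ / mag)                                            // scale position to the target mag
    .map(_ / Vec3(bucketLength, bucketLength, bucketLength)) // position -> bucket index
    .distinct                                                // several mag1 buckets can map to one
}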


153-173: Bucket request construction logic is well-implemented.

The logic correctly:

  • Scales bucket positions from requested magnification back to mag1 coordinates
  • Constructs proper DataServiceDataRequest objects with correct cuboid dimensions
  • Applies agglomerate settings when needed
  • Uses appropriate data service request settings

The coordinate calculations properly account for the bucket length scaling factor.


204-217: Agglomerate handling logic is robust.

The implementation properly handles both agglomerate and non-agglomerate cases:

  • Checks if the ID is within the valid agglomerate range using largestAgglomerateId
  • Gracefully handles IDs outside the file range (likely created during brushing)
  • Falls back to treating the ID as a segment ID when no agglomerate file is present

This defensive approach prevents errors when dealing with dynamically created agglomerate IDs.
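A rough sketch of that branching (hypothetical names and signature, taking a lookup function instead of the actual agglomerate service):

// Sketch only: resolve an id to segment ids, tolerating ids outside the file range.
def segmentIdsForId(id: Long,
                    largestAgglomerateId: Option[Long],
                    segmentIdsOfAgglomerate: Long => Seq[Long]): Seq[Long] =
  largestAgglomerateId match {
    case Some(largest) if id <= largest =>
      segmentIdsOfAgglomerate(id) // id is within the agglomerate file range
    case Some(_) =>
      Seq.empty                   // id was likely created by brushing; nothing stored for it
    case None =>
      Seq(id)                     // no agglomerate file: treat the id as a plain segment id
  }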


219-230: Cache clearing implementation is comprehensive.

The cache clearing logic:

  • Properly filters cache entries by dataSourceId and optional layerName
  • Delegates to both HDF5 and Zarr services for complete cleanup
  • Returns the total count of cleared entries for monitoring purposes

This ensures all related cached data is properly cleaned up across all format-specific services.

@fm3 fm3 requested a review from MichaelBuessemeyer July 2, 2025 13:28

@MichaelBuessemeyer MichaelBuessemeyer left a comment


Awesome, thanks for all this refactoring 🔥 🔧

The latest commit lgtm and mini testing showed that it is still working 👍

@fm3 fm3 enabled auto-merge (squash) July 3, 2025 09:14
@fm3 fm3 merged commit 4c6b2b2 into master Jul 3, 2025
5 checks passed
@fm3 fm3 deleted the zarr-segment-index branch July 3, 2025 09:18
@coderabbitai coderabbitai bot mentioned this pull request Jul 11, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jul 22, 2025