
Conversation


@fm3 fm3 commented Jun 24, 2025

  • Introduces attachment lookup for segment index files
  • Enables reading Zarr segment index files
  • Fixes a bug where segment index file positions were wrongly interpreted as being in the target mag when they are really in mag1
  • Adds cache clearing (including closing HDF5 file handles) for segment index files
  • Note to reviewer: git didn’t detect that a lot of code was moved (from the old segmentIndexFileService to the new one, and partially to the hdf5 one).

Steps to test:

  • For a dataset with a segment index file (try both HDF5 and Zarr; talk to me for sample data), right-click a segment; the context menu should show a plausible segment volume + bounding box (not zero)
  • A layer reload in the sidebar should log that cached segment index file handles were cleared.

TODOs:

  • attachment lookup
  • make sure hdf5 still works
  • implement for zarr
  • clear caches

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

fm3 added 30 commits May 19, 2025 10:20
Base automatically changed from meshfile-zarr to master June 30, 2025 07:21

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/Hdf5SegmentIndexFileService.scala (1)

40-63: Consider improving the error message for unsupported data types.

The implementation is correct, but the error message could be more informative about which data types are supported.

Consider this improvement:

-          .fromBool(bucketEntriesDtype == "uint16") ?~> "value for dtype_bucket_entries in segment index file is not supported, only uint16 is supported"
+          .fromBool(bucketEntriesDtype == "uint16") ?~> s"Unsupported dtype_bucket_entries '$bucketEntriesDtype' in segment index file. Only 'uint16' is supported"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96b7158 and 78fefbc.

📒 Files selected for processing (17)
  • unreleased_changes/8711.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (11 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ArrayArtifactHashing.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala (0 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/Hdf5MeshFileService.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/Hdf5SegmentIndexFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (4 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (0 hunks)
💤 Files with no reviewable changes (2)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala
🧰 Additional context used
🧠 Learnings (10)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (6)
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/proofread_saga.ts:1039-1039
Timestamp: 2024-11-22T17:18:04.217Z
Learning: In `frontend/javascripts/oxalis/model/sagas/proofread_saga.ts`, when calling `getMagInfo`, the use of `volumeTracingLayer.resolutions` is intentional and should not be changed to `volumeTracingLayer.mags`.
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
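A minimal sketch of the pattern this learning describes (Fox, Fox.fromBool and ?~> are from the codebase's util library; dtype and readEntries() are hypothetical stand-ins):

// Sketch only, not actual codebase code.
for {
  // Binding with `<-` means a failed check short-circuits the whole comprehension
  // and propagates the error message attached via ?~>.
  _ <- Fox.fromBool(dtype == "uint16") ?~> s"Unsupported dtype '$dtype'"
  entries <- readEntries()
} yield entries

// By contrast, writing `_ = Fox.fromBool(...)` would only create a failed Fox value
// and discard it, so the validation failure would be silently ignored.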
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/Hdf5MeshFileService.scala (2)
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
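A rough sketch of the extension-based scanning this describes (a hypothetical helper; the actual services use PathUtils.listFiles, see the code graph analysis below):

import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Sketch only: two scanners can share the "agglomerates" directory without
// interfering because each one filters by its own file extension.
def scanByExtension(layerDir: Path, extension: String): Seq[Path] = {
  val dir = layerDir.resolve("agglomerates")
  if (!Files.isDirectory(dir)) Seq.empty
  else {
    val stream = Files.list(dir)
    try stream.iterator().asScala.toSeq.filter(_.getFileName.toString.endsWith(s".$extension"))
    finally stream.close()
  }
}

// e.g. scanByExtension(layerDir, "hdf5") for agglomerate files,
//      scanByExtension(layerDir, "json") for cumsum files.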
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (4)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerMeshHelper.scala:143-152
Timestamp: 2025-04-30T08:54:11.252Z
Learning: In Neuroglancer's sharded format, mesh fragment data is located immediately before the manifest file in the same shard data file. The correct way to calculate the mesh data starting offset is to subtract the total mesh size from the manifest starting offset (neuroglancerOffsetStart - totalMeshSize), not by adding the manifest size.
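Expressed as code, the described calculation is simply (variable names are illustrative):

// The fragment data ends where the manifest begins, so its start offset is
// the manifest offset minus the total mesh size (not plus the manifest size).
def meshDataStartOffset(neuroglancerOffsetStart: Long, totalMeshSize: Long): Long =
  neuroglancerOffsetStart - totalMeshSize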
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerMeshHelper.scala:143-152
Timestamp: 2025-04-30T08:54:11.252Z
Learning: In Neuroglancer's sharded format, mesh fragment data is located immediately before the manifest file in the same shard data file. The correct way to calculate the mesh data starting offset is to subtract the total mesh size from the manifest starting offset (neuroglancerOffsetStart - totalMeshSize), not by adding the manifest size.
🧬 Code Graph Analysis (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (2)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (1)
  • listFiles (101-102)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/VaultPath.scala (1)
  • toUri (79-82)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1)
  • localPath (43-49)
🪛 LanguageTool
unreleased_changes/8711.md

[duplication] ~4-~4: Possible typo: you repeated a word.
Context: ...t, and from remote object storage. ### Fixed - Fixed a bug where segment index files for seg...

(ENGLISH_WORD_REPEAT_RULE)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (29)
unreleased_changes/8711.md (1)

1-6: LGTM! Changelog entry is clear and comprehensive.

The changelog accurately describes the new Zarr3 format support and the mag1 bug fix. The static analysis hint about word repetition is a false positive - this follows standard changelog format where section headers naturally repeat words used in descriptions.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (1)

38-41: LGTM! Variable renaming improves clarity.

The renaming from bucketPositionsInMag to bucketPositionsInRequestedMag makes the code more explicit about which magnification the bucket positions represent. This change aligns well with the PR's goal of fixing mag1 interpretation issues.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2)

14-18: LGTM! Proper imports for new segment index services.

The imports correctly bring in the three new segment index file service classes that support the new architecture.


46-48: LGTM! Service bindings follow established patterns.

The three segment index file services are properly bound as eager singletons, consistent with other datastore services like mesh file services. This ensures they're available when needed for segment index operations.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/Hdf5MeshFileService.scala (2)

19-19: LGTM! Cache variable renamed for better clarity.

The renaming from meshFileCache to fileHandleCache makes the variable name more specific about what it caches and aligns with naming conventions in other services.


22-156: LGTM! All cache references updated consistently.

All references to the renamed cache variable have been updated consistently throughout the file, ensuring no broken references.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ArrayArtifactHashing.scala (1)

7-16: LGTM! Trait renaming improves format-agnostic design.

The renaming from Hdf5HashedArrayUtils to ArrayArtifactHashing makes the trait more generic and suitable for use with both HDF5 and Zarr services. The simplified interface focusing only on hash function provision is clean and extensible.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)

36-36: LGTM: Consistent directory naming convention.

The renaming from agglomerateDir to localAgglomeratesDir improves clarity and aligns with the naming conventions used in other services like MeshFileService. The change is applied consistently throughout the file.

Also applies to: 49-49, 101-101

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (1)

11-11: LGTM: Trait refactoring improves naming clarity.

The replacement of Hdf5HashedArrayUtils with ArrayArtifactHashing is a sensible rename since the hashing functionality isn't specific to HDF5 files. The getHashFunction method usage remains unchanged, indicating the refactoring maintains backward compatibility.

Also applies to: 28-28

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (2)

41-41: LGTM: Added credential support for remote attachments.

The addition of credentialId parameter with a default value maintains backward compatibility while enabling support for remote attachments that require authentication.


42-50: Good centralization of path validation logic.

The localPath method effectively centralizes URI-to-Path conversion and validation that was previously scattered across services. The warning comment correctly indicates that this method throws exceptions and should be used within tryo blocks for safe error handling.

The validation logic ensures only local file system paths are supported for HDF5 files, which is a reasonable constraint given the current implementation.
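A minimal sketch of the pattern, assuming the method throws on non-local URIs (localPathOf is a hypothetical stand-in for LayerAttachment.localPath, and scala.util.Try stands in for the codebase's tryo):

import java.net.URI
import java.nio.file.{Path, Paths}
import scala.util.Try

// Sketch only: centralize URI-to-Path conversion and scheme validation.
def localPathOf(uri: URI): Path = {
  val scheme = Option(uri.getScheme).getOrElse("file")
  require(scheme == "file", s"Only local file paths are supported, got scheme '$scheme'")
  Paths.get(uri.getPath)
}

// Caller side: wrap defensively, analogous to tryo(attachment.localPath).
val maybePath: Try[Path] = Try(localPathOf(new URI("file:///data/segment_index.hdf5")))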

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1)

46-47: LGTM: Simplified file opening using centralized validation.

The openHdf5 method is now cleaner by leveraging the localPath method from LayerAttachment, which centralizes the URI scheme validation. This eliminates code duplication while maintaining the same validation behavior.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (2)

14-14: LGTM: Consistent trait refactoring.

The replacement of Hdf5HashedArrayUtils with ArrayArtifactHashing aligns with the same refactoring applied across other services, improving naming clarity since the hashing functionality isn't HDF5-specific.

Also applies to: 73-73


76-77: LGTM: Improved maintainability with named constants.

The introduction of localMeshesDir and hdf5MeshFileExtension constants eliminates magic strings and improves maintainability. This follows the same pattern established in AgglomerateService and ensures consistency across the codebase.

Also applies to: 103-103, 123-123

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (3)

13-17: LGTM! Good error handling improvements.

The addition of Box utilities and the trait rename to ArrayArtifactHashing are consistent with the codebase-wide refactoring for unified artifact hashing.

Also applies to: 26-26


103-107: Good error handling pattern!

The implementation safely extracts the local path and maintains consistent Box-based error handling.


62-65: Fix return type inconsistency.

The method signature indicates it returns Box[CachedHdf5File], but the implementation returns just CachedHdf5File from the yield expression. The existing getCachedHdf5File(Path) method returns CachedHdf5File directly, not wrapped in a Box.

Apply this fix:

-  def getCachedHdf5File(attachment: LayerAttachment)(loadFn: Path => CachedHdf5File): Box[CachedHdf5File] =
-    for {
-      localPath <- tryo(attachment.localPath)
-    } yield getCachedHdf5File(localPath)(loadFn)
+  def getCachedHdf5File(attachment: LayerAttachment)(loadFn: Path => CachedHdf5File): Box[CachedHdf5File] =
+    tryo(attachment.localPath).map(localPath => getCachedHdf5File(localPath)(loadFn))

Likely an incorrect or invalid review comment.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/Hdf5SegmentIndexFileService.scala (2)

20-38: Well-structured segment index reading implementation!

The method correctly handles bucket lookup and empty bucket cases. The use of runIf for conditional execution is appropriate.


65-69: LGTM! Efficient cache clearing implementation.

The method correctly constructs paths and uses prefix matching to clear relevant cache entries.
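A rough sketch of prefix-based clearing with a simplified in-memory cache (names are illustrative, not the actual Hdf5FileCache API):

import scala.collection.concurrent.TrieMap

// Sketch only: evict all handles whose cache key starts with a given prefix,
// closing each HDF5 file handle before removing it, and report the count.
class HandleCache[H <: AutoCloseable] {
  private val entries = TrieMap.empty[String, H]

  def put(key: String, handle: H): Unit = entries.update(key, handle)

  def clearWithPrefix(prefix: String): Int = {
    val matching = entries.keys.filter(_.startsWith(prefix)).toSeq
    matching.foreach(key => entries.remove(key).foreach(_.close()))
    matching.size // e.g. logged when a layer is reloaded
  }
}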

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala (3)

19-54: Well-structured attributes handling!

The JSON parsing correctly handles the nested structure and the lazy hash function initialization is appropriate.


84-95: Consistent implementation with HDF5 service!

The method correctly handles Zarr array reading and maintains consistency with the HDF5 implementation pattern.


147-156: Comprehensive cache clearing!

The method correctly clears both caches and properly filters by data source and layer.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (4)

56-79: Well-designed attachment lookup strategy!

The method correctly prioritizes registered attachments and handles URI normalization appropriately.


101-109: Clean format delegation pattern!

The pattern matching on data format provides clear delegation to the appropriate service implementation.


111-116: Correct coordinate transformation logic!

The method properly handles the conversion from mag1 bucket positions to target magnification bucket indices.


222-233: Comprehensive cache clearing across all services!

The method correctly clears the local cache and delegates to both format-specific services.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)

48-49: Good refactoring - direct service injection!

Direct injection of segmentIndexFileService and agglomerateService improves code clarity and follows dependency injection best practices.


447-462: Comprehensive cache clearing implementation!

The cache clearing now properly includes segment index file handles, and the logging provides good visibility into what was cleared.


593-736: Excellent migration to key-based segment index access!

All segment index related endpoints have been properly updated to use the new service architecture. The consistent use of segmentIndexFileKey throughout provides better abstraction and supports multiple file formats.

@fm3 fm3 requested a review from MichaelBuessemeyer June 30, 2025 07:57

@MichaelBuessemeyer MichaelBuessemeyer left a comment


Here is my code review, will proceed with testing.

Some comments didn't show up in my IDE after I created them during the review, but I think all of them are present. At least one of these missing comments appears in the GitHub review view.

.map(_ / Vec3Int.full(DataLayer.bucketLength)) // map positions to cube indices
.distinct

def getSegmentVolume(dataSourceId: DataSourceId,
Contributor


The dataSourceId shouldn't be needed here as it is already a member of segmentIndexFileKey. So I'd remove it from the param list and use segmentIndexFileKey.dataSourceId instead.

Contributor


Same for the other functions below

Member Author


I’m not sure about this one. It’s true that the segmentIndexFileKey contains the correct dataSourceId, but since the segmentIndexFileKey is designed to be used to look up segment index files, it feels a little odd to repurpose it to also carry the dataSourceId that is later used to load voxel data. My feeling is that the additional parameter makes the intent of this function clearer, even if it is somewhat redundant. What do you think?

@MichaelBuessemeyer
Contributor

Oh and thanks for being so fast in making all these attachments s3 / zarr compatible 👯


@MichaelBuessemeyer MichaelBuessemeyer left a comment


Awesome, works great 🎉

I think this can be directly merged after the first iteration 🔥


fm3 commented Jul 2, 2025

Thank you for the detailed review! Good catch about the tryos. I also did some refactoring of the common string constants; please have a quick look at the last commit :) Fortunately, I found out that the leading / is not required when passing array names to the hdf5 readers, so I can reuse the strings between the hdf5 and zarr implementations without modifications.
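To illustrate the idea, a sketch of such shared constants (the names here are hypothetical, not necessarily those from the commit):

// Sketch only: array names usable by both the HDF5 and the Zarr reader,
// since the HDF5 readers accept dataset names without a leading "/".
object SegmentIndexArrayNames {
  val topLefts = "top_lefts"
  val bucketRanges = "bucket_ranges"
}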


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (2)

69-69: Add missing error message to messages.conf.

The error message key "segmentIndexFile.notFound" referenced here appears to be missing from the messages.conf file, as noted in previous review comments.


115-115: Consider removing redundant dataSourceId parameter.

As discussed in previous comments, the dataSourceId parameter may be redundant since it's already available in segmentIndexFileKey.dataSourceId. While the author provided justification for keeping it for clarity, consider whether the redundancy is worth the potential confusion.

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1)

33-33: Consider adding documentation for the case class.

The SegmentIndexFileKey case class serves as a key data structure but lacks documentation. Adding a brief comment explaining its purpose would improve code maintainability.

+/** Key for identifying and caching segment index files by data source, layer, and attachment. */
 case class SegmentIndexFileKey(dataSourceId: DataSourceId, layerName: String, attachment: LayerAttachment)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 661b258 and 313838c.

📒 Files selected for processing (5)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/BinaryDataController.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (11 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mapping/AgglomerateService.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/BinaryDataController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mapping/AgglomerateService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (4)

108-113: Mathematical operations for coordinate transformation look correct.

The coordinate transformation logic properly:

  1. Converts bucket indices to positions by scaling with bucket length
  2. Scales positions to target magnification
  3. Converts back to bucket indices at target magnification
  4. Removes duplicates with .distinct

This implementation correctly handles the coordinate space transformations needed for multi-resolution data.
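A self-contained sketch of that transformation (Vec3 and bucketLength = 32 stand in for the codebase's Vec3Int and DataLayer.bucketLength):

// Sketch only: mag1 bucket indices -> distinct bucket indices at the requested mag.
final case class Vec3(x: Int, y: Int, z: Int) {
  def *(s: Int): Vec3 = Vec3(x * s, y * s, z * s)
  def /(o: Vec3): Vec3 = Vec3(x / o.x, y / o.y, z / o.z)
}

def toTargetMagBucketIndices(mag1BucketIndices: Seq[Vec3], mag: Vec3): Seq[Vec3] = {
  val bucketLength = 32 // stands in for DataLayer.bucketLength
  mag1BucketIndices
    .map(_ * bucketLength)                                   // bucket index -> mag1 position
    .map(_ / mag)                                            // scale position to the target mag
    .map(_ / Vec3(bucketLength, bucketLength, bucketLength)) // position -> bucket index
    .distinct                                                // several mag1 buckets can map to one
}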


153-173: Bucket request construction logic is well-implemented.

The logic correctly:

  • Scales bucket positions from requested magnification back to mag1 coordinates
  • Constructs proper DataServiceDataRequest objects with correct cuboid dimensions
  • Applies agglomerate settings when needed
  • Uses appropriate data service request settings

The coordinate calculations properly account for the bucket length scaling factor.


204-217: Agglomerate handling logic is robust.

The implementation properly handles both agglomerate and non-agglomerate cases:

  • Checks if the ID is within the valid agglomerate range using largestAgglomerateId
  • Gracefully handles IDs outside the file range (likely created during brushing)
  • Falls back to treating the ID as a segment ID when no agglomerate file is present

This defensive approach prevents errors when dealing with dynamically created agglomerate IDs.
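A rough sketch of that branching (hypothetical names and signature, taking a lookup function instead of the actual agglomerate service):

// Sketch only: resolve an id to segment ids, tolerating ids outside the file range.
def segmentIdsForId(id: Long,
                    largestAgglomerateId: Option[Long],
                    segmentIdsOfAgglomerate: Long => Seq[Long]): Seq[Long] =
  largestAgglomerateId match {
    case Some(largest) if id <= largest =>
      segmentIdsOfAgglomerate(id) // id is within the agglomerate file range
    case Some(_) =>
      Seq.empty                   // id was likely created by brushing; nothing stored for it
    case None =>
      Seq(id)                     // no agglomerate file: treat the id as a plain segment id
  }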


219-230: Cache clearing implementation is comprehensive.

The cache clearing logic:

  • Properly filters cache entries by dataSourceId and optional layerName
  • Delegates to both HDF5 and Zarr services for complete cleanup
  • Returns the total count of cleared entries for monitoring purposes

This ensures all related cached data is properly cleaned up across all format-specific services.

@fm3 fm3 requested a review from MichaelBuessemeyer July 2, 2025 13:28

@MichaelBuessemeyer MichaelBuessemeyer left a comment


Awesome, thanks for all this refactoring 🔥 🔧

The latest commit lgtm and mini testing showed that it is still working 👍

@fm3 fm3 enabled auto-merge (squash) July 3, 2025 09:14
@fm3 fm3 merged commit 4c6b2b2 into master Jul 3, 2025
5 checks passed
@fm3 fm3 deleted the zarr-segment-index branch July 3, 2025 09:18
@coderabbitai coderabbitai bot mentioned this pull request Jul 11, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jul 22, 2025