Read Zarr Segment Index Files #8711
Conversation
Actionable comments posted: 1
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/Hdf5SegmentIndexFileService.scala (1)
40-63: Consider improving the error message for unsupported data types. The implementation is correct, but the error message could be more informative about which data types are supported.
Consider this improvement:
```diff
-        .fromBool(bucketEntriesDtype == "uint16") ?~> "value for dtype_bucket_entries in segment index file is not supported, only uint16 is supported"
+        .fromBool(bucketEntriesDtype == "uint16") ?~> s"Unsupported dtype_bucket_entries '$bucketEntriesDtype' in segment index file. Only 'uint16' is supported"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- unreleased_changes/8711.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (11 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ArrayArtifactHashing.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala (0 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/Hdf5MeshFileService.scala (6 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/Hdf5SegmentIndexFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (4 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (0 hunks)
💤 Files with no reviewable changes (2)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala
🧰 Additional context used
🧠 Learnings (10)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (6)
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/proofread_saga.ts:1039-1039
Timestamp: 2024-11-22T17:18:04.217Z
Learning: In `frontend/javascripts/oxalis/model/sagas/proofread_saga.ts`, when calling `getMagInfo`, the use of `volumeTracingLayer.resolutions` is intentional and should not be changed to `volumeTracingLayer.mags`.
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
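Taken together, the three Fox learnings above describe one recurring pattern. A minimal sketch, assuming a Fox type with `fromBool`, the `?~>` error-labeling operator, and an Option-to-Fox conversion written here as `.toFox`; the method names, `Dataset` type, and `loadDataset` helper are hypothetical:

```scala
// Hypothetical method illustrating the Fox for-comprehension patterns above.
// Fox is the codebase's Future/Box-like error-handling monad.
def validateAndLoad(datasetNameOpt: Option[String], isAllowed: Boolean): Fox[Dataset] =
  for {
    // `<-` binds the Fox, so a failed check short-circuits the whole chain.
    _ <- Fox.fromBool(isAllowed) ?~> "dataset.access.denied"
    // Using `=` here would merely assign the Fox value without binding it,
    // silently ignoring the validation failure:
    //   _ = Fox.fromBool(isAllowed) // WRONG
    // With FoxImplicits in scope, an Option can be bound directly;
    // None becomes an empty Fox that fails the comprehension gracefully.
    datasetName <- datasetNameOpt.toFox
    dataset <- loadDataset(datasetName) // only runs if all prior binds succeeded
  } yield dataset
```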
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/Hdf5MeshFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
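A minimal sketch of why this is safe, with hypothetical scanner functions that mirror the extension filtering described in the learning:

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// List files in a directory that match a given extension (hypothetical helper).
def listFilesWithExtension(dir: Path, extension: String): Seq[Path] =
  if (Files.isDirectory(dir))
    Files.list(dir).iterator().asScala.toSeq.filter(_.getFileName.toString.endsWith(s".$extension"))
  else Seq.empty

// Both scanners target the same "agglomerates" directory without interference,
// because each only picks up files with its own extension.
def scanAgglomerateFiles(layerDir: Path): Seq[Path] =
  listFilesWithExtension(layerDir.resolve("agglomerates"), "hdf5") // AgglomerateFileInfo-style

def scanCumsumFiles(layerDir: Path): Seq[Path] =
  listFilesWithExtension(layerDir.resolve("agglomerates"), "json") // CumsumFileInfo-style
```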
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (3)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshService.scala:55-66
Timestamp: 2025-04-23T09:22:26.829Z
Learning: In the NeuroglancerMesh class, shardingSpecification is defined as a concrete ShardingSpecification value, not an Option. It uses meshInfo.sharding.getOrElse(ShardingSpecification.empty) to provide a default empty specification if none is present, ensuring that mesh.shardingSpecification is never null.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerMeshHelper.scala:143-152
Timestamp: 2025-04-30T08:54:11.252Z
Learning: In Neuroglancer's sharded format, mesh fragment data is located immediately before the manifest file in the same shard data file. The correct way to calculate the mesh data starting offset is to subtract the total mesh size from the manifest starting offset (neuroglancerOffsetStart - totalMeshSize), not by adding the manifest size.
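Expressed as code, the learning's offset rule is a single subtraction; a sketch with hypothetical names:

```scala
// In Neuroglancer's sharded format, mesh fragment data sits immediately
// *before* the manifest in the same shard data file.
case class ShardManifestLocation(neuroglancerOffsetStart: Long, totalMeshSize: Long)

// Correct: subtract the total mesh size from the manifest's starting offset.
def meshDataStartOffset(loc: ShardManifestLocation): Long =
  loc.neuroglancerOffsetStart - loc.totalMeshSize // not offsetStart + manifestSize
```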
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerMeshHelper.scala:143-152
Timestamp: 2025-04-30T08:54:11.252Z
Learning: In Neuroglancer's sharded format, mesh fragment data is located immediately before the manifest file in the same shard data file. The correct way to calculate the mesh data starting offset is to subtract the total mesh size from the manifest starting offset (neuroglancerOffsetStart - totalMeshSize), not by adding the manifest size.
🧬 Code Graph Analysis (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (2)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (1)
- listFiles (101-102)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/VaultPath.scala (1)
- toUri (79-82)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1)
- localPath (43-49)
🪛 LanguageTool
unreleased_changes/8711.md
[duplication] ~4-~4: Possible typo: you repeated a word.
Context: ...t, and from remote object storage. ### Fixed - Fixed a bug where segment index files for seg...
(ENGLISH_WORD_REPEAT_RULE)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (29)
unreleased_changes/8711.md (1)
1-6: LGTM! Changelog entry is clear and comprehensive. The changelog accurately describes the new Zarr3 format support and the mag1 bug fix. The static analysis hint about word repetition is a false positive; this follows standard changelog format, where section headers naturally repeat words used in descriptions.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (1)
38-41: LGTM! Variable renaming improves clarity. The renaming from `bucketPositionsInMag` to `bucketPositionsInRequestedMag` makes the code more explicit about which magnification the bucket positions represent. This change aligns well with the PR's goal of fixing mag1 interpretation issues.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2)
14-18: LGTM! Proper imports for new segment index services. The imports correctly bring in the three new segment index file service classes that support the new architecture.
46-48: LGTM! Service bindings follow established patterns. The three segment index file services are properly bound as eager singletons, consistent with other datastore services like the mesh file services. This ensures they're available when needed for segment index operations.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/Hdf5MeshFileService.scala (2)
19-19: LGTM! Cache variable renamed for better clarity. The renaming from `meshFileCache` to `fileHandleCache` makes the variable name more specific about what it caches and aligns with naming conventions in other services.
22-156: LGTM! All cache references updated consistently. All references to the renamed cache variable have been updated throughout the file, ensuring no broken references.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ArrayArtifactHashing.scala (1)
7-16: LGTM! Trait renaming improves format-agnostic design. The renaming from `Hdf5HashedArrayUtils` to `ArrayArtifactHashing` makes the trait more generic and suitable for use with both HDF5 and Zarr services. The simplified interface, focusing only on hash function provision, is clean and extensible.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)
36-36: LGTM: Consistent directory naming convention. The renaming from `agglomerateDir` to `localAgglomeratesDir` improves clarity and aligns with the naming conventions used in other services like `MeshFileService`. The change is applied consistently throughout the file.
Also applies to: 49-49, 101-101
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (1)
11-11: LGTM: Trait refactoring improves naming clarity. The replacement of `Hdf5HashedArrayUtils` with `ArrayArtifactHashing` is a sensible rename since the hashing functionality isn't specific to HDF5 files. The `getHashFunction` method usage remains unchanged, indicating the refactoring maintains backward compatibility.
Also applies to: 28-28
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (2)
41-41: LGTM: Added credential support for remote attachments. The addition of the `credentialId` parameter with a default value maintains backward compatibility while enabling support for remote attachments that require authentication.
42-50: Good centralization of path validation logic. The `localPath` method effectively centralizes URI-to-Path conversion and validation that was previously scattered across services. The warning comment correctly indicates that this method throws exceptions and should be used within `tryo` blocks for safe error handling. The validation logic ensures only local file system paths are supported for HDF5 files, which is a reasonable constraint given the current implementation.
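A sketch of the calling pattern this comment refers to, assuming Lift's `tryo` helper (used with Box throughout the codebase) and a hypothetical `openHdf5File` loader:

```scala
import net.liftweb.common.Box
import net.liftweb.util.Helpers.tryo

// localPath throws for unsupported (non-local) URI schemes, so callers wrap
// the call in tryo to get a Box failure instead of a propagated exception.
def openAttachment(attachment: LayerAttachment): Box[CachedHdf5File] =
  for {
    localPath <- tryo(attachment.localPath) // Failure if the URI is not a local file path
  } yield openHdf5File(localPath)           // hypothetical loader
```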
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/Hdf5AgglomerateService.scala (1)
46-47: LGTM: Simplified file opening using centralized validation. The `openHdf5` method is now cleaner by leveraging the `localPath` method from `LayerAttachment`, which centralizes the URI scheme validation. This eliminates code duplication while maintaining the same validation behavior.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (2)
14-14: LGTM: Consistent trait refactoring. The replacement of `Hdf5HashedArrayUtils` with `ArrayArtifactHashing` aligns with the same refactoring applied across other services, improving naming clarity since the hashing functionality isn't HDF5-specific.
Also applies to: 73-73
76-77: LGTM: Improved maintainability with named constants. The introduction of the `localMeshesDir` and `hdf5MeshFileExtension` constants eliminates magic strings and improves maintainability. This follows the same pattern established in `AgglomerateService` and ensures consistency across the codebase.
Also applies to: 103-103, 123-123
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (3)
13-17: LGTM! Good error handling improvements. The addition of Box utilities and the trait rename to `ArrayArtifactHashing` are consistent with the codebase-wide refactoring for unified artifact hashing.
Also applies to: 26-26
103-107: Good error handling pattern! The implementation safely extracts the local path and maintains consistent Box-based error handling.
62-65: Fix return type inconsistency. The method signature indicates it returns `Box[CachedHdf5File]`, but the implementation returns just `CachedHdf5File` from the yield expression. The existing `getCachedHdf5File(Path)` method returns `CachedHdf5File` directly, not wrapped in a Box. Apply this fix:

```diff
-  def getCachedHdf5File(attachment: LayerAttachment)(loadFn: Path => CachedHdf5File): Box[CachedHdf5File] =
-    for {
-      localPath <- tryo(attachment.localPath)
-    } yield getCachedHdf5File(localPath)(loadFn)
+  def getCachedHdf5File(attachment: LayerAttachment)(loadFn: Path => CachedHdf5File): Box[CachedHdf5File] =
+    tryo(attachment.localPath).map(localPath => getCachedHdf5File(localPath)(loadFn))
```

Likely an incorrect or invalid review comment.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/Hdf5SegmentIndexFileService.scala (2)
20-38: Well-structured segment index reading implementation! The method correctly handles bucket lookup and empty bucket cases. The use of `runIf` for conditional execution is appropriate.
65-69: LGTM! Efficient cache clearing implementation. The method correctly constructs paths and uses prefix matching to clear relevant cache entries.
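A rough sketch of prefix-based clearing over a path-keyed handle cache; the cache shape and names here are illustrative, not the actual Hdf5FileCache API:

```scala
import scala.collection.concurrent.TrieMap

// Illustrative file-handle cache keyed by absolute path strings.
val fileHandleCache = TrieMap.empty[String, AutoCloseable]

// Close and evict every handle whose key starts with the attachment directory
// prefix, returning the number of cleared entries.
def clearCacheByPrefix(pathPrefix: String): Int = {
  val matchingKeys = fileHandleCache.keys.filter(_.startsWith(pathPrefix)).toSeq
  matchingKeys.foreach(key => fileHandleCache.remove(key).foreach(_.close()))
  matchingKeys.size
}
```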
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala (3)
19-54: Well-structured attributes handling! The JSON parsing correctly handles the nested structure, and the lazy hash function initialization is appropriate.
84-95: Consistent implementation with HDF5 service! The method correctly handles Zarr array reading and maintains consistency with the HDF5 implementation pattern.
147-156: Comprehensive cache clearing! The method correctly clears both caches and properly filters by data source and layer.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (4)
56-79: Well-designed attachment lookup strategy! The method correctly prioritizes registered attachments and handles URI normalization appropriately.
101-109: Clean format delegation pattern! The pattern matching on data format provides clear delegation to the appropriate service implementation.
111-116: Correct coordinate transformation logic! The method properly handles the conversion from mag1 bucket positions to target magnification bucket indices.
222-233: Comprehensive cache clearing across all services! The method correctly clears the local cache and delegates to both format-specific services.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
48-49: Good refactoring: direct service injection! Direct injection of `segmentIndexFileService` and `agglomerateService` improves code clarity and follows dependency injection best practices.
447-462: Comprehensive cache clearing implementation! The cache clearing now properly includes segment index file handles, and the logging provides good visibility into what was cleared.
593-736: Excellent migration to key-based segment index access! All segment index related endpoints have been properly updated to use the new service architecture. The consistent use of `segmentIndexFileKey` throughout provides better abstraction and supports multiple file formats.
Here is my code review, will proceed with testing.
Some comments didn't show up in my IDE after creating them during the review, but I think all of them are present. At least one of these missing comments appears in the GH review view.
```scala
        .map(_ / Vec3Int.full(DataLayer.bucketLength)) // map positions to cube indices
        .distinct

  def getSegmentVolume(dataSourceId: DataSourceId,
```
The `dataSourceId` shouldn't be needed here, as it is already a member of `segmentIndexFileKey`. So I'd remove it from the param list and use `segmentIndexFileKey.dataSourceId` instead.
Same for the other functions below
I’m not sure about this one. It’s true that the segmentIndexFileKey contains the correct dataSourceId, but since the segmentIndexFileKey is designed to be used to look up segment index files, it feels a little odd to repurpose it to also carry the dataSourceId that is later used to load voxel data. My feeling is that the additional parameter makes the intent of this function clearer, even if it is somewhat redundant. What do you think?
Awesome, works great 🎉
I think this can be directly merged after the first iteration 🔥
Thank you for the detailed review! Good catch about the `tryo`s. I also did some refactoring on the common string constants, please have a quick look at the last commit :) Fortunately I found out that the starting
Actionable comments posted: 0
♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (2)
69-69: Add missing error message to messages.conf. The error message key "segmentIndexFile.notFound" referenced here appears to be missing from the messages.conf file, as noted in previous review comments.
115-115: Consider removing redundant dataSourceId parameter. As discussed in previous comments, the `dataSourceId` parameter may be redundant since it's already available in `segmentIndexFileKey.dataSourceId`. While the author provided justification for keeping it for clarity, consider whether the redundancy is worth the potential confusion.
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1)
33-33: Consider adding documentation for the case class. The `SegmentIndexFileKey` case class serves as a key data structure but lacks documentation. Adding a brief comment explaining its purpose would improve code maintainability.

```diff
+/** Key for identifying and caching segment index files by data source, layer, and attachment. */
 case class SegmentIndexFileKey(dataSourceId: DataSourceId, layerName: String, attachment: LayerAttachment)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/BinaryDataController.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (11 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mapping/AgglomerateService.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/BinaryDataController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mapping/AgglomerateService.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/SegmentIndexFileService.scala (4)
108-113: Mathematical operations for coordinate transformation look correct. The coordinate transformation logic properly:
- Converts bucket indices to positions by scaling with bucket length
- Scales positions to target magnification
- Converts back to bucket indices at target magnification
- Removes duplicates with `.distinct`
This implementation correctly handles the coordinate space transformations needed for multi-resolution data.
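A sketch of those four steps, using an illustrative integer vector type in place of the codebase's Vec3Int and assuming the usual bucket edge length of `DataLayer.bucketLength` = 32:

```scala
// Illustrative stand-in for Vec3Int with component-wise operations.
case class Vec3(x: Int, y: Int, z: Int) {
  def *(s: Int): Vec3 = Vec3(x * s, y * s, z * s)
  def /(s: Int): Vec3 = Vec3(x / s, y / s, z / s)
  def /(o: Vec3): Vec3 = Vec3(x / o.x, y / o.y, z / o.z)
}

val bucketLength = 32 // assumed value of DataLayer.bucketLength

// mag1 bucket indices -> distinct bucket indices in the target magnification
def toTargetMagBucketIndices(mag1BucketIndices: Seq[Vec3], targetMag: Vec3): Seq[Vec3] =
  mag1BucketIndices
    .map(_ * bucketLength) // bucket index -> mag1 voxel position
    .map(_ / targetMag)    // scale the position into the target magnification
    .map(_ / bucketLength) // position -> bucket index at the target mag
    .distinct              // coarser mags collapse several mag1 buckets into one
```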
153-173: Bucket request construction logic is well-implemented. The logic correctly:
- Scales bucket positions from requested magnification back to mag1 coordinates
- Constructs proper `DataServiceDataRequest` objects with correct cuboid dimensions
- Applies agglomerate settings when needed
- Uses appropriate data service request settings
The coordinate calculations properly account for the bucket length scaling factor.
204-217: Agglomerate handling logic is robust. The implementation properly handles both agglomerate and non-agglomerate cases:
- Checks if the ID is within the valid agglomerate range using `largestAgglomerateId`
- Gracefully handles IDs outside the file range (likely created during brushing)
- Falls back to treating the ID as a segment ID when no agglomerate file is present
This defensive approach prevents errors when dealing with dynamically created agglomerate IDs.
219-230: Cache clearing implementation is comprehensive. The cache clearing logic:
- Properly filters cache entries by dataSourceId and optional layerName
- Delegates to both HDF5 and Zarr services for complete cleanup
- Returns the total count of cleared entries for monitoring purposes
This ensures all related cached data is properly cleaned up across all format-specific services.
Awesome, thanks for all this refactoring 🔥 🔧
The latest commit lgtm and mini testing showed that it is still working 👍
Steps to test:
TODOs:
Issues:
`$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)