Read Zarr Agglomerate Files #8633

Conversation
📝 Walkthrough

This update introduces support for reading agglomerate mappings from the zarr3 format, including remote object storage. It refactors agglomerate-related services for modularity and multi-format support, updates type signatures for large-dimension compatibility, and adjusts controller and service logic to use repository lookups and new data types. Several new services and helper methods are added.
Thanks for all your work 🙏. The code looks pretty good to me. I only found a few suggestions / questions.

The only question stretching over multiple files is: why did you change the type to `Long` for some arguments / members of `DatasetArray` (I think it was)?

Besides this, I'll leave testing for tomorrow :)
Actionable comments posted: 2
♻️ Duplicate comments (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)

Line 36: Variable name still implies "only HDF5" support. `hdf5AgglomerateFileExtension` suggests this is the only supported agglomerate format, which is no longer true. Consider renaming to something like `legacyHdf5AgglomerateFileExtension` or at least `agglomerateHdf5FileExtension` to avoid confusion.
🧹 Nitpick comments (3)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)

Lines 198-200: `unsupportedDataFormat` does not use the `ExecutionContext`. `unsupportedDataFormat` receives an implicit `ExecutionContext` that it never uses. Drop the parameter to avoid accidental shadowing and to make the signature cleaner.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (2)

Lines 304-307: Guard the `Long → Int` cast against out-of-range values. `(globalOffset(dim) - (chunkIndex(dim).toLong * chunkShape(dim).toLong)).toInt` assumes the intra-chunk offset always fits into `Int`. While logically true, an out-of-range result would truncate without warning. Add an explicit range assertion before the cast:

```scala
val diff = globalOffset(dim) - chunkIndex(dim).toLong * chunkShape(dim)
require(diff >= Int.MinValue && diff <= Int.MaxValue, s"Offset $diff overflows Int for dim=$dim")
diff.toInt
```
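To make the concern concrete, here is a self-contained sketch of such a guarded conversion as a standalone helper (the method name and signature are illustrative only, not the actual `DatasetArray` API):

```scala
// Hypothetical helper mirroring the suggested guard; not the actual DatasetArray code.
def offsetInChunk(globalOffset: Array[Long], chunkIndex: Array[Long], chunkShape: Array[Int]): Array[Int] =
  globalOffset.indices.map { dim =>
    val diff = globalOffset(dim) - chunkIndex(dim) * chunkShape(dim).toLong
    // Fail loudly instead of silently truncating when the intra-chunk offset exceeds Int range.
    require(diff >= Int.MinValue && diff <= Int.MaxValue, s"Offset $diff overflows Int for dim=$dim")
    diff.toInt
  }.toArray
```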
Lines 310-312: `toString` truncates 64-bit shapes. `printAsInner(s.map(_.toInt))` narrows potentially huge dimensions to `Int`, leading to negative or wrong values in logs. Introduce a `printAsInner(values: Array[Long])` overload (or call `.mkString`) to preserve accuracy:

```diff
- shape=${header.datasetShape.map(s => printAsInner(s.map(_.toInt)))}
+ shape=${header.datasetShape.map(_.mkString("inner(", ",", ")"))}
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (8)

- CHANGELOG.unreleased.md (1 hunks)
- conf/messages (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (9 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (8 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3Array.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ZarrAgglomerateService.scala (1 hunks)

✅ Files skipped from review due to trivial changes (1)

- conf/messages

🚧 Files skipped from review as they are similar to previous changes (5)

- CHANGELOG.unreleased.md
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3Array.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ZarrAgglomerateService.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1)
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (4)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)

Lines 45-58: Listing local files may still fail (or be slow) for remote layers. `PathUtils.listFiles` is invoked unconditionally on a path that can be non-existent (remote dataset) or reside on network storage. While the silent flag prevents exceptions, the call can still block and degrade request latency. Please verify whether `listAgglomerates` is executed on the request path and, if so, guard the listing with `Files.exists(layerDir)` (or a cheap metadata check) to avoid unnecessary I/O for remote layers.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (3)

Lines 157-161: Mixed `Int`/`Long` arithmetic – verify no silent narrowing. `totalOffset.map(_.toLong)` feeds a `Seq[Long]` into `computeChunkIndices`, but the `shape` argument remains `Array[Int]`. Double-check that `computeChunkIndices` internally lifts `shape` to `Long` before doing multiplications; otherwise offsets ≥ 2³¹ will silently overflow.
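To illustrate the overflow this comment warns about, a minimal standalone example (hypothetical values, not the real `computeChunkIndices` internals): if `shape` stays `Int`, the multiplication is evaluated in 32-bit arithmetic and wraps before any widening happens.

```scala
// Int * Int is computed in 32-bit arithmetic, even when the result is assigned to a Long.
val chunkShape: Int = 1 << 20                     // 1,048,576 voxels per chunk along one axis
val chunkIndex: Int = 4096                        // chunk number 4096 along that axis
val wrong: Long = chunkIndex * chunkShape         // wraps to 0 before the Long assignment
val right: Long = chunkIndex.toLong * chunkShape  // 4,294,967,296 as expected
```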
Lines 296-298: Nice micro-cleanup on zero-offset check. Replacing `Arrays.equals` with the clearer `offset.forall(_ == 0)` eliminates allocation and improves readability. 👍
Lines 51-55: verification script:

```shell
#!/bin/bash
set -e

echo "=== Find permuteIndicesArrayToWkLong ==="
rg -n "permuteIndicesArrayToWkLong" -A2 webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala
```
Actionable comments posted: 3
🧹 Nitpick comments (2)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1)

Lines 310-312: Avoid `toInt` when logging very large shapes. `toString` converts `Long` dimensions to `Int`, which will print negative numbers for shapes > 2,147,483,647.

```diff
- shape=${header.datasetShape.map(s => printAsInner(s.map(_.toInt)))}
+ shape=${header.datasetShape.map(s => printAsInner(s.map(_.toString.toIntOption.getOrElse(Int.MaxValue))))}
```

Or simply build the string from `Long`s directly to prevent misleading logs.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)
Lines 198-200: Remove unused implicit `ExecutionContext`. `unsupportedDataFormat` does not use the passed execution context, yet requires one implicitly, forcing callers to have it in scope.

```diff
- private def unsupportedDataFormat(agglomerateFileKey: AgglomerateFileKey)(implicit ec: ExecutionContext) =
+ private def unsupportedDataFormat(agglomerateFileKey: AgglomerateFileKey) =
    Fox.failure(
      s"Trying to load agglomerate file with unsupported data format ${agglomerateFileKey.attachment.dataFormat}")
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (2)

- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (8 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1)
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
🔇 Additional comments (6)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (5)

Lines 189-199: Mixed `Long`/`Int` API – please double-check. `readAsMultiArray(offset: Array[Long], shape: Array[Int])` mixes numeric widths. Offsets > Int.MaxValue are expected, so why is `shape` still `Int` while the very next call hands it to `computeChunkIndices`, which already takes `Long`? Either keep both as `Long` for consistency, or document the invariant that individual chunk shapes always fit into an `Int`. This prevents subtle truncation bugs when somebody passes a shape bigger than 2,147,483,647 by mistake.
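A sketch of the two options named above (signatures are simplified placeholders; the real `readAsMultiArray` takes more parameters and returns a result):

```scala
// Option A (hypothetical): widen both to Long and fail fast on oversized chunk shapes.
def readAsMultiArrayA(offset: Array[Long], shape: Array[Long]): Unit = {
  require(shape.forall(_ <= Int.MaxValue), "individual chunk shapes must fit into an Int")
  // ... read logic ...
}

// Option B (hypothetical): keep shape as Array[Int] and state the invariant explicitly.
// Invariant: only offsets may exceed Int.MaxValue; per-chunk shapes always fit into an Int.
def readAsMultiArrayB(offset: Array[Long], shape: Array[Int]): Unit = {
  // ... read logic ...
}
```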
Lines 269-277: Fast-path predicate can give false positives with large offsets. `partialCopyingIsNotNeededForMultiArray` calls `computeOffsetInChunkIgnoringAxisOrder`, which currently truncates to `Int`. If overflow occurs, `isZeroOffset` may erroneously return `true`, skipping the partial-copy logic and returning a wrong slice. Once the overflow in `computeOffsetInChunkIgnoringAxisOrder` is fixed (see above), please re-run the tests to verify this branch still behaves as intended.
Line 297: Simplified zero-offset check looks good. Switching to `offset.forall(_ == 0)` is clearer and avoids potential overflow on `product`.
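The `product` remark is not hypothetical: an `Int` product can wrap exactly to zero, so a `product == 0` style check can report a zero offset that is not one. A small REPL-style illustration:

```scala
val offset = Array(65536, 65536) // 2^16 * 2^16 = 2^32, which wraps to 0 in Int arithmetic
offset.product == 0              // true — a false positive for "offset is all zeros"
offset.forall(_ == 0)            // false — the correct answer
```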
Lines 51-55: `datasetShape` is now `Array[Long]` – ensure all downstream helpers are updated. The value feeds into `fullAxisOrder.permuteIndicesArrayToWkLong` (nice!) and `computeChunkIndices`. Please verify that every overload of these helpers now accepts `Long`; otherwise an implicit down-cast will re-introduce the very size limitation this PR tries to remove.
Lines 156-162: Compile-time check: the new helper `permuteIndicesArrayToWkLong` must exist. The call to `fullAxisOrder.permuteIndicesArrayToWkLong` assumes a freshly added overload. Please ensure it is defined for all `FullAxisOrder` implementations; otherwise this file will not compile.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1)
Lines 42-59: verification script:

```bash
#!/usr/bin/env bash
set -e

# Locate the PathUtils source file
PATHUTILS_FILE=$(rg -l "^object PathUtils" | head -n1)
if [[ -z "$PATHUTILS_FILE" ]]; then
  echo "❌ PathUtils object not found."
  exit 1
fi
echo "🗂 Found PathUtils at: $PATHUTILS_FILE"

# Show definitions for listFiles, listDirs, and fileExtensionFilter
echo -e "\n🔍 Checking for listFiles definition:"
rg -n "def listFiles" "$PATHUTILS_FILE" || echo "⚠️ listFiles not defined here"

echo -e "\n🔍 Checking for listDirs definition:"
rg -n "def listDirs" "$PATHUTILS_FILE" || echo "⚠️ listDirs not defined here"

echo -e "\n🔍 Checking for fileExtensionFilter:"
rg -n "fileExtensionFilter" "$PATHUTILS_FILE" || echo "⚠️ fileExtensionFilter not found"

# Search for any usage of listDirs across the repo
echo -e "\n🔎 Global search for listDirs usage:"
rg -n "listDirs" || echo "⚠️ No usages of listDirs in the repo"
```
Awesome, thanks for applying my feedback. I left some replies on the open conversations.
Will test now.
```scala
  })
  for {
    _ <- copiedFuture
  } yield targetMultiArray
```
Well, ok. I am not that much of a fan but I am also fine with your decision
```scala
bind(classOf[ZarrAgglomerateService]).asEagerSingleton()
bind(classOf[Hdf5AgglomerateService]).asEagerSingleton()
bind(classOf[AgglomerateService]).asEagerSingleton()
```
I don't think this is necessary, as the `BinaryDataServiceHolder` is already a singleton and only creates one of each manually. These three classes should therefore also be singletons. But having this might also make sure that no strange things happen, so we might keep it.
This is actually making the code not compile, as there is no injectable constructor for `AgglomerateService`. I deleted it locally to proceed with testing.
You’re right, I confused myself here. In the next iteration (MeshFileService) I make them injectable again. I’ll move this change there.
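For readers following along: Guice's `asEagerSingleton()` can only instantiate classes whose constructors it knows how to call, which is why the binding above failed. A minimal sketch of the distinction (class names are placeholders, not the webknossos code):

```scala
import javax.inject.Inject

class ExampleConfig @Inject()() // trivially injectable dependency

// With @Inject, Guice can construct the class itself, so
// bind(classOf[InjectableService]).asEagerSingleton() works.
class InjectableService @Inject()(config: ExampleConfig)

// Without @Inject, and with a parameter Guice cannot resolve (e.g. a manually
// created shared cache), eager binding fails with "no injectable constructor".
class ManualService(sharedChunkContentsCache: AnyRef)
```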
```scala
NodeDefaults.createInstance.copy(
  id = idx + nodeIdStartAtOneOffset,
  position = Vec3IntProto(pos(0).toInt, pos(1).toInt, pos(2).toInt)
```

```scala
def lookUpAgglomerateFile(dataSourceId: DataSourceId, dataLayer: DataLayer, mappingName: String)(
```
Well, I see your point and it is valid. I only wanted to avoid confusion, as this makes the code read:

`agglomerateFileKey <- agglomerateService.lookUpAgglomerateFile(...)`

and not

`agglomerateFileKey <- agglomerateService.lookUpAgglomerateFileKey(...)`.

And to me the first reads like a logical mismatch at first glance, as on a conceptual level a "file key" and a "file" are different things. Maybe you mean something like "file handle" or so? 🤔

If you still want to keep `lookUpAgglomerateFile`, feel free to do so.
Sadly, testing did not fully work.

When I tried to merge 2 segments in an annotation with agglomerate view 5 turned on, I ended up with a broken annotation where I could no longer load agglomerate meshes.

These are the errors in the console:

And here is the frontend log of actions I performed:

```
Start new bucket watcher for layer af681321-9141-4c88-93c1-3786f947029b
Don't load mesh for segment 32 because it already exists.
Cancel old bucket watcher
Start new bucket watcher for layer af681321-9141-4c88-93c1-3786f947029b
Accessing mapping for proofreading
dispatch setMappingAction in proofreading saga
Accessing mapping for proofreading
Merging agglomerate 211 with 5 and segment ids 226 172
```

And the backend error causing this is, I think:

```
backend: 2025-06-18 18:46:59,772 [error] com.scalableminds.webknossos.datastore.rpc.RPCRequest - Failed RPC 313: GET http://localhost:9000/data/datasets/sample_organization/test-agglomerate-file-zarr/layers/segmentation/agglomerates/agglomerate_view_5/agglomerateGraph/211, Status: 400.() RequestBody: '' ResponseBody: '{"messages":[{"error":"agglomerateGraph.failed"},{"chain":"[Server Time 2025-06-18T16:46:59.771Z] class [I cannot be cast to class [F ([I and [F are in module java.base of loader 'bootstrap') java.lang.ClassCastException: class [I cannot be cast to class [F ([I and [F are in module java.base of loader 'bootstrap')"}]}'
```
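For context on that error: `[I` and `[F` are the JVM's runtime names for `int[]` and `float[]`, so the exception means an int array was treated as a float array somewhere on the agglomerate read path. A minimal, unrelated reproduction of the same exception (pure illustration of the mechanics, not the actual zarr code):

```scala
// [I is the JVM class descriptor for int[], [F for float[].
val data: AnyRef = Array(1, 2, 3) // underlying runtime class is [I
val floats = data.asInstanceOf[Array[Float]]
// throws: java.lang.ClassCastException: class [I cannot be cast to class [F
```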
Thanks so much for testing and the detailed steps to reproduce! This bug should be fixed now. I also re-tested the position query (where previously we requested shape 3,1 instead of 1,3). That seems to work now too. Also, you convinced me to rename the function to `lookUpAgglomerateFileKey`. Please have another look :)
Awesome, the latest changes are looking good.
Couldn't find any bugs anymore 🚫 🐛
Should be good to go 🐎
- Similar to #8633 we now support reading mesh files in zarr format. Note that there are also neuroglancerPrecomputed meshes, so this time there are three services the meshFileService delegates to.
- The frontend no longer needs to specially handle neuroglancerPrecomputed meshes, since the lookUpMeshFileKey function abstracts from that. Because of that, I also simplified the frontend types.
- Note that zarr meshfiles must be registered in the datasource-properties.json, they are not explored.
- Also fixed a small error in job result status reporting.
- Also fixed a small error in mag selection for animation job.

### Steps to test:
- Load some precomputed meshes (both zarr, hdf5, neuroglancerPrecomputed; talk to me for example datasets), should show meshes correctly
- Create animations via wk worker, should also work correctly (I tested this locally)
- Note that neuroglancerPrecomputed meshes now need to be registered as attachments in the datasource-properties.json. You may need to re-explore the relevant datasets or edit the json.

### TODOs:
- [x] introduce and look up MeshFileKey
- [x] delegate to zarr, hdf5 or neuroglancer meshfile services
- [x] explore remote neuroglancer meshes
- [x] parse new meshfile attributes zarr group format
- [x] do we still need the null vs none meshfile-mapping attribute?
- [x] test with frontend (zarr, hdf5, neuro)
- [x] adapt frontend to no longer need path + fileType for meshfiles
- [x] clear caches
- [x] unify fullMeshStl route
- [x] unify spelling (meshfile vs meshFile)
- [x] When exploring remote neuroglancer meshes, add credentialId, and use it? (or create follow-up issue)
- [x] re-add singletons to module after merge of zarr-agglomerate’s latest changes
- [x] unify function names with zarr-agglomerate’s latest changes

### Issues:
- contributes to #8618
- contributes to #8567

------

- [x] Updated [changelog](../blob/master/CHANGELOG.unreleased.md#unreleased)
- [x] Considered [common edge cases](../blob/master/.github/common_edge_cases.md)
- [x] Needs datastore update after deployment

---------

Co-authored-by: MichaelBuessemeyer <[email protected]>
Agglomerate mappings can now also be read from the new zarr3-based format, and from remote object storage.

AgglomerateService is no longer injected; instead, it is explicitly created in BinaryDataServiceHolder so we can pass it the sharedChunkContentsCache.
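A minimal sketch of that holder pattern, under the assumption that the cache is created once and handed to the service explicitly (all names besides `AgglomerateService` and `sharedChunkContentsCache` are placeholders for the real types):

```scala
// Hypothetical sketch: manual construction inside a singleton holder, so the same
// cache instance can be threaded into the service without Guice resolving it.
class SharedChunkContentsCache // placeholder for the real cache type

class AgglomerateServiceSketch(sharedChunkContentsCache: SharedChunkContentsCache)

class BinaryDataServiceHolderSketch {
  val sharedChunkContentsCache = new SharedChunkContentsCache
  val agglomerateService = new AgglomerateServiceSketch(sharedChunkContentsCache)
}
```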
URL of deployed dev instance (used for testing):
Steps to test:
TODOs:
Backend
Frontend
Issues: