Virtual Remote Datasets #8657

frcroth · 2025-05-26T14:26:34Z

URL of deployed dev instance (used for testing):

https://___.webknossos.xyz (No frontend)

This PR adds routes in the datastore to access datasets by ID as opposed to by organizationId and directory. Datasources are cached in the datastore, the configuration is generated by WK from the DB. WK can also invalidate the DS cache when the dataset was changed. This PR only supports remote datasets, and especially no WKW datasets.

This adds new routes with the "wkDataset" path, since the dataset routes directly follow with the organizationId, making routes that use the datasetId there ambiguous. ~~Zarr routes are also duplicated to use dataset ids~~. In a followup, once wkw datasets are also supported, the old routes should be removed and the new ones can be renamed to have nicer sounding routes.

Steps to test:

In the two changed frontend files (rest_api.ts and wkstore_adapter.ts) change the datasetId to a remote dataset you have in your WK.
Open a dataset (preferably the same dataset so that metadata matches) and see data, histogram.
Explore a remote dataset as a zarr dataset (e.g. exploring http://localhost:9000/data/wkZarr/683d9643f600008d0e26a0a3)

TODOs:

Cache invalidation
Remove todos
Revert frontend changes

Issues:

Contributes to Virtual datasets #8384

(Please delete unneeded items, merge only when none are left open)

~~Updated changelog~~ No user facing changes for now
~~Adapted wk-libs python client if relevant API parts change~~ API only appended right now
Removed dev-only changes like prints and application.conf edits
Considered common edge cases
Needs datastore update after deployment

coderabbitai · 2025-05-26T14:26:40Z

📝 Walkthrough

Walkthrough

This update introduces enhanced dataset access control, new dataset cache and attachment management, and expanded API endpoints for dataset data access and cache invalidation. It adds support for dataset layer attachments, detailed data source representations, and integrates these features into both the main application and the datastore service, with corresponding updates to public APIs and access token handling.

Changes

Files/Paths	Change Summary
app/controllers/UserTokenController.scala, webknossos-datastore/app/com/scalableminds/.../AccessTokenService.scala	Added dataset-specific access control methods and resource type; updated access validation to support datasets.
app/controllers/WKRemoteDataStoreController.scala, conf/webknossos.latest.routes	Added API endpoint and controller action to fetch a datasource by dataset ID with access validation.
app/models/dataset/Dataset.scala	Added support for dataset mag parsing, dataset layer attachments, and extended data layer metadata; introduced new DAO for layer attachments.
app/models/dataset/DatasetService.scala	Introduced method for detailed data source representation with mags; added datastore update notification after data source changes.
app/models/dataset/WKRemoteDataStoreClient.scala, webknossos-datastore/app/com/scalableminds/.../DSRemoteWebknossosClient.scala	Added methods for updating dataset cache and fetching datasets from remote webknossos.
app/models/annotation/AnnotationService.scala, webknossos-datastore/app/com/scalableminds/.../AnnotationSource.scala	Added datasetId field to AnnotationSource and serialization logic.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala	Introduced DatasetCache class for caching and retrieving datasets and layers, with manual invalidation.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKDatasetController.scala	Added new controller for dataset data access, supporting voxel data, thumbnails, histograms, and more, with access validation.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala	Added cache invalidation endpoint and injected DatasetCache; minor code cleanup.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala	Added isEmpty method to DatasetLayerAttachments.
webknossos-datastore/conf/com.scalableminds.webknossos.datastore.routes	Added new REST API routes for dataset data access and cache invalidation.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala	Bound DatasetCache as an eager singleton in DI module.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/BinaryDataController.scala	Removed trailing comma in constructor parameter list (formatting only).

Suggested labels

backend, new feature

Suggested reviewers

fm3

Poem

🐇 In data fields where pixels play,
New caches bloom to save the day.
Layers attach with mags in line,
Access guarded, routes align.
A hop through code, a joyful cheer,
Dataset magic, crystal clear!
🥕✨ Hoppity-hop, the future’s here!

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16dfb39 and 89f0eb8.

📒 Files selected for processing (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala

⏰ Context from checks skipped due to timeout of 90000ms (3)

GitHub Check: build-smoketest-push
GitHub Check: frontend-tests
GitHub Check: backend-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (5)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
778-784: Consider adding error handling for cache update operations.

The method correctly validates write access before updating the cache, but the cache update operation could potentially fail. Consider wrapping the cache update in a Fox for proper error handling.
 def updateCache(datasetId: String): Action[DataSource] =
   Action.async(validateJson[DataSource]) { implicit request =>
     accessTokenService.validateAccessFromTokenContext(UserAccessRequest.writeDataset(datasetId)) {
-      datasetCache.updateById(datasetId, request.body)
-      Future.successful(Ok)
+      for {
+        _ <- datasetCache.updateById(datasetId, request.body)
+      } yield Ok
     }
   }
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala (1)

17-17: Consider adding cache configuration

The TemporaryStore cache has no visible configuration for TTL, size limits, or eviction policies. This could lead to unbounded memory usage in production.

Consider adding configuration parameters for:

Maximum cache size

Time-to-live (TTL) for cached entries

Eviction policy when cache is full
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKDatasetController.scala (1)
75-101: Consider using a request object to reduce parameter count.

The method has 10 parameters which makes it error-prone to call. Consider creating a dedicated request case class to encapsulate these parameters, similar to how RawCuboidRequest is used in the POST variant.

Example structure:
case class RawCuboidGetRequest(
  x: Int, y: Int, z: Int,
  width: Int, height: Int, depth: Int,
  mag: String,
  halfByte: Boolean = false,
  mappingName: Option[String] = None
)
app/models/dataset/Dataset.scala (2)
851-866: Consider adding error context for axis order parsing failures.

While the current implementation safely handles parsing failures by returning None, it silently discards parsing errors. Consider logging warnings when axis order JSON parsing fails to help with debugging data issues.
-      axisOrderParsed = row.axisorder match {
-        case Some(axisOrder) => JsonHelper.parseAs[AxisOrder](axisOrder).toOption
-        case None            => None
-      }
+      axisOrderParsed = row.axisorder match {
+        case Some(axisOrder) => JsonHelper.parseAs[AxisOrder](axisOrder) match {
+          case Full(parsed) => Some(parsed)
+          case failure =>
+            logger.warn(s"Failed to parse axis order for dataset ${row._Dataset}, layer ${row.datalayername}: $failure")
+            None
+        }
+        case None => None
+      }
1066-1082: Consider using parallel processing for attachment parsing.

The current implementation uses Fox.serialCombined which processes attachments sequentially. Since these are independent operations, consider using Fox.combined for better performance when dealing with many attachments.
   def parseAttachments(rows: List[DatasetLayerAttachmentsRow]): Fox[AttachmentWrapper] =
     for {
-      meshFiles <- Fox.serialCombined(rows.filter(_.`type` == LayerAttachmentType.mesh.toString))(parseRow)
-      agglomerateFiles <- Fox.serialCombined(rows.filter(_.`type` == LayerAttachmentType.agglomerate.toString))(
-        parseRow)
-      connectomeFiles <- Fox.serialCombined(rows.filter(_.`type` == LayerAttachmentType.connectome.toString))(parseRow)
-      segmentIndexFiles <- Fox.serialCombined(rows.filter(_.`type` == LayerAttachmentType.segmentIndex.toString))(
-        parseRow)
-      cumsumFiles <- Fox.serialCombined(rows.filter(_.`type` == LayerAttachmentType.cumsum.toString))(parseRow)
+      meshFiles <- Fox.combined(rows.filter(_.`type` == LayerAttachmentType.mesh.toString).map(parseRow))
+      agglomerateFiles <- Fox.combined(rows.filter(_.`type` == LayerAttachmentType.agglomerate.toString).map(parseRow))
+      connectomeFiles <- Fox.combined(rows.filter(_.`type` == LayerAttachmentType.connectome.toString).map(parseRow))
+      segmentIndexFiles <- Fox.combined(rows.filter(_.`type` == LayerAttachmentType.segmentIndex.toString).map(parseRow))
+      cumsumFiles <- Fox.combined(rows.filter(_.`type` == LayerAttachmentType.cumsum.toString).map(parseRow))
     } yield

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a410e8f and e64fd7b.

📒 Files selected for processing (17)

app/controllers/UserTokenController.scala (2 hunks)
app/controllers/WKRemoteDataStoreController.scala (1 hunks)
app/models/dataset/Dataset.scala (10 hunks)
app/models/dataset/DatasetService.scala (4 hunks)
app/models/dataset/WKRemoteDataStoreClient.scala (2 hunks)
conf/webknossos.latest.routes (1 hunks)
frontend/javascripts/admin/rest_api.ts (1 hunks)
frontend/javascripts/viewer/model/bucket_data_handling/wkstore_adapter.ts (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/BinaryDataController.scala (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKDatasetController.scala (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (3 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala (1 hunks)
webknossos-datastore/conf/com.scalableminds.webknossos.datastore.routes (2 hunks)

🧰 Additional context used

🧠 Learnings (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (1)

Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.030Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.

🧬 Code Graph Analysis (2)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (3)

app/models/job/Job.scala (1)

datasetId (55-55)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)

DataSourceId (10-12)

DataSourceId (14-31)

app/models/dataset/Dataset.scala (1)

dataSourceId (92-92)

app/models/dataset/Dataset.scala (5)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala (3)

AxisOrder (8-23)

AxisOrder (25-46)

toString (53-53)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

DataFormat (30-32)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (7)

DatasetLayerAttachments (12-21)

DatasetLayerAttachments (23-25)

LayerAttachment (37-37)

LayerAttachment (39-60)

LayerAttachmentDataFormat (27-30)

LayerAttachmentType (32-35)

isEmpty (19-21)

util/src/main/scala/com/scalableminds/util/tools/JsonHelper.scala (1)

JsonHelper (17-112)

util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)

combined (84-96)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: backend-tests

🔇 Additional comments (25)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/BinaryDataController.scala (1)

41-41: Approve removal of trailing comma.

This is a purely stylistic cleanup in the constructor parameter list and does not affect any logic or behavior.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1)

37-37: LGTM! Proper dependency injection setup.

The DatasetCache binding follows the established pattern and correctly registers as an eager singleton, which is appropriate for a cache service.

conf/webknossos.latest.routes (1)

113-113: LGTM! Well-structured route definition.

The new route follows REST conventions and integrates properly with existing datastore routes. Parameter types correctly match the controller method signature.
app/controllers/WKRemoteDataStoreController.scala (1)

262-271: Verify access control approach for dataset retrieval.

The method uses GlobalAccessContext to retrieve datasets, which bypasses user-specific access controls. Confirm this is intentional for datastore-to-datastore communication.

If this endpoint should respect dataset permissions, consider using an appropriate access context:
 def getDataset(name: String, key: String, datasetId: ObjectId): Action[AnyContent] =
   Action.async { implicit request =>
     dataStoreService.validateAccess(name, key) { _ =>
       for {
-        dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext)
+        // Consider if dataset-level access control is needed here
+        dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext)
         dataSource <- datasetService.fullDataSourceFor(dataset)
       } yield Ok(Json.toJson(dataSource))
     }
   }
Verify with the team whether datastore authentication is sufficient or if additional dataset-level authorization is required.
app/models/dataset/WKRemoteDataStoreClient.scala (2)

20-20: LGTM!

The import addition is necessary for the new updateDatasetInDSCache method and follows proper import organization.

119-125: LGTM!

The new cache update method is well-implemented:

Follows the established pattern used by other methods in the class

Uses appropriate authentication with webknossosToken

The URL path aligns with the new dataset ID-based routing mentioned in the PR objectives

Proper return type Fox[Unit] consistent with other cache operations

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)

15-15: LGTM!

The expanded import correctly adds the necessary types for the new getDataset method while maintaining clean import organization.

196-199: LGTM!

The new dataset retrieval method is well-implemented:

Follows established patterns used throughout the class

Uses proper authentication with dataStoreKey

The endpoint URL aligns with the new dataset ID-based API design

Appropriate return type Fox[GenericDataSource[DataLayer]] for dataset operations

Includes meaningful error message for debugging

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

43-43: LGTM: Clean dependency injection addition.

The DatasetCache dependency is properly injected and will be used by the new updateCache method.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (2)

19-20: LGTM: Useful utility method with correct implementation.

The isEmpty method properly checks all attachment collections and options, providing a clean way to determine if attachments are present.

27-27: LGTM: Consistent naming improvement.

The renaming from LayerAttachmentDataformat to LayerAttachmentDataFormat improves consistency with proper capitalization conventions. All references have been updated correctly throughout the file.

Also applies to: 37-37, 44-44, 64-64, 72-72, 80-80, 88-88, 96-96

app/controllers/UserTokenController.scala (2)

99-101: LGTM: Proper delegation to dataset access handler.

The pattern match correctly delegates dataset access requests to the new handleDataSetAccess method, using the directoryName as the dataset ID parameter.

164-190: LGTM: Well-structured dataset access control implementation.

The method follows the established pattern for access control:

Properly converts string ID to ObjectId

Implements separate read and write access logic

Uses appropriate database contexts for access validation

Handles only supported access modes (read/write)

The implementation is consistent with the existing handleDataSourceAccess method.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)

23-23: LGTM: Logical extension of access resource types.

Adding dataset to the enumeration properly extends the access control framework to support dataset-level access validation.

42-43: LGTM: Consistent factory method implementation.

The readDataset and writeDataset factory methods follow the established pattern:

Use the datasetId as the directoryName in DataSourceId

Empty organizationId (consistent with other ID-based access requests)

Proper access modes for the respective operations

This maintains consistency with existing access request patterns.

Also applies to: 48-49

webknossos-datastore/conf/com.scalableminds.webknossos.datastore.routes (1)

15-20: Well-structured dataset-centric routes

The new /wkDatasets/ routes provide a clean API for accessing datasets by ID. The distinct prefix prevents ambiguity with existing organization-based routes, and the consistent structure with existing endpoints makes the API intuitive.

Also applies to: 127-127

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKDatasetController.scala (6)

53-73: LGTM! Well-structured endpoint with proper access control and monitoring.

The method correctly validates access, handles errors gracefully, and includes performance monitoring for slow requests.

103-112: Clean implementation using request object.

This POST variant properly uses a request object pattern, which is cleaner than the GET variant with many parameters.

114-164: Robust thumbnail generation with good error handling.

The implementation includes proper fallback handling for empty data (lines 155-157) and comprehensive parameter validation. The JPEG generation pipeline is well-structured.

166-175: LGTM!

Simple and effective implementation for finding data positions.

177-188: Well-implemented with informative error messages.

The method provides clear error messages for both missing data sources and histogram generation failures.

190-198: LGTM!

Clean helper method that properly transforms requests and delegates to the binary data service.

app/models/dataset/Dataset.scala (3)

868-873: LGTM! Clear implementation with helpful context.

The method correctly retrieves all magnifications and the comment about WKW datasets provides important context for maintainers.

886-956: Well-integrated attachment support for dataset layers.

The implementation correctly extends layer parsing to include attachments, data format, and channel count. Error messages are descriptive and will help with debugging.

1092-1118: LGTM! Clean implementation of attachment updates.

The method properly handles clearing old attachments before inserting new ones within a transaction, ensuring data consistency.

frontend/javascripts/viewer/model/bucket_data_handling/wkstore_adapter.ts

frontend/javascripts/admin/rest_api.ts

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala

coderabbitai · 2025-06-04T12:41:21Z

app/models/dataset/DatasetService.scala

+  private def findLayersForDatasetWithMags(datasetId: ObjectId): Fox[List[DataLayer]] =
+    for {
+      layers <- datasetDataLayerDAO.findAllForDataset(datasetId)
+      _ <- Fox.fromBool(!layers.flatMap(_.dataFormatOpt).contains(DataFormat.wkw)) ?~> "WKW data format not supported in this context, only datasets with MagLocators are supported"
+      layerNamesAndMags <- datasetMagsDAO.findAllByDatasetId(datasetId)
+      layersWithMags = layers.map { layer =>
+        val mags = layerNamesAndMags.filter(_._1 == layer.name).map(_._2).toList
+        layer match {
+          case AbstractDataLayer(name,
+                                 category,
+                                 boundingBox,
+                                 _,
+                                 elementClass,
+                                 defaultViewConfiguration,
+                                 adminViewConfiguration,
+                                 coordinateTransformations,
+                                 additionalAxes,
+                                 attachmentsOpt,
+                                 _,
+                                 numChannels,
+                                 dataFormat,
+                                 _) =>
+            dataFormat match {
+              case Some(df) =>
+                df match {
+                  case DataFormat.wkw =>
+                    throw new NotImplementedError(
+                      "WKW data format not supported in this context, only datasets with MagLocators are supported")
+                  case DataFormat.neuroglancerPrecomputed =>
+                    PrecomputedDataLayer(
+                      name,
+                      boundingBox,
+                      category,
+                      elementClass,
+                      mags,
+                      defaultViewConfiguration,
+                      adminViewConfiguration,
+                      coordinateTransformations,
+                      numChannels,
+                      additionalAxes,
+                      attachmentsOpt
+                    )
+                  case DataFormat.n5 =>
+                    N5DataLayer(
+                      name,
+                      category,
+                      boundingBox,
+                      elementClass,
+                      mags,
+                      defaultViewConfiguration,
+                      adminViewConfiguration,
+                      coordinateTransformations,
+                      numChannels,
+                      additionalAxes,
+                      attachmentsOpt
+                    )
+                  case DataFormat.zarr =>
+                    ZarrDataLayer(
+                      name,
+                      category,
+                      boundingBox,
+                      elementClass,
+                      mags,
+                      defaultViewConfiguration,
+                      adminViewConfiguration,
+                      coordinateTransformations,
+                      numChannels,
+                      additionalAxes,
+                      attachmentsOpt,
+                      df
+                    )
+                  case DataFormat.zarr3 =>
+                    Zarr3DataLayer(
+                      name,
+                      category,
+                      boundingBox,
+                      elementClass,
+                      mags,
+                      defaultViewConfiguration,
+                      adminViewConfiguration,
+                      coordinateTransformations,
+                      numChannels,
+                      additionalAxes,
+                      attachmentsOpt,
+                    )
+                }
+              case None => ???
+            }
+          case AbstractSegmentationLayer(name,
+                                         _,
+                                         boundingBox,
+                                         _,
+                                         elementClass,
+                                         largestSegmentId,
+                                         mappings,
+                                         defaultViewConfiguration,
+                                         adminViewConfiguration,
+                                         coordinateTransformations,
+                                         additionalAxes,
+                                         attachmentsOpt,
+                                         _,
+                                         numChannels,
+                                         dataFormat,
+                                         _) =>
+            dataFormat match {
+              case Some(df) =>
+                df match {
+                  case DataFormat.wkw =>
+                    throw new NotImplementedError(
+                      "WKW data format not supported in this context, only datasets with MagLocators are supported")
+                  case DataFormat.neuroglancerPrecomputed =>
+                    PrecomputedSegmentationLayer(
+                      name,
+                      boundingBox,
+                      elementClass,
+                      mags,
+                      largestSegmentId,
+                      mappings,
+                      defaultViewConfiguration,
+                      adminViewConfiguration,
+                      coordinateTransformations,
+                      numChannels,
+                      additionalAxes,
+                      attachmentsOpt
+                    )
+                  case DataFormat.n5 =>
+                    N5SegmentationLayer(
+                      name,
+                      boundingBox,
+                      elementClass,
+                      mags,
+                      largestSegmentId,
+                      mappings,
+                      defaultViewConfiguration,
+                      adminViewConfiguration,
+                      coordinateTransformations,
+                      numChannels,
+                      additionalAxes,
+                      attachmentsOpt
+                    )
+                  case DataFormat.zarr =>
+                    ZarrSegmentationLayer(
+                      name,
+                      boundingBox,
+                      elementClass,
+                      mags,
+                      largestSegmentId,
+                      mappings,
+                      defaultViewConfiguration,
+                      adminViewConfiguration,
+                      coordinateTransformations,
+                      numChannels,
+                      additionalAxes,
+                      attachmentsOpt,
+                      df
+                    )
+                  case DataFormat.zarr3 =>
+                    Zarr3SegmentationLayer(
+                      name,
+                      boundingBox,
+                      elementClass,
+                      mags,
+                      largestSegmentId,
+                      mappings,
+                      defaultViewConfiguration,
+                      adminViewConfiguration,
+                      coordinateTransformations,
+                      numChannels,
+                      additionalAxes,
+                      attachmentsOpt
+                    )
+                }
+              case None => ???
+            }
+          case _ => throw new NotImplementedError("DataLayer type mismatch (unreachable)")
+        }
+      }
+    } yield layersWithMags


🛠️ Refactor suggestion

Refactor duplicated pattern matching logic

This method contains extensive code duplication between the AbstractDataLayer and AbstractSegmentationLayer cases. Each data format is handled almost identically for both layer types.

Consider extracting the common logic into helper methods:

private def createDataLayerForFormat( dataFormat: DataFormat, name: String, boundingBox: BoundingBox, elementClass: ElementClass, mags: List[MagInfo], // ... other common parameters ): DataLayer = dataFormat match { case DataFormat.neuroglancerPrecomputed => PrecomputedDataLayer(name, boundingBox, ...) case DataFormat.n5 => N5DataLayer(name, ...) // ... other formats } private def createSegmentationLayerForFormat( dataFormat: DataFormat, // ... parameters ): DataLayer = dataFormat match { // Similar pattern for segmentation layers }

🤖 Prompt for AI Agents

In app/models/dataset/DatasetService.scala between lines 306 and 483, the pattern matching logic for handling data formats in AbstractDataLayer and AbstractSegmentationLayer cases is duplicated. To fix this, extract the common logic for creating data layers and segmentation layers into separate helper methods that take the data format and relevant parameters, then return the appropriate DataLayer instance. Replace the duplicated match blocks with calls to these helper methods to improve code maintainability and reduce redundancy.

coderabbitai · 2025-06-04T12:41:21Z

app/models/dataset/DatasetService.scala

+  private def findLayersForDatasetWithMags(datasetId: ObjectId): Fox[List[DataLayer]] =
+    for {
+      layers <- datasetDataLayerDAO.findAllForDataset(datasetId)
+      _ <- Fox.fromBool(!layers.flatMap(_.dataFormatOpt).contains(DataFormat.wkw)) ?~> "WKW data format not supported in this context, only datasets with MagLocators are supported"


⚠️ Potential issue

Inconsistent error handling for WKW format

The method uses Fox.fromBool to check for WKW format at line 309, but then throws NotImplementedError in the pattern matching. This creates two different error paths for the same condition.

Either rely on the initial check:

-case DataFormat.wkw => - throw new NotImplementedError( - "WKW data format not supported in this context, only datasets with MagLocators are supported") +case DataFormat.wkw => + // This case should be unreachable due to the check at line 309 + Fox.failure("Internal error: WKW format should have been filtered out")

Or remove the initial check and handle it consistently in the pattern matching.

Also applies to: 331-333, 413-415

🤖 Prompt for AI Agents

In app/models/dataset/DatasetService.scala at line 309 and similarly at lines 331-333 and 413-415, the error handling for the WKW data format is inconsistent because it uses Fox.fromBool for an initial check but then throws NotImplementedError later. To fix this, choose one consistent approach: either keep the Fox.fromBool check and remove the NotImplementedError throws in the pattern matching, or remove the Fox.fromBool check and handle the error uniformly with NotImplementedError in the pattern matching. Apply the chosen approach consistently across all mentioned lines.

app/models/dataset/DatasetService.scala

app/models/dataset/Dataset.scala

fm3

Thanks for delving into this, good stuff!

I left a couple of comments, mostly on caching.

Testing did not go all that well for me yet, maybe I’m doing some things wrong. I switched the dataset id in the ts files for one that exists in my db and tried to open a dataset, but I only get black.

In the backend logging I see a lot of failed RPCs with 403 "DataStore not found." as well as some failed PUT requests with {"obj.scale":[{"msg":["error.path.missing"],"args":[]}],"obj.dataLayers":[{"msg":["error.path.missing"],"args":[]}]}

If everything works for you, maybe we could debug on my machine together on monday.

I agree that we should attempt to minimize code duplication between the implementation of the datastore routes by id and those by organizationId+datasetDirectoryName. I hope we can get rid of the old access pattern at some point, but that may take several months.

conf/webknossos.latest.routes

app/controllers/WKRemoteDataStoreController.scala

app/models/dataset/Dataset.scala

app/models/dataset/DatasetService.scala

fm3 · 2025-06-05T08:13:32Z

app/models/dataset/DatasetService.scala

+              case Some(df) =>
+                df match {
+                  case DataFormat.wkw =>
+                    throw new NotImplementedError(


please don’t mix exceptions with box/fox error handling. If it is too complex to return Box in the match cascades, please at least wrap it with tryo so the exceptions are ultimately turned into boxes so they can bubble up properly.

fm3 · 2025-06-05T08:17:37Z

...ossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala

 object AccessResourceType extends ExtendedEnumeration {
  type AccessResourceType = Value
-  val datasource, tracing, annotation, webknossos, jobExport = Value
+  val datasource, dataset, tracing, annotation, webknossos, jobExport = Value


hmm again naming things is hard 😬 I don’t really have a clean solution either. I hope we can just get rid of the traditional/datasource codepaths soon.

fm3 · 2025-06-05T08:21:35Z

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala

+                             @Named("webknossos-datastore") val actorSystem: ActorSystem)(implicit ec: ExecutionContext)
+    extends FoxImplicits {
+
+  val cache = new TemporaryStore[ObjectId, DataSource](actorSystem)


Implementing this with a TemporaryStore and explicit get and insert solves this. However, if many requests pour in that trigger this codepath, they will all be cache misses at first, so a lot of requests to the remote webknossos will be caused.

Instead, how about using the AlfuCache pattern, which is actually asynchronous and caches the Futures?

This would, however, require a few changes. The AlfuCache Pattern is getOrLoad, so there is no explicit insert.

But I think this could work. Instead of explicit inserts, you could just clear the existing entries to invalidate them. Then the next getOrLoad will re-fill the cache with the updated version. What do you think?

fm3 · 2025-06-05T12:45:40Z

I talked to norman again and yes, we want to eventually only use the byId routes in the datastore. So the next steps (possibly in a follow-up PR) would be to duplicate all relevant datastore routes, and implement this for all dataset types, as well as switch the frontend to use only the new routes.

coderabbitai

Actionable comments posted: 2

🔭 Outside diff range comments (1)

app/models/dataset/Dataset.scala (1)

890-905: 🛠️ Refactor suggestion

N+1 query pattern when fetching attachments

Inside parseRow we call datasetLayerAttachmentsDAO.findAll… for every layer.
For datasets with many layers this spawns one extra DB round-trip per layer.

Consider bulk-loading all attachments once and grouping them in memory, or extending the original layer query with a LEFT JOIN to fetch attachments in a single shot.

♻️ Duplicate comments (2)

app/models/dataset/DatasetService.scala (2)

309-336: 🛠️ Refactor suggestion

Inconsistent WKW-format handling persists

Past review already flagged that we both (a) pre-filter WKW via Fox.fromBool and (b) still throw a NotImplementedError in the pattern-match.
Having two independent guards for the same condition is brittle and violates the “single source of truth” principle.

Either drop the Fox.fromBool guard or replace the NotImplementedError with a Fox.failure, but use exactly one style consistently.

315-392: 🛠️ Refactor suggestion

Large duplication between colour & segmentation logic

The two pattern-matches are ~100 lines each and differ only in the concrete layer constructor. This was pointed out earlier and is still unresolved.

Please extract a helper that maps (df, layerType) → DataLayer to eliminate duplication and ease future maintenance.

Also applies to: 448-478

🧹 Nitpick comments (3)

app/models/dataset/DatasetService.scala (1)

488-493: Network failure from datastore notification is unchecked

updateDatasetInDSCache performs an HTTP request but its failure is ignored.
Consider wrapping the call in Fox.tryo and bubbling up / logging the error so that silent cache desynchronisation does not occur.
app/models/dataset/Dataset.scala (2)
904-905: Minor: redundant lambda
row.dataformat.flatMap(df => DataFormat.fromString(df))
can be shortened to
row.dataformat.flatMap(DataFormat.fromString)
1066-1083: parseAttachments repeatedly filters the same list

Each attachment type filters rows anew, producing five passes over the same collection.

A quick micro-refactor:
- meshFiles         <- Fox.serialCombined(rows.filter(_.`type` == ...))(parseRow)
+ val grouped = rows.groupBy(_.`type`)
+ meshFiles         <- Fox.serialCombined(grouped.getOrElse(LayerAttachmentType.mesh.toString, Nil))(parseRow)
reduces complexity with no behaviour change.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e64fd7b and b8ebc50.

📒 Files selected for processing (10)

app/controllers/WKRemoteDataStoreController.scala (1 hunks)
app/models/dataset/Dataset.scala (10 hunks)
app/models/dataset/DatasetService.scala (5 hunks)
app/models/dataset/WKRemoteDataStoreClient.scala (2 hunks)
conf/webknossos.latest.routes (1 hunks)
frontend/javascripts/admin/rest_api.ts (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala (1 hunks)
webknossos-datastore/conf/com.scalableminds.webknossos.datastore.routes (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (8)

conf/webknossos.latest.routes
app/controllers/WKRemoteDataStoreController.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
app/models/dataset/WKRemoteDataStoreClient.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala
frontend/javascripts/admin/rest_api.ts
webknossos-datastore/conf/com.scalableminds.webknossos.datastore.routes

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: build-smoketest-push
GitHub Check: backend-tests

app/models/dataset/DatasetService.scala

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

app/models/dataset/DatasetService.scala (2)

307-488: Large format-switch duplication persists

The lengthy match cascades for data/segmentation layers replicate the logic
earlier flagged as duplicate. Extracting factory helpers (as already proposed
in a previous review) would remove this redundancy and simplify future
maintenance.

310-336: Redundant WKW checks

A top-level Fox.fromBool already rejects DataFormat.wkw; keeping an inner
case DataFormat.wkw => throw … branch is superfluous and adds noise.

Also applies to: 415-418

🧹 Nitpick comments (2)

app/models/dataset/DatasetService.scala (2)
293-306: fullDataSourceFor largely copies dataSourceFor – DRY it up

Both methods are identical except for the layer-retrieval call. Extract the
common part into a small helper:
private def buildSource(dataset: Dataset,
                        layersF: => Fox[List[DataLayer]]): Fox[InboxDataSource] = {
  for {
    layers <- layersF
    id      = DataSourceId(dataset.directoryName, dataset._organization)
    scale  <- dataset.voxelSize.toFox ?~> "dataset.source.usableButNoScale"
  } yield GenericDataSource[DataLayer](id, layers, scale)
}

def dataSourceFor(dataset: Dataset)  = buildSource(dataset,
  datasetDataLayerDAO.findAllForDataset(dataset._id))

def fullDataSourceFor(dataset: Dataset) = buildSource(dataset,
  findLayersForDatasetWithMags(dataset._id))
Keeps behaviour identical while cutting duplication.

394-395: Return a Fox.failure instead of throwing

Throwing IllegalStateException here is caught by tryo, but causes verbose
stack-traces and loses the chance to provide an i18n key. Prefer:
case None =>
  Fox.failure("dataset.layer.missingDataFormat")
for clearer error semantics and cleaner logs.

Also applies to: 480-482

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8ebc50 and bdc75cc.

📒 Files selected for processing (2)

app/models/dataset/DatasetService.scala (5 hunks)
app/models/dataset/WKRemoteDataStoreClient.scala (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

app/models/dataset/WKRemoteDataStoreClient.scala

🧰 Additional context used

🧠 Learnings (1)

app/models/dataset/DatasetService.scala (1)

Learnt from: frcroth
PR: scalableminds/webknossos#8657
File: app/models/dataset/DatasetService.scala:380-392
Timestamp: 2025-06-16T09:04:40.565Z
Learning: Trailing commas are supported in Scala starting from version 2.12.2 as part of SIP-27. They are valid in function calls, function definitions, import statements, and other contexts. This means trailing commas after the last parameter or argument do not cause compilation errors in Scala 2.12.2+.

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: build-smoketest-push
GitHub Check: backend-tests

coderabbitai · 2025-06-16T09:30:17Z

app/models/dataset/DatasetService.scala

+        _ <- notifyDatastoreOnUpdate(foundDataset._id)
      } yield foundDataset._id


💡 Verification agent

🧩 Analysis chain

Remote cache invalidation can make the entire update fail

notifyDatastoreOnUpdate is executed inside the transactional flow of
updateDataSource*. A transient network error when calling the remote
datastore would roll the whole DB update back, although only the cache could
not be refreshed.

Consider decoupling this call (e.g. run it in a Future/fork or dispatch a
message that can be retried) so the primary write succeeds even if the
datastore is unreachable, and log failures for later inspection.

Also applies to: 238-239

I’ll examine how notifyDatastoreOnUpdate is invoked relative to the transaction boundaries. Retrieving the updateDataSource* definitions, their transaction usage, and the call sites:

🏁 Script executed:

#!/bin/bash set -e # Locate updateDataSource methods echo "=== updateDataSource definitions ===" rg -n "def updateDataSource" app/models/dataset/DatasetService.scala # Show surrounding lines of these methods echo -e "\n=== Context around updateDataSource ===" rg -n -C5 "updateDataSource" app/models/dataset/DatasetService.scala # Locate notifyDatastoreOnUpdate calls echo -e "\n=== notifyDatastoreOnUpdate usages ===" rg -n -C3 "notifyDatastoreOnUpdate" app/models/dataset/DatasetService.scala # Check for transaction boundaries (e.g., .transactionally or similar) echo -e "\n=== Transaction markers ===" rg -n "transactionally" app/models/dataset/DatasetService.scala || echo "No explicit .transactionally found"

Length of output: 6057

Decouple notifyDatastoreOnUpdate from the transactional flow

The call to notifyDatastoreOnUpdate(foundDataset._id) is currently inside the same for-comprehension (and thus the same transactional/monadic flow) as your database updates. Any transient failure contacting the remote datastore will cause the entire DB update to roll back. To ensure the primary write always succeeds:

• Move the cache-invalidation call outside the DB transaction or wrap it in a detached Future/background job.
• Log and retry failures independently of the DB update.

Locations to update:

app/models/dataset/DatasetService.scala: lines 210–217

app/models/dataset/DatasetService.scala: lines 232–239

🤖 Prompt for AI Agents

In app/models/dataset/DatasetService.scala around lines 210 to 217 and 232 to 239, the call to notifyDatastoreOnUpdate is currently inside the transactional flow, causing the entire database update to roll back if the remote cache invalidation fails. To fix this, move the notifyDatastoreOnUpdate call outside the transaction or run it asynchronously in a detached Future or background job, ensuring the primary database update always succeeds. Additionally, add logging and retry mechanisms for failures in cache invalidation to handle transient errors without affecting the main transaction.

coderabbitai

Actionable comments posted: 3

🔭 Outside diff range comments (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/annotation/AnnotationSource.scala (1)
6-13: New mandatory datasetId will break deserialization of existing documents

datasetId is introduced as a required ObjectId.
All persisted AnnotationSource JSON documents that were written before this PR (or any client payloads not yet updated) lack this field, so:

Json.parse(..).as[AnnotationSource] will now throw JsResultException.

Play will return 400 for incoming requests that omit the property.

Unless a full data-migration + coordinated client rollout is guaranteed, make the field optional or supply a custom Reads with a sensible fallback.
-                            datasetId: ObjectId,
+                            datasetId: Option[ObjectId],
and later:
implicit val jsonFormat: OFormat[AnnotationSource] =
  Json.format[AnnotationSource]   // Option[...] keeps backward compat
If the field must be mandatory, add a DB migration script and bump API version so that older clients fail fast and predictably.
Also double-check an implicit OFormat[ObjectId] is in scope; otherwise compilation will fail here.

🧹 Nitpick comments (2)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala (2)

50-53: Re-evaluate blanket CORS enablement

allowRemoteOrigin = true exposes every endpoint to any origin.
Given that these routes hand out arbitrary binary data, you probably want a narrower CORS policy (e.g. configured origins, credentials flag, etc.) instead of a global wildcard.

320-344: Private-link helpers silently fall back to datastore route

zArrayPrivateLink / zarrJsonPrivateLink default to datastore data when the layer isn’t found in the annotation source.
That makes debugging access mistakes tricky.

Return 404 (or at least log a warning) when the caller explicitly asked for an annotation layer that does not exist instead of silently changing data source.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bdc75cc and 2196b2f.

📒 Files selected for processing (5)

app/models/annotation/AnnotationService.scala (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/annotation/AnnotationSource.scala (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala (1 hunks)
webknossos-datastore/conf/com.scalableminds.webknossos.datastore.routes (4 hunks)

✅ Files skipped from review due to trivial changes (1)

app/models/annotation/AnnotationService.scala

🚧 Files skipped from review as they are similar to previous changes (2)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala
webknossos-datastore/conf/com.scalableminds.webknossos.datastore.routes

🧰 Additional context used

🧬 Code Graph Analysis (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala (15)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (4)

webknossos (66-68)

validateAccessFromTokenContext (83-90)

readDataset (42-45)

validateAccessFromTokenContextForSyncBlock (77-83)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/zarr/ZarrCoordinatesParser.scala (2)

ZarrCoordinatesParser (11-47)

parseNDimensionalDotCoordinates (13-46)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/Positions.scala (2)

VoxelPosition (8-44)

topLeft (70-76)

frontend/javascripts/types/api_types.ts (1)

AnnotationLayerType (411-411)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/annotation/AnnotationSource.scala (3)

AnnotationSource (6-15)

AnnotationSource (17-19)

getAnnotationLayer (14-14)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/Cuboid.scala (1)

Cuboid (9-60)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1)

DataServiceDataRequest (18-30)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala (2)

getWithLayer (25-29)

getById (22-23)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteTracingstoreClient.scala (7)

getVolumeLayerAsZarrLayer (40-48)

getRawZarrCube (59-63)

getZArray (31-35)

getZarrJson (35-40)

getDataLayerMagDirectoryContents (63-68)

getDataLayerDirectoryContents (68-73)

getZGroup (73-76)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)

getAnnotationSource (172-182)

util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)

tokenContextForRequest (226-228)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/annotation/AnnotationLayer.scala (1)

typ (76-79)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1)

handleDataRequests (159-178)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr/ZarrHeader.scala (1)

fromLayer (59-85)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (1)

fromDataLayer (262-295)

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: backend-tests
GitHub Check: build-smoketest-push

coderabbitai · 2025-06-16T13:39:16Z

...store/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala

+  private def convertLayerToZarrLayer(layer: DataLayer, zarrVersion: Int): ZarrLayer = {
+    val dataFormat = if (zarrVersion == 2) DataFormat.zarr else DataFormat.zarr3
+    layer match {
+      case s: SegmentationLayer =>
+        val rank = s.additionalAxes.map(_.length).getOrElse(0) + 4 // We’re writing c, additionalAxes, xyz
+        ZarrSegmentationLayer(
+          s.name,
+          s.boundingBox,
+          s.elementClass,
+          mags = s.sortedMags.map(x => MagLocator(x, None, None, Some(AxisOrder.cAdditionalxyz(rank)), None, None)),
+          mappings = s.mappings,
+          largestSegmentId = s.largestSegmentId,
+          numChannels = Some(if (s.elementClass == ElementClass.uint24) 3 else 1),
+          defaultViewConfiguration = s.defaultViewConfiguration,
+          adminViewConfiguration = s.adminViewConfiguration,
+          coordinateTransformations = s.coordinateTransformations,
+          additionalAxes = s.additionalAxes.map(reorderAdditionalAxes),
+          dataFormat = dataFormat
+        )
+      case d: DataLayer =>
+        val rank = d.additionalAxes.map(_.length).getOrElse(0) + 4 // We’re writing c, additionalAxes, xyz
+        ZarrDataLayer(
+          d.name,
+          d.category,
+          d.boundingBox,
+          d.elementClass,
+          mags = d.sortedMags.map(x => MagLocator(x, None, None, Some(AxisOrder.cAdditionalxyz(rank)), None, None)),
+          numChannels = Some(if (d.elementClass == ElementClass.uint24) 3 else 1),
+          defaultViewConfiguration = d.defaultViewConfiguration,
+          adminViewConfiguration = d.adminViewConfiguration,
+          coordinateTransformations = d.coordinateTransformations,
+          additionalAxes = d.additionalAxes.map(reorderAdditionalAxes),
+          dataFormat = dataFormat
+        )
+    }
+  }


🛠️ Refactor suggestion

Duplicate layer-conversion code – centralise and test

convertLayerToZarrLayer is another copy of logic already living in the original controller.
Apart from maintenance concerns, the two versions must stay byte-identical to avoid subtle divergences (e.g. rank calculation, element-class channel logic).

Move this transformation into a single helper (maybe inside ZarrLayerFactory) and unit-test it once.

🤖 Prompt for AI Agents

In webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala between lines 154 and 189, the convertLayerToZarrLayer method duplicates logic present elsewhere, risking inconsistencies. Refactor by extracting this conversion logic into a single helper method, ideally within a dedicated object like ZarrLayerFactory, and replace all existing copies with calls to this helper. Add unit tests for this helper to ensure correctness and maintainability.

coderabbitai · 2025-06-16T13:39:16Z

...store/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala

+/**
+  *
+  * This is a copy of the ZarrStreamingController using datasetId instead of organizationId/datasetDirectoryName
+  */
+class WKZarrStreamingController @Inject()(


🛠️ Refactor suggestion

Avoid large-scale copy-paste – extract shared logic

The whole controller is a near verbatim copy of ZarrStreamingController with only the lookup key changed.
This will double maintenance effort and risks the two controllers diverging silently.

Consider extracting the common logic into reusable services / mix-ins and keep the two tiny façade controllers (or, once migration is finished, just one controller).
This also makes upcoming “duplicate all routes” work trivial.

🤖 Prompt for AI Agents

In webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala around lines 34 to 38, the WKZarrStreamingController is almost an exact copy of ZarrStreamingController except for the lookup key. To fix this, refactor by extracting the shared logic into a common service or trait that both controllers can use, then simplify each controller to just handle the specific lookup key differences. This reduces duplication, eases maintenance, and prevents divergence between the two controllers.

coderabbitai · 2025-06-16T13:39:16Z

...store/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala

+  private def rawZarrCube(
+      datasetId: String,
+      dataLayerName: String,
+      mag: String,
+      coordinates: String,
+  )(implicit m: MessagesProvider, tc: TokenContext): Fox[Result] =
+    for {
+      datasetIdValidated <- ObjectId.fromString(datasetId)
+      (dataSource, dataLayer) <- datasetCache.getWithLayer(datasetIdValidated, dataLayerName) ~> SERVICE_UNAVAILABLE
+      reorderedAdditionalAxes = dataLayer.additionalAxes.map(reorderAdditionalAxes)
+      (x, y, z, additionalCoordinates) <- ZarrCoordinatesParser.parseNDimensionalDotCoordinates(
+        coordinates,
+        reorderedAdditionalAxes) ?~> "zarr.invalidChunkCoordinates" ~> NOT_FOUND
+      magParsed <- Vec3Int
+        .fromMagLiteral(mag, allowScalar = true)
+        .toFox ?~> Messages("dataLayer.invalidMag", mag) ~> NOT_FOUND
+      _ <- Fox.fromBool(dataLayer.containsMag(magParsed)) ?~> Messages("dataLayer.wrongMag", dataLayerName, mag) ~> NOT_FOUND
+      cubeSize = DataLayer.bucketLength
+      request = DataServiceDataRequest(
+        Some(dataSource.id),
+        dataLayer,
+        Cuboid(
+          topLeft = VoxelPosition(x * cubeSize * magParsed.x,
+                                  y * cubeSize * magParsed.y,
+                                  z * cubeSize * magParsed.z,
+                                  magParsed),
+          width = cubeSize,
+          height = cubeSize,
+          depth = cubeSize
+        ),
+        DataServiceRequestSettings(halfByte = false, additionalCoordinates = additionalCoordinates)
+      )
+      (data, notFoundIndices) <- binaryDataService.handleDataRequests(List(request))
+      _ <- Fox.fromBool(notFoundIndices.isEmpty) ~> "zarr.chunkNotFound" ~> NOT_FOUND
+    } yield Ok(data)
+


⚠️ Potential issue

Binary cube response lacks content type & may exhaust memory

rawZarrCube streams the full chunk into a byte array and returns Ok(data) without setting Content-Type.
Two issues:

Clients cannot rely on MIME type – send application/octet-stream.

Holding the whole chunk (~134 MB for 512³ uint8) in memory before sending can blow the heap under load.
Use Play’s Chunked or HttpEntity.Streamed with an Akka Source to stream from binaryDataService.

Example patch:

- } yield Ok(data) + } yield + Ok.sendEntity( + HttpEntity.Strict(ByteString(data), Some("application/octet-stream")) + )

(Or switch to a streamed entity.)

🤖 Prompt for AI Agents

In webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala around lines 241 to 276, the rawZarrCube method returns the entire binary chunk as a byte array with Ok(data) but does not set the Content-Type header, and it loads the entire chunk into memory which risks heap exhaustion. Fix this by setting the response Content-Type to "application/octet-stream" and refactor the method to stream the data using Play's Chunked or HttpEntity.Streamed with an Akka Source instead of loading it fully into memory before sending.

frcroth · 2025-06-16T15:16:17Z

@fm3 I changed it to use an AlfuCache and duplicated the zarr routes as discussed. The other routes are all about files so not relevant to this change (remote datasets).

I also removed the frontend changes. If you want to test them again, see this commit here: f62a7d7#diff-1833767de35826d9954d2ef7d077f41f8b69861833801e63b9dd77e47d4c6d27

fm3

Nice, I think with the AlfuCache this makes a lot of sense :) Thanks for implementing the smaller comments as well.

I’m a little unsure about the new code duplication, see comment above.

fm3 · 2025-06-18T08:16:17Z

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetCache.scala

+  lazy val cache: AlfuCache[ObjectId, DataSource] = AlfuCache[ObjectId, DataSource](
+    timeToLive = 1 day // Cache for a longer time, since we invalidate the cache manually
+  )


we might want to increase the capacity here, default is 1k. How about 5k? DataSource objects should not use up all that much memory.

fm3 · 2025-06-18T08:21:06Z

...store/app/com/scalableminds/webknossos/datastore/controllers/WKZarrStreamingController.scala

+    extends Controller
+    with Zarr3OutputHelper {
+
+  override def defaultErrorCode: Int = NOT_FOUND


Whew, that’s really a lot of duplication. @normanrz what do you think, how long after switching to the by-id routes internally do we need to support both variants to facilitate libs compatibility? If it is long, it may be worth taking the time now to extract the duplicated functionality into separate services, or possibly inject the ZarrStreamingController itself into this WkZarrStreamingController and move the shared functionality into functions accessible by both.

Or maybe we could go the other way round, and have the by-orgaId-and-datasetName routes do a lookup of the id, and then call the methods of the new controllers? If both lookups are cached on the datastore side, there should not be a significant performance hit for that, right? ok no I guess that will work only once all dataset types are accessible in this way, not just remote datasets. So maybe for the next iteration

Load data via dataset ID

11bcc9c

frcroth added 6 commits May 26, 2025 17:56

Update dataset cache when datasource is updated

a964308

Merge branch 'master' into virtual-remote-datasets

9d3ad67

Include attachments in WK generated datasource

63ddf7b

In-Memory Cache

b9d05f3

Move byId routes to new controller

bec8480

remove todo

04bfd3b

frcroth mentioned this pull request Jun 4, 2025

Attachments in datasource-properties.json #8567

Open

8 tasks

Check early for WKW

e64fd7b

frcroth requested a review from fm3 June 4, 2025 12:34

frcroth marked this pull request as ready for review June 4, 2025 12:34

coderabbitai bot reviewed Jun 4, 2025

View reviewed changes

fm3 requested changes Jun 5, 2025

View reviewed changes

fm3 assigned frcroth Jun 5, 2025

fm3 mentioned this pull request Jun 5, 2025

Use ObjectId in TracingStore where applicable #8674

Merged

3 tasks

frcroth added 3 commits June 16, 2025 09:23

Merge branch 'master' into virtual-remote-datasets

04922de

Fix minor suggestions from code review

eb2c11b

Use AlfuCache

b8ebc50

coderabbitai bot reviewed Jun 16, 2025

View reviewed changes

app/models/dataset/DatasetService.scala Show resolved Hide resolved

app/models/dataset/DatasetService.scala Outdated Show resolved Hide resolved

Make exception more explicit

bdc75cc

coderabbitai bot reviewed Jun 16, 2025

View reviewed changes

frcroth added 3 commits June 16, 2025 15:31

Copy zarr routes for dataset id usage

f62a7d7

Lint

ce4d02d

Merge branch 'master' into virtual-remote-datasets

2196b2f

coderabbitai bot reviewed Jun 16, 2025

View reviewed changes

frcroth requested a review from fm3 June 16, 2025 15:08

fm3 reviewed Jun 18, 2025

View reviewed changes

frcroth and others added 5 commits June 23, 2025 15:26

Merge branch 'master' into virtual-remote-datasets

02a0f0c

Remove wk zarr routes

3a4b603

Merge branch 'master' into virtual-remote-datasets

cbe2e62

Rename method

ead213c

Increase cache capacity

16dfb39

frcroth requested a review from fm3 June 23, 2025 13:38

fm3 approved these changes Jun 24, 2025

View reviewed changes

frcroth and others added 2 commits June 25, 2025 08:55

Merge branch 'master' into virtual-remote-datasets

5e388f8

Remove unused actor system

89f0eb8

frcroth merged commit 6096fe4 into master Jun 25, 2025
5 checks passed

frcroth deleted the virtual-remote-datasets branch June 25, 2025 07:26

coderabbitai bot mentioned this pull request Jul 21, 2025

Virtual Datasets #8708

Merged

22 tasks

coderabbitai bot mentioned this pull request Sep 12, 2025

Calculate s3 storage #8789

Merged

7 tasks

coderabbitai bot mentioned this pull request Sep 22, 2025

Upload datasets to S3 #8912

Merged

18 tasks

		_ <- notifyDatastoreOnUpdate(foundDataset._id)
		} yield foundDataset._id

Virtual Remote Datasets #8657

Virtual Remote Datasets #8657

Uh oh!

Conversation

frcroth commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

URL of deployed dev instance (used for testing):

Steps to test:

TODOs:

Issues:

Uh oh!

coderabbitai bot commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Suggested labels

Suggested reviewers

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

fm3 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fm3 Jun 5, 2025

Choose a reason for hiding this comment

Uh oh!

fm3 Jun 5, 2025

Choose a reason for hiding this comment

Uh oh!

fm3 Jun 5, 2025

Choose a reason for hiding this comment

Uh oh!

fm3 commented Jun 5, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

frcroth commented May 26, 2025 •

edited

Loading

coderabbitai bot commented May 26, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

fm3 Jun 19, 2025 •

edited

Loading