Skip to content

Conversation

frcroth
Copy link
Contributor

@frcroth frcroth commented Aug 11, 2025

Creation of virtual remote datasets still involves the datastore, because it does the exploring (to handle local file paths).

Steps to test:

  • Create a virtual remote dataset
  • Delete a virtual remote dataset
  • Delete a non-virtual dataset

Issues:


  • Needs datastore update after deployment

Copy link
Contributor

coderabbitai bot commented Aug 11, 2025

📝 Walkthrough

Walkthrough

Adds WK-side endpoints and payloads to create virtual datasets and request on-disk deletion by dataset ID; removes datastore-facing registration and virtual-deletion endpoints; moves frontend datastore-host/token plumbing into WK API calls; adds path validation RPC and datastore local-path parsing change; updates service/client signatures and wiring.

Changes

Cohort / File(s) Summary
WK backend: Dataset controller & routes
app/controllers/DatasetController.scala, conf/webknossos.latest.routes
Adds DataSourceRegistrationInfo payload, addVirtualDataset(name: String) POST and deleteOnDisk(datasetId: ObjectId) DELETE endpoints; extends DatasetController constructor with many dependencies; routes updated.
WK backend: Dataset service & remote client
app/models/dataset/DatasetService.scala, app/models/dataset/WKRemoteDataStoreClient.scala
Adds validatePaths(paths,dataStore), deleteVirtualOrDiskDataset(dataset); updates createVirtualDataset signature (drops organizationId); adds WKRemoteDataStoreClient.validatePaths and deleteOnDisk.
WK backend: WKRemoteDataStoreController & routes
app/controllers/WKRemoteDataStoreController.scala, conf/webknossos.latest.routes
Removes deleteVirtualDataset action and its route; adds deleteDataset action (datastore-callback style); updates createVirtualDataset callsite to new signature.
Frontend: admin APIs & dataset add view
frontend/javascripts/admin/rest_api.ts, frontend/javascripts/admin/dataset/dataset_add_remote_view.tsx
Replaces datastore-host/token flows with WK API: storeRemoteDataset now posts {dataStoreName,dataSource,folderId} to /api/datasets/addVirtualDataset/:name; deleteDatasetOnDisk now calls /api/datasets/:id/deleteOnDisk.
Frontend: dashboard deletion flows
frontend/javascripts/dashboard/advanced_dataset/dataset_action_view.tsx, frontend/javascripts/dashboard/dataset/dataset_settings_delete_tab.tsx
Updated calls to deleteDatasetOnDisk to pass only dataset.id (removed datastore URL argument).
Datastore backend: DataSource controller, DS client & routes
webknossos-datastore/app/.../DataSourceController.scala, webknossos-datastore/app/.../DSRemoteWebknossosClient.scala, webknossos-datastore/conf/datastore.latest.routes
Removes frontend-facing DataSourceController.add and DS client registerDataSource / deleteVirtualDataset; adds validatePaths endpoint and PathValidationResult type; updated delete-on-disk behavior to call deleteDataSource for non-disk datasets; routes updated.
Datastore: RemoteSourceDescriptorService behavior
webknossos-datastore/app/.../RemoteSourceDescriptorService.scala
Changes path parsing to use Path.of(new URI(pathLiteral).getPath), altering local-path detection semantics for URI-like inputs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Assessment against linked issues

Objective Addressed Explanation
Remove datastore from frontend↔WK communication where not necessary (#8814)
Ensure frontend triggers WK deletion and WK asks datastore only for non-virtual datasets (#8814)
Do not register virtual datasets at datastore (#8814)
Remove datasource repository / have checkInbox report to WK directly (#8814) No changes to datasource repository or checkInbox logic in provided diffs.

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Setting DataSource id to DataSourceId(datasetName, user._organization) in createVirtualDataset (app/models/dataset/DatasetService.scala) Internal id overwrite is an internal representation change not required by #8814 and not mentioned in the issue objectives.
Change path parsing to Path.of(new URI(pathLiteral).getPath) in RemoteSourceDescriptorService.pathIsDataSourceLocal (webknossos-datastore/.../RemoteSourceDescriptorService.scala) Alters URI/local-path detection semantics; this is a behavior change to validation logic unrelated to removing datastore from frontend↔WK communication.

Possibly related PRs

Suggested reviewers

  • frcroth
  • normanrz

Poem

I nibble routes and hop through code,
I stitch a POST, I mark a node,
Virtual burrows now registered here,
Delete-on-disk calls the datastore near.
A rabbit cheers the merge — hop, hooray! 🥕🐇

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch clean-up-ds-wk-communication-virtual-datasets

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@frcroth frcroth marked this pull request as ready for review August 11, 2025 16:24
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🔭 Outside diff range comments (2)
app/controllers/WKRemoteDataStoreController.scala (2)

241-244: Bug: async side-effect is dropped; deletion not awaited

The Fox effect from deleteDataset(...).flatMap(...) is assigned to a val and not sequenced. The DB delete and storage refresh may never run.

Apply:

-                _ = datasetDAO
-                  .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-                  .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+                _ <- datasetDAO
+                  .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+                  .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))

293-320: Remove dead registerDataSource endpoint from WKRemoteDataStoreController

This action is no longer exposed in any conf/*.routes file and isn’t referenced elsewhere in the codebase. It, along with its dependency on DataSourceRegistrationInfo, can be safely removed.

Please:

  • Delete the entire registerDataSource method (lines 293–320) in
    app/controllers/WKRemoteDataStoreController.scala.
  • Remove the import or usage of DataSourceRegistrationInfo in that file.
  • Verify that DataSourceRegistrationInfo remains in DatasetController.scala for the still‐used addVirtualDataset action.
🧹 Nitpick comments (4)
app/models/dataset/WKRemoteDataStoreClient.scala (1)

108-114: New RPC deleteOnDisk looks correct

Path, token and method align with datastore’s DataSourceController.deleteOnDisk.

Consider aligning parameter types with updateDatasetInDSCache (ObjectId vs String) for consistency.

app/models/dataset/DatasetService.scala (1)

671-679: DB-only deletion path: add a TODO for cloud storage cleanup

Comment notes freeing S3 space “once implemented”. Consider tracking with a TODO and linking an issue to ensure follow-up.

app/controllers/DatasetController.scala (1)

74-78: DTO placement and typing look fine; minor reuse consideration

DataSourceRegistrationInfo is clear and the JSON format is correct. Consider moving this DTO to a shared dto package if it will be reused by other controllers to avoid controller bloat.

frontend/javascripts/admin/rest_api.ts (1)

1265-1283: Optional: add small telemetry context for failure analytics

If you want better diagnostics, pass a requestProperties object to sendFailedRequestAnalyticsEvent in your saga/caller when these calls fail, including datasetName/datasetId and dataStoreName.

Also applies to: 1348-1352

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4595a8 and 526825e.

📒 Files selected for processing (12)
  • app/controllers/DatasetController.scala (4 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • app/models/dataset/WKRemoteDataStoreClient.scala (1 hunks)
  • conf/webknossos.latest.routes (2 hunks)
  • frontend/javascripts/admin/dataset/dataset_add_remote_view.tsx (1 hunks)
  • frontend/javascripts/admin/rest_api.ts (2 hunks)
  • frontend/javascripts/dashboard/advanced_dataset/dataset_action_view.tsx (1 hunks)
  • frontend/javascripts/dashboard/dataset/dataset_settings_delete_tab.tsx (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (0 hunks)
  • webknossos-datastore/conf/datastore.latest.routes (0 hunks)
💤 Files with no reviewable changes (2)
  • webknossos-datastore/conf/datastore.latest.routes
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
🧰 Additional context used
🧬 Code Graph Analysis (8)
frontend/javascripts/dashboard/advanced_dataset/dataset_action_view.tsx (1)
frontend/javascripts/admin/rest_api.ts (1)
  • deleteDatasetOnDisk (1348-1352)
app/models/dataset/WKRemoteDataStoreClient.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
  • deleteOnDisk (430-452)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (1)
  • addQueryString (28-31)
app/controllers/UserTokenController.scala (1)
  • RpcTokenHolder (30-38)
frontend/javascripts/dashboard/dataset/dataset_settings_delete_tab.tsx (1)
frontend/javascripts/admin/rest_api.ts (1)
  • deleteDatasetOnDisk (1348-1352)
app/models/dataset/DatasetService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (3)
  • DataSourceId (10-12)
  • DataSourceId (14-31)
  • allExplicitPaths (68-68)
app/models/dataset/Dataset.scala (1)
  • deleteDataset (698-726)
app/controllers/DatasetController.scala (8)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
app/models/dataset/DataStore.scala (1)
  • findOneByName (113-120)
app/controllers/UserController.scala (1)
  • user (48-56)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4)
  • Fox (30-223)
  • Fox (225-289)
  • fromBool (32-36)
  • assertTrue (187-193)
app/models/dataset/DatasetService.scala (3)
  • createVirtualDataset (104-127)
  • isEditableBy (584-600)
  • deleteVirtualOrDiskDataset (661-682)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (33-80)
app/models/dataset/Dataset.scala (1)
  • findOne (207-214)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (1)
  • toString (11-11)
app/controllers/WKRemoteDataStoreController.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DataSourcePathInfo (41-41)
  • DataSourcePathInfo (43-45)
  • DataStoreStatus (31-31)
  • DataStoreStatus (32-34)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (1)
  • webknossos (73-75)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
  • GenericDataSource (38-69)
  • GenericDataSource (71-76)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
  • Fox (30-223)
  • Fox (225-289)
  • successful (53-56)
  • s (229-233)
  • s (233-243)
frontend/javascripts/admin/dataset/dataset_add_remote_view.tsx (1)
frontend/javascripts/admin/rest_api.ts (1)
  • storeRemoteDataset (1265-1283)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (8)
app/controllers/WKRemoteDataStoreController.scala (1)

229-231: Doc comment addition: good clarification

Clearer contract that this endpoint is called post on-disk deletion by the datastore.

frontend/javascripts/dashboard/advanced_dataset/dataset_action_view.tsx (1)

166-167: Frontend call updated to new API

deleteDatasetOnDisk now correctly takes only the datasetId. UI and cache invalidation flow remain coherent.

frontend/javascripts/dashboard/dataset/dataset_settings_delete_tab.tsx (1)

33-35: Frontend call updated to new API

Switched to deleteDatasetOnDisk(dataset.id). Matches updated REST helper.

frontend/javascripts/admin/dataset/dataset_add_remote_view.tsx (1)

149-155: Virtual dataset creation path aligned with backend

  • Using datastore name instead of URL.
  • Passing parsed dataSource JSON object.
  • Payload matches rest_api.ts StoreRemoteDatasetArgs.

Looks good.

conf/webknossos.latest.routes (1)

82-83: Authorization checks validated

  • app/controllers/DatasetController.scala#520–528: deleteOnDisk enforces
    request.identity._organization == dataset._organization && request.identity.isAdmin
  • app/controllers/DatasetController.scala#169–173: addVirtualDataset allows only
    team managers, admins, or dataset managers via userService.isTeamManagerOrAdminOfOrg and user.isDatasetManager

No changes needed.

app/models/dataset/DatasetService.scala (1)

119-121: Correctly overriding DataSourceId for virtual datasets

Setting id = DataSourceId(datasetName, organizationId) ensures directoryName consistency for new virtual datasets.

frontend/javascripts/admin/rest_api.ts (2)

1348-1352: LGTM: route switch to WK for deletion

Deletion now targets WK directly, as intended by the PR objective. No token/host required. Looks good.


1259-1283: All storeRemoteDataset and deleteDatasetOnDisk call sites compile with the tightened types

Call site summary:
• frontend/javascripts/admin/dataset/dataset_add_remote_view.tsx (line 149)
storeRemoteDataset(datastoreToUse.name, datasetName, dataSourceJson, targetFolderId)

• frontend/javascripts/dashboard/dataset/dataset_settings_delete_tab.tsx (line 34)
deleteDatasetOnDisk(dataset.id)

• frontend/javascripts/dashboard/advanced_dataset/dataset_action_view.tsx (line 166)
deleteDatasetOnDisk(dataset.id)

No further changes required.

If a dataset is virtual, we must not expect any data on the datastore in that datasetDirectoryName directory. If local layers are used, they will link to other local layers (so we do not need to delete anything), or some other data, where we are not responsible
@frcroth frcroth requested a review from fm3 August 11, 2025 16:57
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

445-449: Else-branch now notifies WK when dataset is missing on disk — consider adding error mapping and cache invalidation

Good fix: calling dsRemoteWebknossosClient.deleteDataSource(dataSourceId) here addresses the stale-DB risk when files were removed manually. Two small improvements:

  • Map failures to a clear message, like elsewhere: ?~> "dataset.delete.webknossos.failed".
  • Invalidate the DS-side datasetCache to avoid stale reads immediately after deletion.

Suggested diff within this hunk:

-              _ <- dsRemoteWebknossosClient.deleteDataSource(dataSourceId)
-              _ = logger.warn(s"Tried to delete dataset ${dataSource.id} that is not on disk.")
+              _ <- dsRemoteWebknossosClient.deleteDataSource(dataSourceId) ?~> "dataset.delete.webknossos.failed"
+              _ = datasetCache.invalidateCache(datasetId)
+              _ = logger.warn(s"Tried to delete dataset ${dataSource.id} that is not on disk.")

Optional: Mirror the same cache invalidation in the on-disk branch after the WK callback.

To ensure there are no lingering uses of the removed virtual-deletion path, run:

#!/bin/bash
# Look for old calls that might have used deleteVirtualDataset or similar
rg -n --pcre2 'deleteVirtualDataset|registerDataSource\(|registerDatasource\('
frontend/javascripts/admin/rest_api.ts (1)

1259-1263: Typed payload and WK endpoint transition for virtual datasets — looks good

  • Using APIDataSource in StoreRemoteDatasetArgs and in the function signature removes the loose any and aligns with server expectations.
  • Posting to /api/datasets/addVirtualDataset/${datasetName} with { dataStoreName, dataSource, folderId? } matches the new API.

Minor: Optional to set dataStoreName directly as dataStoreName without the redundant key-value re-labeling.

-    dataStoreName: dataStoreName,
+    dataStoreName,

Otherwise LGTM.

Also applies to: 1266-1269, 1271-1277, 1279-1282

🧹 Nitpick comments (3)
app/models/dataset/DatasetService.scala (2)

661-674: Unified deletion entry point looks good; consider follow-ups for UX

The method cleanly separates virtual vs non-virtual:

  • Virtual: mark as deleted in DB.
  • Non-virtual: delegate on-disk deletion via the datastore client.

Two optional improvements for UX consistency:

  • After successful virtual deletion, consider clearing related thumbnails/caches (e.g., via the controller using thumbnailCachingService.removeFromCache(dataset._id)), to avoid stale UI.
  • Add logging around the branch taken to aid ops visibility.

No blocker.


46-46: Remove unused import

java.net.URI isn’t used in this file anymore. Recommend removing to avoid unused-import warnings.

-import java.net.URI
app/controllers/DatasetController.scala (1)

521-530: Deletion endpoint enforces org-admin and editability — consider clearing thumbnail cache post-deletion

Checks are solid:

  • Global feature flag gate.
  • Editability via isEditableBy.
  • Org-admin via isAdminOf(dataset._organization).

Optional: after successful deletion, clear thumbnails to avoid stale UI:

-      } yield Ok
+      } yield Ok

And in the block before yielding Ok:

_ <- Fox.runIf(dataset.isVirtual)(thumbnailCachingService.removeFromCache(dataset._id))

Or always clear, if cache presence isn’t conditional.

No blocker.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 526825e and bc7305d.

📒 Files selected for processing (5)
  • app/controllers/DatasetController.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • frontend/javascripts/admin/rest_api.ts (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (0 hunks)
💤 Files with no reviewable changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧬 Code Graph Analysis (4)
app/models/dataset/DatasetService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (4)
  • AbstractDataLayer (523-537)
  • AbstractDataLayer (539-559)
  • from (541-558)
  • from (581-600)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
  • DataSourceId (10-12)
  • DataSourceId (14-31)
app/models/dataset/WKRemoteDataStoreClient.scala (1)
  • deleteOnDisk (108-115)
frontend/javascripts/admin/rest_api.ts (1)
frontend/javascripts/types/api_types.ts (1)
  • APIDataSource (151-151)
app/controllers/DatasetController.scala (8)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
app/models/dataset/DataStore.scala (1)
  • findOneByName (113-120)
app/models/user/UserService.scala (1)
  • isTeamManagerOrAdminOfOrg (321-326)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4)
  • Fox (30-223)
  • Fox (225-289)
  • fromBool (32-36)
  • assertTrue (187-193)
app/models/dataset/DatasetService.scala (2)
  • createVirtualDataset (104-127)
  • deleteVirtualOrDiskDataset (661-675)
app/models/user/User.scala (3)
  • findOne (152-159)
  • isAdminOf (55-58)
  • isAdminOf (58-61)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (1)
  • toString (11-11)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (1)
  • toString (15-15)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)
  • webknossos (73-75)
  • deleteDataSource (34-36)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • deleteDataSource (147-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

22-22: Import cleanup is fine

Dropping DataSourceId from the import list is safe here; the file only uses the value via dataSource.id. No issues.

app/models/dataset/DatasetService.scala (2)

119-121: Ensure DataSourceId is set for virtual datasets

Setting id = DataSourceId(datasetName, organizationId) while converting dataLayers to AbstractDataLayer is correct and aligns the virtual dataset’s data source identity with the directory name in DB.


159-159: Explicit organization-not-found mapping

Adding ?~> "organization.notFound" improves error semantics for dataset creation. Good.

frontend/javascripts/admin/rest_api.ts (1)

1348-1352: deleteDatasetOnDisk callers updated — no stale multi-arg usages remain

All call sites have been verified with ripgrep and no instances of deleteDatasetOnDisk with multiple arguments or storeRemoteDataset with the old four-arg signature were found. ✅

app/controllers/DatasetController.scala (2)

74-79: Payload type for virtual dataset creation

Defining DataSourceRegistrationInfo with a JSON formatter keeps the API surface explicit and stable. Good choice to embed dataStoreName alongside dataSource and folderId.


162-181: Strong authorization and validation before creating virtual datasets

  • Org-scoped permission check via isTeamManagerOrAdminOfOrg(user, user._organization) OR user.isDatasetManager is appropriate.
  • The non-empty dataLayers guard mirrors the explore flow. Good.

No blocker. Consider trimming name server-side if not already validated in service.

@fm3
Copy link
Member

fm3 commented Aug 12, 2025

Looking good, thanks!

I found one problem, though. The restrictions introduced in #8741 for the add route seem to be missing now. I fear they can’t be easily implemented on the wk side, since localDirectoryWhitelist is not available there. I think we need a way to access it. Otherwise, users can add datasets with arbitrary paths. What do you think? Or maybe we should leave add on the datastore side for the moment so the check can happen there before it is forwarded to the wk side.

@frcroth
Copy link
Contributor Author

frcroth commented Aug 13, 2025

@fm3 I added a validation for paths as a new route to the datastore. What do you think?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
app/controllers/DatasetController.scala (2)

162-182: LGTM! Comprehensive virtual dataset creation flow

The implementation includes all necessary checks:

  • Authorization verification
  • Non-empty data layers validation
  • Path validation through the datastore
  • Virtual dataset creation with proper error handling

522-531: LGTM! Proper permission checks for dataset deletion

The implementation correctly:

  • Checks if deletion is enabled via feature flag
  • Verifies edit permissions
  • Requires organization admin status
  • Delegates to the appropriate deletion method
🧹 Nitpick comments (1)
app/models/dataset/DatasetService.scala (1)

661-670: Consider more specific error messages for path validation failures

The current implementation provides a generic error message for all path validation failures. Consider providing more specific messages to help users understand why a path failed validation.

 def validatePaths(paths: Seq[String], dataStore: DataStore): Fox[Unit] =
   for {
     _ <- Fox.successful(())
     client = new WKRemoteDataStoreClient(dataStore, rpc)
     pathValidationResults <- client.validatePaths(paths)
     _ <- Fox.serialCombined(pathValidationResults)({
       case PathValidationResult(_, true)     => Fox.successful(())
-      case PathValidationResult(path, false) => Fox.failure(s"Path validation failed for path: $path")
+      case PathValidationResult(path, false) => 
+        // Provide more context about why validation failed
+        if (path.startsWith("http://") || path.startsWith("https://")) {
+          Fox.failure(s"Remote path requires credentials: $path")
+        } else if (path.contains("..")) {
+          Fox.failure(s"Path contains directory traversal: $path")
+        } else {
+          Fox.failure(s"Path not in whitelist or dataset directory: $path")
+        }
     })
   } yield ()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc7305d and 65bb80d.

📒 Files selected for processing (6)
  • app/controllers/DatasetController.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • app/models/dataset/WKRemoteDataStoreClient.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1 hunks)
  • webknossos-datastore/conf/datastore.latest.routes (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-datastore/conf/datastore.latest.routes
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧬 Code Graph Analysis (3)
app/models/dataset/DatasetService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4)
  • PathValidationResult (47-50)
  • PathValidationResult (52-54)
  • validatePaths (686-699)
  • deleteOnDisk (440-464)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (4)
  • AbstractDataLayer (523-537)
  • AbstractDataLayer (539-559)
  • from (541-558)
  • from (581-600)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
  • DataSourceId (10-12)
  • DataSourceId (14-31)
app/controllers/DatasetController.scala (6)
app/models/dataset/DataStore.scala (1)
  • findOneByName (113-120)
app/models/user/UserService.scala (1)
  • isTeamManagerOrAdminOfOrg (321-326)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
  • validatePaths (102-107)
  • deleteOnDisk (114-121)
app/models/dataset/DatasetService.scala (3)
  • validatePaths (661-672)
  • createVirtualDataset (104-127)
  • deleteVirtualOrDiskDataset (672-686)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
  • allExplicitPaths (68-68)
  • toString (11-11)
app/models/user/User.scala (3)
  • findOne (152-159)
  • isAdminOf (55-58)
  • isAdminOf (58-61)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  • DataVaultService (21-30)
  • DataVaultService (32-69)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (2)
  • RemoteSourceDescriptorService (20-177)
  • pathIsAllowedToAddDirectly (151-157)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)

456-459: LGTM! Good handling of the edge case

The updated logic correctly handles the case where a dataset is not found on disk by still notifying webknossos to mark it as deleted in the database, maintaining consistency.


686-697: Whitelist enforcement verified for local paths
The validatePaths endpoint correctly delegates to remoteSourceDescriptorService.pathIsAllowedToAddDirectly, which in turn checks against DataStoreConfig.Datastore.localDirectoryWhitelist. No further changes are necessary.

RemoteSourceDescriptorService.pathIsAllowedToAddDirectly uses dataStoreConfig.Datastore.localDirectoryWhitelist to allow or reject absolute paths
ExploreRemoteLayerService applies the same whitelist check for incoming URIs
DataStoreConfig loads localDirectoryWhitelist from datastore.localDirectoryWhitelist in the configuration

app/models/dataset/WKRemoteDataStoreClient.scala (2)

102-105: LGTM! Clean implementation of path validation endpoint

The implementation correctly uses RPC with the webknossos token for authentication.


114-119: LGTM! Well-structured deleteOnDisk implementation

The implementation follows the established pattern and properly handles authentication.

app/models/dataset/DatasetService.scala (1)

672-684: LGTM! Clean separation of virtual vs disk datasets

The implementation correctly handles both virtual and non-virtual datasets, with appropriate cleanup for each type.

app/controllers/DatasetController.scala (1)

74-78: LGTM! Well-structured data model

The DataSourceRegistrationInfo case class and its JSON formatter are properly defined.

private def pathIsDataSourceLocal(pathLiteral: String): Boolean =
pathIsLocal(pathLiteral) && {
val path = Path.of(pathLiteral)
val path = Path.of(new URI(pathLiteral).getPath)
Copy link
Contributor

@coderabbitai coderabbitai bot Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

⚠️ Potential issue: URI handling change may alter path validation behavior

The change from Path.of(pathLiteral) to Path.of(new URI(pathLiteral).getPath) introduces two concerns:

  1. URISyntaxException risk: If pathLiteral is not a valid URI (e.g., contains spaces or special characters), this will throw an exception
  2. Behavior change: URI paths starting with "/" become absolute, causing the validation logic to fail even for valid local paths

This affects the critical pathIsAllowedToAddDirectly method used for path validation.

Apply this fix to handle URI parsing safely and preserve the original semantics:

 private def pathIsDataSourceLocal(pathLiteral: String): Boolean =
   pathIsLocal(pathLiteral) && {
-    val path = Path.of(new URI(pathLiteral).getPath)
+    val path = try {
+      val uri = new URI(pathLiteral)
+      if (uri.getScheme == null || uri.getScheme == DataVaultService.schemeFile) {
+        // For local paths, use the literal directly to preserve relative paths
+        Path.of(pathLiteral)
+      } else {
+        Path.of(uri.getPath)
+      }
+    } catch {
+      case _: Exception => Path.of(pathLiteral) // Fallback to literal for non-URI paths
+    }
     val workingDir = Path.of(".").toAbsolutePath.normalize
     val inWorkingDir = workingDir.resolve(path).toAbsolutePath.normalize
     !path.isAbsolute && inWorkingDir.startsWith(workingDir)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
val path = Path.of(new URI(pathLiteral).getPath)
private def pathIsDataSourceLocal(pathLiteral: String): Boolean =
pathIsLocal(pathLiteral) && {
val path = try {
val uri = new URI(pathLiteral)
if (uri.getScheme == null || uri.getScheme == DataVaultService.schemeFile) {
// For local paths, use the literal directly to preserve relative paths
Path.of(pathLiteral)
} else {
Path.of(uri.getPath)
}
} catch {
case _: Exception =>
// Fallback to literal for non-URI or malformed-URI inputs
Path.of(pathLiteral)
}
val workingDir = Path.of(".").toAbsolutePath.normalize
val inWorkingDir = workingDir.resolve(path).toAbsolutePath.normalize
!path.isAbsolute && inWorkingDir.startsWith(workingDir)
}
🤖 Prompt for AI Agents
In
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala
around line 164, the change to Path.of(new URI(pathLiteral).getPath) can throw
URISyntaxException and can change semantics by turning inputs into absolute
paths; instead, attempt to parse pathLiteral as a URI inside a try/catch, and
only use URI.getPath when the parsed URI has a non-null scheme (or is clearly a
file:// style URI); if parsing fails or the URI has no scheme, fall back to
Path.of(pathLiteral) so existing local-path validation behavior is preserved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For pathLiteral: "file://home/felix/scm/webknossos/binaryData/sample_organization/l4_sample/color", with the new code:
path = /felix/scm/webknossos/binaryData/sample_organization/l4_sample/color
workingDir = /home/felix/scm/webknossos
inWorkingDir = /felix/scm/webknossos/binaryData/sample_organization/l4_sample/color
inWorkingDir.startsWith(workingDir) = false
path.isAbsolute = true
--> Path gets rejected correctly as an absolute path

Old code:
path = file:/home/felix/scm/webknossos/binaryData/sample_organization/l4_sample/color
workingDir = /home/felix/scm/webknossos
inWorkingDir = /home/felix/scm/webknossos/file:/home/felix/scm/webknossos/binaryData/sample_organization/l4_sample/color
inWorkingDir.startsWith(workingDir) = false
path.isAbsolute = false
--> Path gets wrongly accepted even though it should be absoute path, because the file:// scheme is not handled.

In workingDir does not seem to work correctly in both cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frcroth Thank you for the detailed analysis! Your example actually demonstrates that the new code is working correctly and fixes a security issue:

The change is actually a bug fix:

  • Old code bug: file:// URIs were incorrectly treated as relative paths because Path.of("file://...") doesn't parse URI schemes properly
  • New code fix: Correctly extracts the path component (/felix/scm/...) and properly identifies it as absolute, leading to correct rejection

Your example shows the new code is behaving as intended - absolute paths should be rejected for security reasons.

However, there's still the potential exception risk I mentioned. Here's a safer approach that preserves the fix while handling edge cases:

private def pathIsDataSourceLocal(pathLiteral: String): Boolean =
  pathIsLocal(pathLiteral) && {
    val path = try {
      val uri = new URI(pathLiteral)
      if (uri.getScheme != null) {
        // For URIs with schemes (like file://), extract the path component
        Path.of(uri.getPath)
      } else {
        // For plain paths without schemes, use as-is
        Path.of(pathLiteral)
      }
    } catch {
      case _: URISyntaxException => Path.of(pathLiteral) // Fallback for malformed URIs
    }
    val workingDir = Path.of(".").toAbsolutePath.normalize
    val inWorkingDir = workingDir.resolve(path).toAbsolutePath.normalize
    !path.isAbsolute && inWorkingDir.startsWith(workingDir)
  }

Regarding the inWorkingDir logic issue you mentioned - that might be a separate concern worth investigating if the working directory resolution isn't behaving as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For relative paths (e.g. pathLiteral "./color"), new URI("./color").getPath is ./color, so we do not change the behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frcroth You're absolutely correct! Thank you for testing that specific case.

For relative paths like "./color":

  • new URI("./color").getPath returns "./color"
  • Path.of("./color").isAbsolute returns false
  • The validation logic remains unchanged for relative paths

This confirms that your change is a precise fix that:

  1. ✅ Preserves existing behavior for relative paths
  2. ✅ Correctly handles URI schemes (like file://) by extracting the path component
  3. ✅ Fixes the security issue where file:// URIs were incorrectly treated as relative

The change is working exactly as intended - no additional complexity needed. My earlier suggestion was overly cautious given that relative paths work correctly with the new implementation.

🐰
∩___∩
( ◕ ◕ )
\ ___ /
| |
_____/

Copy link
Contributor Author

@frcroth frcroth Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, the weird handling of inWorkingDir was caused by using two instead of three slashes (file:///), which made the path weird. With the actually correct URI, the inWorkingDir check works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frcroth That makes perfect sense! You're absolutely right - using file:// (two slashes) instead of the proper file:/// (three slashes) would definitely cause weird URI parsing behavior. The malformed URI format would lead to unexpected path extraction, which explains the strange inWorkingDir resolution you observed.

With properly formatted file URIs, your code change works exactly as intended:

  • ✅ Correctly handles relative paths like "./color"
  • ✅ Properly extracts path components from valid URIs like file:///home/...
  • ✅ Fixes the security issue where malformed file:// URIs were incorrectly treated as relative paths

The implementation is solid! 🎯

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
app/controllers/DatasetController.scala (3)

74-78: Use ObjectId for folderId in DTO for type-safety and consistency

Everywhere else in this controller, folderId is modeled as ObjectId. Keeping the same type here avoids accidental invalid IDs slipping through and keeps JSON formats consistent (there’s already an implicit Format[ObjectId] in scope in this codebase). If the frontend sends a string, Play JSON can still parse it into ObjectId via the implicit.

Apply this diff:

-case class DataSourceRegistrationInfo(dataSource: DataSource, folderId: Option[String], dataStoreName: String)
+case class DataSourceRegistrationInfo(dataSource: DataSource, folderId: Option[ObjectId], dataStoreName: String)

Follow-ups:

  • Ensure datasetService.createVirtualDataset expects Option[ObjectId] (or convert at the boundary).
  • Adjust the frontend to continue sending the folderId as a string; Play will parse it into ObjectId.

162-182: Solid: permissions + layer guard + path validation. Two small improvements suggested

The flow looks correct: permission gate, non-empty layers check, datastore path validation, then create. Two minor suggestions:

  • Trim dataStoreName before lookup to avoid subtle mismatches.
  • Preserve detailed path validation errors (or map to a 403) instead of overriding with a generic message, so the UI can surface which path failed. If you prefer i18n-only messages, at least attach a 403 to reflect an authorization-like failure.

Apply this diff to trim the dataStoreName:

-        dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages(
+        dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName.trim) ?~> Messages(
           "datastore.notFound",
           request.body.dataStoreName) ~> NOT_FOUND

Option A (preserve detailed error): remove the custom message mapping so the specific failure from validatePaths bubbles up.

-        _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed"
+        _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore)

Option B (keep i18n but map to 403):

-        _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed"
+        _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" ~> FORBIDDEN

522-531: Admin check fix looks good; map “delete disabled” to 403

Great job switching to isAdminOf(dataset._organization) and keeping editability enforcement. One small improvement: when deletion is globally disabled, return FORBIDDEN to reflect a permission-like failure.

Apply this diff:

-        _ <- Fox.fromBool(conf.Features.allowDeleteDatasets) ?~> "dataset.delete.disabled"
+        _ <- Fox.fromBool(conf.Features.allowDeleteDatasets) ?~> "dataset.delete.disabled" ~> FORBIDDEN
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65bb80d and 9950600.

📒 Files selected for processing (1)
  • app/controllers/DatasetController.scala (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/controllers/DatasetController.scala (7)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3)
  • Fox (30-223)
  • Fox (225-289)
  • fromBool (32-36)
app/models/dataset/DatasetService.scala (2)
  • validatePaths (661-672)
  • deleteVirtualOrDiskDataset (672-686)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
  • validatePaths (102-107)
  • deleteOnDisk (114-121)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
  • allExplicitPaths (68-68)
  • toString (11-11)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (33-80)
app/models/user/User.scala (2)
  • isAdminOf (55-58)
  • isAdminOf (58-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (2)
app/controllers/DatasetController.scala (2)

10-10: LGTM: Correct import of DataSource/ElementClass

Importing DataSource and ElementClass here is appropriate and used below.


162-182: Signatures verified—no mismatches detected

The createVirtualDataset method takes an Option[String] for folderId, which aligns with DataSourceRegistrationInfo.folderId: Option[String].
The deleteOnDisk method on WKRemoteDataStoreClient accepts an ObjectId and is invoked with dataset._id: ObjectId.

No changes are required.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (2)
app/controllers/WKRemoteDataStoreController.scala (2)

236-244: Blocking bug: delete not executed due to val assignment instead of monadic bind

Inside the for-comprehension, the delete+refresh chain is assigned (_ = ...) rather than bound (_ <- ...). As written, the Fox action is constructed but never executed, so the dataset won’t be marked deleted nor will storage be refreshed.

Fix by binding the effect:

-                _ = datasetDAO
-                  .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-                  .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+                _ <- datasetDAO
+                  .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+                  .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))

303-316: Enforce organization restriction for organization-scoped datastores

For consistency with reserveDatasetUpload and to avoid bypassing datastore organization scoping, add a check against dataStore.onlyAllowedOrganization before creating a virtual dataset via registerDataSource.

         for {
           user <- bearerTokenService.userForToken(token)
           organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext) ?~> Messages(
             "organization.notFound",
             organizationId) ~> NOT_FOUND
           _ <- Fox.fromBool(organization._id == user._organization) ?~> "notAllowed" ~> FORBIDDEN
+          _ <- Fox.fromBool(dataStore.onlyAllowedOrganization.forall(_ == organization._id)) ?~> "dataset.upload.Datastore.restricted" ~> FORBIDDEN
           dataset <- datasetService.createVirtualDataset(
             directoryName,
             dataStore,
             request.body.dataSource,
             request.body.folderId,
             user
           )
🧹 Nitpick comments (3)
app/models/dataset/DatasetService.scala (2)

660-670: Nit: remove no-op Fox step and tighten validation folding

The _ <- Fox.successful(()) line is redundant. Also, you can simplify the serial check with a fold if you prefer. Minimal change below removes the no-op:

   def validatePaths(paths: Seq[String], dataStore: DataStore): Fox[Unit] =
     for {
-      _ <- Fox.successful(())
       client = new WKRemoteDataStoreClient(dataStore, rpc)
       pathValidationResults <- client.validatePaths(paths)
       _ <- Fox.serialCombined(pathValidationResults)({
         case PathValidationResult(_, true)     => Fox.successful(())
         case PathValidationResult(path, false) => Fox.failure(s"Path validation failed for path: $path")
       })
     } yield ()

104-125: Centralize datastore organization scoping (optional)

Currently, the “organization-scoped datastore” check happens in controllers (or not at all, depending on the entry point). Consider guarding createVirtualDataset itself (e.g., with a dataStore.onlyAllowedOrganization.forall(_ == user._organization) check) to prevent accidental bypass from future call sites.

app/controllers/DatasetController.scala (1)

521-526: Optional: map feature flag failure to explicit status

For consistency with other guarded routes, consider mapping dataset.delete.disabled to FORBIDDEN (or 404 if you want to hide capability).

-        _ <- Fox.fromBool(conf.Features.allowDeleteDatasets) ?~> "dataset.delete.disabled"
+        _ <- Fox.fromBool(conf.Features.allowDeleteDatasets) ?~> "dataset.delete.disabled" ~> FORBIDDEN
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9950600 and d028eb8.

📒 Files selected for processing (6)
  • app/controllers/DatasetController.scala (4 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • app/models/dataset/ComposeService.scala (0 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • app/models/dataset/explore/WKExploreRemoteLayerService.scala (0 hunks)
  • conf/webknossos.latest.routes (2 hunks)
💤 Files with no reviewable changes (2)
  • app/models/dataset/ComposeService.scala
  • app/models/dataset/explore/WKExploreRemoteLayerService.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • conf/webknossos.latest.routes
🧰 Additional context used
🧬 Code Graph Analysis (3)
app/models/dataset/DatasetService.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4)
  • PathValidationResult (47-50)
  • PathValidationResult (52-54)
  • validatePaths (686-699)
  • deleteOnDisk (440-464)
app/models/dataset/Dataset.scala (2)
  • doesDatasetDirectoryExistInOrganization (437-453)
  • findOne (207-214)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (1)
  • GlobalAccessContext (25-29)
app/models/organization/Organization.scala (1)
  • findOne (93-102)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
  • validatePaths (102-107)
  • deleteOnDisk (114-121)
app/controllers/DatasetController.scala (5)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4)
  • Fox (30-223)
  • Fox (225-298)
  • fromBool (32-36)
  • assertTrue (187-193)
app/models/dataset/DatasetService.scala (2)
  • validatePaths (660-671)
  • deleteVirtualOrDiskDataset (671-685)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (1)
  • allExplicitPaths (68-68)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (33-80)
app/controllers/WKRemoteDataStoreController.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DataSourcePathInfo (41-41)
  • DataSourcePathInfo (43-45)
  • DataStoreStatus (31-31)
  • DataStoreStatus (32-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (5)
app/controllers/WKRemoteDataStoreController.scala (1)

229-233: Endpoint semantics: OK; relies on DS callback for non-virtual deletions

The new deleteDataset endpoint correctly models the "on-disk deletion done by DS, finalize in WK" flow and handles the annotation-count-based soft delete. Once the binding bug is fixed, the behavior aligns with the PR objectives.

app/models/dataset/DatasetService.scala (3)

104-125: createVirtualDataset: signature change and usage look correct

Using user._organization instead of passing organizationId is sensible, and uniqueness checks + folder access enforcement are intact. Good use of AbstractDataLayer.from when constructing the abstract data source.


158-159: Improved error message on organization lookup

Switching to ?~> "organization.notFound" makes failure reasons clearer when creating datasets from reported data sources.


671-684: Deletion flow split (virtual vs. on-disk): OK

The new deleteVirtualOrDiskDataset correctly defers on-disk deletion to DS and soft-deletes virtual datasets locally. The centralized implementation matches the PR’s architecture changes.

app/controllers/DatasetController.scala (1)

521-530: Deletion permissions: looks correct now

The org-scoped admin check via isAdminOf(dataset._organization) and the preceding editability assertion align with intended authorization semantics. The unified delete method in DatasetService keeps virtual vs. non-virtual logic in one place.

Comment on lines +162 to +181
def addVirtualDataset(name: String): Action[DataSourceRegistrationInfo] =
sil.SecuredAction.async(validateJson[DataSourceRegistrationInfo]) { implicit request =>
for {
dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages(
"datastore.notFound",
request.body.dataStoreName) ~> NOT_FOUND
user = request.identity
isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(user, user._organization)
_ <- Fox.fromBool(isTeamManagerOrAdmin || user.isDatasetManager) ~> FORBIDDEN
_ <- Fox.fromBool(request.body.dataSource.dataLayers.nonEmpty) ?~> "dataset.explore.zeroLayers"
_ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed"
dataset <- datasetService.createVirtualDataset(
name,
dataStore,
request.body.dataSource,
request.body.folderId,
user
)
} yield Ok(Json.obj("newDatasetId" -> dataset._id))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Missed enforcement: honor dataStore.onlyAllowedOrganization for virtual adds

Add the same restriction applied in upload flows so users cannot use a datastore restricted to a different organization to create virtual datasets.

Also, map path validation failures to a client error.

   def addVirtualDataset(name: String): Action[DataSourceRegistrationInfo] =
     sil.SecuredAction.async(validateJson[DataSourceRegistrationInfo]) { implicit request =>
       for {
         dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages(
           "datastore.notFound",
           request.body.dataStoreName) ~> NOT_FOUND
         user = request.identity
         isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(user, user._organization)
         _ <- Fox.fromBool(isTeamManagerOrAdmin || user.isDatasetManager) ~> FORBIDDEN
         _ <- Fox.fromBool(request.body.dataSource.dataLayers.nonEmpty) ?~> "dataset.explore.zeroLayers"
-        _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed"
+        _ <- Fox.fromBool(dataStore.onlyAllowedOrganization.forall(_ == user._organization)) ?~> "dataset.upload.Datastore.restricted" ~> FORBIDDEN
+        _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" ~> BAD_REQUEST
         dataset <- datasetService.createVirtualDataset(
           name,
           dataStore,
           request.body.dataSource,
           request.body.folderId,
           user
         )
       } yield Ok(Json.obj("newDatasetId" -> dataset._id))
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def addVirtualDataset(name: String): Action[DataSourceRegistrationInfo] =
sil.SecuredAction.async(validateJson[DataSourceRegistrationInfo]) { implicit request =>
for {
dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages(
"datastore.notFound",
request.body.dataStoreName) ~> NOT_FOUND
user = request.identity
isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(user, user._organization)
_ <- Fox.fromBool(isTeamManagerOrAdmin || user.isDatasetManager) ~> FORBIDDEN
_ <- Fox.fromBool(request.body.dataSource.dataLayers.nonEmpty) ?~> "dataset.explore.zeroLayers"
_ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed"
dataset <- datasetService.createVirtualDataset(
name,
dataStore,
request.body.dataSource,
request.body.folderId,
user
)
} yield Ok(Json.obj("newDatasetId" -> dataset._id))
}
def addVirtualDataset(name: String): Action[DataSourceRegistrationInfo] =
sil.SecuredAction.async(validateJson[DataSourceRegistrationInfo]) { implicit request =>
for {
dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages(
"datastore.notFound",
request.body.dataStoreName) ~> NOT_FOUND
user = request.identity
isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(user, user._organization)
_ <- Fox.fromBool(isTeamManagerOrAdmin || user.isDatasetManager) ~> FORBIDDEN
_ <- Fox.fromBool(request.body.dataSource.dataLayers.nonEmpty) ?~> "dataset.explore.zeroLayers"
_ <- Fox.fromBool(dataStore.onlyAllowedOrganization.forall(_ == user._organization)) ?~> "dataset.upload.Datastore.restricted" ~> FORBIDDEN
_ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" ~> BAD_REQUEST
dataset <- datasetService.createVirtualDataset(
name,
dataStore,
request.body.dataSource,
request.body.folderId,
user
)
} yield Ok(Json.obj("newDatasetId" -> dataset._id))
}
🤖 Prompt for AI Agents
In app/controllers/DatasetController.scala around lines 162–181, enforce
dataStore.onlyAllowedOrganization for virtual dataset creation and map path
validation failures to a client error: after loading dataStore and resolving
user/team manager check, add a guard that if
dataStore.onlyAllowedOrganization.isDefined and != user._organization then fail
with FORBIDDEN; change the datasetService.validatePaths line to map failures to
a BAD_REQUEST (use the same pattern as other checks: ?~>
"dataSource.add.pathsNotAllowed" ~> BAD_REQUEST) so path validation returns a
4xx instead of an internal error.

@fm3 fm3 merged commit 87e4cb4 into master Aug 18, 2025
5 checks passed
@fm3 fm3 deleted the clean-up-ds-wk-communication-virtual-datasets branch August 18, 2025 09:14
This was referenced Sep 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove datastore from communication about datasets where not neccessary
2 participants