-
Notifications
You must be signed in to change notification settings - Fork 29
Remove datastore from communication about virtual datasets #8848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdds WK-side endpoints and payloads to create virtual datasets and request on-disk deletion by dataset ID; removes datastore-facing registration and virtual-deletion endpoints; moves frontend datastore-host/token plumbing into WK API calls; adds path validation RPC and datastore local-path parsing change; updates service/client signatures and wiring. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~35 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changes
Possibly related PRs
Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🔭 Outside diff range comments (2)
app/controllers/WKRemoteDataStoreController.scala (2)
241-244
: Bug: async side-effect is dropped; deletion not awaitedThe Fox effect from deleteDataset(...).flatMap(...) is assigned to a val and not sequenced. The DB delete and storage refresh may never run.
Apply:
- _ = datasetDAO - .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0) - .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset)) + _ <- datasetDAO + .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0) + .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
293-320
: Remove deadregisterDataSource
endpoint from WKRemoteDataStoreControllerThis action is no longer exposed in any
conf/*.routes
file and isn’t referenced elsewhere in the codebase. It, along with its dependency onDataSourceRegistrationInfo
, can be safely removed.Please:
- Delete the entire
registerDataSource
method (lines 293–320) in
app/controllers/WKRemoteDataStoreController.scala
.- Remove the import or usage of
DataSourceRegistrationInfo
in that file.- Verify that
DataSourceRegistrationInfo
remains inDatasetController.scala
for the still‐usedaddVirtualDataset
action.
🧹 Nitpick comments (4)
app/models/dataset/WKRemoteDataStoreClient.scala (1)
108-114
: New RPC deleteOnDisk looks correctPath, token and method align with datastore’s DataSourceController.deleteOnDisk.
Consider aligning parameter types with updateDatasetInDSCache (ObjectId vs String) for consistency.
app/models/dataset/DatasetService.scala (1)
671-679
: DB-only deletion path: add a TODO for cloud storage cleanupComment notes freeing S3 space “once implemented”. Consider tracking with a TODO and linking an issue to ensure follow-up.
app/controllers/DatasetController.scala (1)
74-78
: DTO placement and typing look fine; minor reuse considerationDataSourceRegistrationInfo is clear and the JSON format is correct. Consider moving this DTO to a shared dto package if it will be reused by other controllers to avoid controller bloat.
frontend/javascripts/admin/rest_api.ts (1)
1265-1283
: Optional: add small telemetry context for failure analyticsIf you want better diagnostics, pass a requestProperties object to sendFailedRequestAnalyticsEvent in your saga/caller when these calls fail, including datasetName/datasetId and dataStoreName.
Also applies to: 1348-1352
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
app/controllers/DatasetController.scala
(4 hunks)app/controllers/WKRemoteDataStoreController.scala
(2 hunks)app/models/dataset/DatasetService.scala
(4 hunks)app/models/dataset/WKRemoteDataStoreClient.scala
(1 hunks)conf/webknossos.latest.routes
(2 hunks)frontend/javascripts/admin/dataset/dataset_add_remote_view.tsx
(1 hunks)frontend/javascripts/admin/rest_api.ts
(2 hunks)frontend/javascripts/dashboard/advanced_dataset/dataset_action_view.tsx
(1 hunks)frontend/javascripts/dashboard/dataset/dataset_settings_delete_tab.tsx
(1 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
(2 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
(0 hunks)webknossos-datastore/conf/datastore.latest.routes
(0 hunks)
💤 Files with no reviewable changes (2)
- webknossos-datastore/conf/datastore.latest.routes
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
🧰 Additional context used
🧬 Code Graph Analysis (8)
frontend/javascripts/dashboard/advanced_dataset/dataset_action_view.tsx (1)
frontend/javascripts/admin/rest_api.ts (1)
deleteDatasetOnDisk
(1348-1352)
app/models/dataset/WKRemoteDataStoreClient.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
deleteOnDisk
(430-452)webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (1)
addQueryString
(28-31)app/controllers/UserTokenController.scala (1)
RpcTokenHolder
(30-38)
frontend/javascripts/dashboard/dataset/dataset_settings_delete_tab.tsx (1)
frontend/javascripts/admin/rest_api.ts (1)
deleteDatasetOnDisk
(1348-1352)
app/models/dataset/DatasetService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (3)
DataSourceId
(10-12)DataSourceId
(14-31)allExplicitPaths
(68-68)app/models/dataset/Dataset.scala (1)
deleteDataset
(698-726)
app/controllers/DatasetController.scala (8)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
validateJson
(204-209)app/models/dataset/DataStore.scala (1)
findOneByName
(113-120)app/controllers/UserController.scala (1)
user
(48-56)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4)
Fox
(30-223)Fox
(225-289)fromBool
(32-36)assertTrue
(187-193)app/models/dataset/DatasetService.scala (3)
createVirtualDataset
(104-127)isEditableBy
(584-600)deleteVirtualOrDiskDataset
(661-682)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
deleteOnDisk
(33-80)app/models/dataset/Dataset.scala (1)
findOne
(207-214)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (1)
toString
(11-11)
app/controllers/WKRemoteDataStoreController.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
DataSourcePathInfo
(41-41)DataSourcePathInfo
(43-45)DataStoreStatus
(31-31)DataStoreStatus
(32-34)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (1)
webknossos
(73-75)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
GenericDataSource
(38-69)GenericDataSource
(71-76)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
Fox
(30-223)Fox
(225-289)successful
(53-56)s
(229-233)s
(233-243)
frontend/javascripts/admin/dataset/dataset_add_remote_view.tsx (1)
frontend/javascripts/admin/rest_api.ts (1)
storeRemoteDataset
(1265-1283)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (8)
app/controllers/WKRemoteDataStoreController.scala (1)
229-231
: Doc comment addition: good clarificationClearer contract that this endpoint is called post on-disk deletion by the datastore.
frontend/javascripts/dashboard/advanced_dataset/dataset_action_view.tsx (1)
166-167
: Frontend call updated to new APIdeleteDatasetOnDisk now correctly takes only the datasetId. UI and cache invalidation flow remain coherent.
frontend/javascripts/dashboard/dataset/dataset_settings_delete_tab.tsx (1)
33-35
: Frontend call updated to new APISwitched to deleteDatasetOnDisk(dataset.id). Matches updated REST helper.
frontend/javascripts/admin/dataset/dataset_add_remote_view.tsx (1)
149-155
: Virtual dataset creation path aligned with backend
- Using datastore name instead of URL.
- Passing parsed dataSource JSON object.
- Payload matches rest_api.ts StoreRemoteDatasetArgs.
Looks good.
conf/webknossos.latest.routes (1)
82-83
: Authorization checks validated
- app/controllers/DatasetController.scala#520–528:
deleteOnDisk
enforces
request.identity._organization == dataset._organization && request.identity.isAdmin
- app/controllers/DatasetController.scala#169–173:
addVirtualDataset
allows only
team managers, admins, or dataset managers viauserService.isTeamManagerOrAdminOfOrg
anduser.isDatasetManager
No changes needed.
app/models/dataset/DatasetService.scala (1)
119-121
: Correctly overriding DataSourceId for virtual datasetsSetting id = DataSourceId(datasetName, organizationId) ensures directoryName consistency for new virtual datasets.
frontend/javascripts/admin/rest_api.ts (2)
1348-1352
: LGTM: route switch to WK for deletionDeletion now targets WK directly, as intended by the PR objective. No token/host required. Looks good.
1259-1283
: AllstoreRemoteDataset
anddeleteDatasetOnDisk
call sites compile with the tightened typesCall site summary:
• frontend/javascripts/admin/dataset/dataset_add_remote_view.tsx (line 149)
storeRemoteDataset(datastoreToUse.name, datasetName, dataSourceJson, targetFolderId)• frontend/javascripts/dashboard/dataset/dataset_settings_delete_tab.tsx (line 34)
deleteDatasetOnDisk(dataset.id)• frontend/javascripts/dashboard/advanced_dataset/dataset_action_view.tsx (line 166)
deleteDatasetOnDisk(dataset.id)No further changes required.
...-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
Outdated
Show resolved
Hide resolved
If a dataset is virtual, we must not expect any data on the datastore in that datasetDirectoryName directory. If local layers are used, they will link to other local layers (so we do not need to delete anything), or some other data, where we are not responsible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
445-449
: Else-branch now notifies WK when dataset is missing on disk — consider adding error mapping and cache invalidationGood fix: calling
dsRemoteWebknossosClient.deleteDataSource(dataSourceId)
here addresses the stale-DB risk when files were removed manually. Two small improvements:
- Map failures to a clear message, like elsewhere:
?~> "dataset.delete.webknossos.failed"
.- Invalidate the DS-side
datasetCache
to avoid stale reads immediately after deletion.Suggested diff within this hunk:
- _ <- dsRemoteWebknossosClient.deleteDataSource(dataSourceId) - _ = logger.warn(s"Tried to delete dataset ${dataSource.id} that is not on disk.") + _ <- dsRemoteWebknossosClient.deleteDataSource(dataSourceId) ?~> "dataset.delete.webknossos.failed" + _ = datasetCache.invalidateCache(datasetId) + _ = logger.warn(s"Tried to delete dataset ${dataSource.id} that is not on disk.")Optional: Mirror the same cache invalidation in the on-disk branch after the WK callback.
To ensure there are no lingering uses of the removed virtual-deletion path, run:
#!/bin/bash # Look for old calls that might have used deleteVirtualDataset or similar rg -n --pcre2 'deleteVirtualDataset|registerDataSource\(|registerDatasource\('frontend/javascripts/admin/rest_api.ts (1)
1259-1263
: Typed payload and WK endpoint transition for virtual datasets — looks good
- Using
APIDataSource
inStoreRemoteDatasetArgs
and in the function signature removes the looseany
and aligns with server expectations.- Posting to
/api/datasets/addVirtualDataset/${datasetName}
with{ dataStoreName, dataSource, folderId? }
matches the new API.Minor: Optional to set
dataStoreName
directly asdataStoreName
without the redundant key-value re-labeling.- dataStoreName: dataStoreName, + dataStoreName,Otherwise LGTM.
Also applies to: 1266-1269, 1271-1277, 1279-1282
🧹 Nitpick comments (3)
app/models/dataset/DatasetService.scala (2)
661-674
: Unified deletion entry point looks good; consider follow-ups for UXThe method cleanly separates virtual vs non-virtual:
- Virtual: mark as deleted in DB.
- Non-virtual: delegate on-disk deletion via the datastore client.
Two optional improvements for UX consistency:
- After successful virtual deletion, consider clearing related thumbnails/caches (e.g., via the controller using
thumbnailCachingService.removeFromCache(dataset._id)
), to avoid stale UI.- Add logging around the branch taken to aid ops visibility.
No blocker.
46-46
: Remove unused import
java.net.URI
isn’t used in this file anymore. Recommend removing to avoid unused-import warnings.-import java.net.URI
app/controllers/DatasetController.scala (1)
521-530
: Deletion endpoint enforces org-admin and editability — consider clearing thumbnail cache post-deletionChecks are solid:
- Global feature flag gate.
- Editability via
isEditableBy
.- Org-admin via
isAdminOf(dataset._organization)
.Optional: after successful deletion, clear thumbnails to avoid stale UI:
- } yield Ok + } yield OkAnd in the block before yielding Ok:
_ <- Fox.runIf(dataset.isVirtual)(thumbnailCachingService.removeFromCache(dataset._id))Or always clear, if cache presence isn’t conditional.
No blocker.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
app/controllers/DatasetController.scala
(4 hunks)app/models/dataset/DatasetService.scala
(4 hunks)frontend/javascripts/admin/rest_api.ts
(2 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
(2 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
(0 hunks)
💤 Files with no reviewable changes (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧬 Code Graph Analysis (4)
app/models/dataset/DatasetService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (4)
AbstractDataLayer
(523-537)AbstractDataLayer
(539-559)from
(541-558)from
(581-600)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
DataSourceId
(10-12)DataSourceId
(14-31)app/models/dataset/WKRemoteDataStoreClient.scala (1)
deleteOnDisk
(108-115)
frontend/javascripts/admin/rest_api.ts (1)
frontend/javascripts/types/api_types.ts (1)
APIDataSource
(151-151)
app/controllers/DatasetController.scala (8)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
validateJson
(204-209)app/models/dataset/DataStore.scala (1)
findOneByName
(113-120)app/models/user/UserService.scala (1)
isTeamManagerOrAdminOfOrg
(321-326)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4)
Fox
(30-223)Fox
(225-289)fromBool
(32-36)assertTrue
(187-193)app/models/dataset/DatasetService.scala (2)
createVirtualDataset
(104-127)deleteVirtualOrDiskDataset
(661-675)app/models/user/User.scala (3)
findOne
(152-159)isAdminOf
(55-58)isAdminOf
(58-61)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (1)
toString
(11-11)util/src/main/scala/com/scalableminds/util/time/Instant.scala (1)
toString
(15-15)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)
webknossos
(73-75)deleteDataSource
(34-36)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
deleteDataSource
(147-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
22-22
: Import cleanup is fineDropping DataSourceId from the import list is safe here; the file only uses the value via
dataSource.id
. No issues.app/models/dataset/DatasetService.scala (2)
119-121
: Ensure DataSourceId is set for virtual datasetsSetting
id = DataSourceId(datasetName, organizationId)
while convertingdataLayers
toAbstractDataLayer
is correct and aligns the virtual dataset’s data source identity with the directory name in DB.
159-159
: Explicit organization-not-found mappingAdding
?~> "organization.notFound"
improves error semantics for dataset creation. Good.frontend/javascripts/admin/rest_api.ts (1)
1348-1352
: deleteDatasetOnDisk callers updated — no stale multi-arg usages remainAll call sites have been verified with ripgrep and no instances of
deleteDatasetOnDisk
with multiple arguments orstoreRemoteDataset
with the old four-arg signature were found. ✅app/controllers/DatasetController.scala (2)
74-79
: Payload type for virtual dataset creationDefining
DataSourceRegistrationInfo
with a JSON formatter keeps the API surface explicit and stable. Good choice to embeddataStoreName
alongsidedataSource
andfolderId
.
162-181
: Strong authorization and validation before creating virtual datasets
- Org-scoped permission check via
isTeamManagerOrAdminOfOrg(user, user._organization)
ORuser.isDatasetManager
is appropriate.- The non-empty
dataLayers
guard mirrors the explore flow. Good.No blocker. Consider trimming
name
server-side if not already validated in service.
Looking good, thanks! I found one problem, though. The restrictions introduced in #8741 for the add route seem to be missing now. I fear they can’t be easily implemented on the wk side, since localDirectoryWhitelist is not available there. I think we need a way to access it. Otherwise, users can add datasets with arbitrary paths. What do you think? Or maybe we should leave add on the datastore side for the moment so the check can happen there before it is forwarded to the wk side. |
@fm3 I added a validation for paths as a new route to the datastore. What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
app/controllers/DatasetController.scala (2)
162-182
: LGTM! Comprehensive virtual dataset creation flowThe implementation includes all necessary checks:
- Authorization verification
- Non-empty data layers validation
- Path validation through the datastore
- Virtual dataset creation with proper error handling
522-531
: LGTM! Proper permission checks for dataset deletionThe implementation correctly:
- Checks if deletion is enabled via feature flag
- Verifies edit permissions
- Requires organization admin status
- Delegates to the appropriate deletion method
🧹 Nitpick comments (1)
app/models/dataset/DatasetService.scala (1)
661-670
: Consider more specific error messages for path validation failuresThe current implementation provides a generic error message for all path validation failures. Consider providing more specific messages to help users understand why a path failed validation.
def validatePaths(paths: Seq[String], dataStore: DataStore): Fox[Unit] = for { _ <- Fox.successful(()) client = new WKRemoteDataStoreClient(dataStore, rpc) pathValidationResults <- client.validatePaths(paths) _ <- Fox.serialCombined(pathValidationResults)({ case PathValidationResult(_, true) => Fox.successful(()) - case PathValidationResult(path, false) => Fox.failure(s"Path validation failed for path: $path") + case PathValidationResult(path, false) => + // Provide more context about why validation failed + if (path.startsWith("http://") || path.startsWith("https://")) { + Fox.failure(s"Remote path requires credentials: $path") + } else if (path.contains("..")) { + Fox.failure(s"Path contains directory traversal: $path") + } else { + Fox.failure(s"Path not in whitelist or dataset directory: $path") + } }) } yield ()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
app/controllers/DatasetController.scala
(4 hunks)app/models/dataset/DatasetService.scala
(4 hunks)app/models/dataset/WKRemoteDataStoreClient.scala
(2 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
(6 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala
(1 hunks)webknossos-datastore/conf/datastore.latest.routes
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- webknossos-datastore/conf/datastore.latest.routes
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧬 Code Graph Analysis (3)
app/models/dataset/DatasetService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4)
PathValidationResult
(47-50)PathValidationResult
(52-54)validatePaths
(686-699)deleteOnDisk
(440-464)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (4)
AbstractDataLayer
(523-537)AbstractDataLayer
(539-559)from
(541-558)from
(581-600)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
DataSourceId
(10-12)DataSourceId
(14-31)
app/controllers/DatasetController.scala (6)
app/models/dataset/DataStore.scala (1)
findOneByName
(113-120)app/models/user/UserService.scala (1)
isTeamManagerOrAdminOfOrg
(321-326)app/models/dataset/WKRemoteDataStoreClient.scala (2)
validatePaths
(102-107)deleteOnDisk
(114-121)app/models/dataset/DatasetService.scala (3)
validatePaths
(661-672)createVirtualDataset
(104-127)deleteVirtualOrDiskDataset
(672-686)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
allExplicitPaths
(68-68)toString
(11-11)app/models/user/User.scala (3)
findOne
(152-159)isAdminOf
(55-58)isAdminOf
(58-61)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
DataVaultService
(21-30)DataVaultService
(32-69)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (2)
RemoteSourceDescriptorService
(20-177)pathIsAllowedToAddDirectly
(151-157)util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
validateJson
(204-209)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)
456-459
: LGTM! Good handling of the edge caseThe updated logic correctly handles the case where a dataset is not found on disk by still notifying webknossos to mark it as deleted in the database, maintaining consistency.
686-697
: Whitelist enforcement verified for local paths
ThevalidatePaths
endpoint correctly delegates toremoteSourceDescriptorService.pathIsAllowedToAddDirectly
, which in turn checks againstDataStoreConfig.Datastore.localDirectoryWhitelist
. No further changes are necessary.•
RemoteSourceDescriptorService.pathIsAllowedToAddDirectly
usesdataStoreConfig.Datastore.localDirectoryWhitelist
to allow or reject absolute paths
•ExploreRemoteLayerService
applies the same whitelist check for incoming URIs
•DataStoreConfig
loadslocalDirectoryWhitelist
fromdatastore.localDirectoryWhitelist
in the configurationapp/models/dataset/WKRemoteDataStoreClient.scala (2)
102-105
: LGTM! Clean implementation of path validation endpointThe implementation correctly uses RPC with the webknossos token for authentication.
114-119
: LGTM! Well-structured deleteOnDisk implementationThe implementation follows the established pattern and properly handles authentication.
app/models/dataset/DatasetService.scala (1)
672-684
: LGTM! Clean separation of virtual vs disk datasetsThe implementation correctly handles both virtual and non-virtual datasets, with appropriate cleanup for each type.
app/controllers/DatasetController.scala (1)
74-78
: LGTM! Well-structured data modelThe
DataSourceRegistrationInfo
case class and its JSON formatter are properly defined.
private def pathIsDataSourceLocal(pathLiteral: String): Boolean = | ||
pathIsLocal(pathLiteral) && { | ||
val path = Path.of(pathLiteral) | ||
val path = Path.of(new URI(pathLiteral).getPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from Path.of(pathLiteral)
to Path.of(new URI(pathLiteral).getPath)
introduces two concerns:
- URISyntaxException risk: If
pathLiteral
is not a valid URI (e.g., contains spaces or special characters), this will throw an exception - Behavior change: URI paths starting with "/" become absolute, causing the validation logic to fail even for valid local paths
This affects the critical pathIsAllowedToAddDirectly
method used for path validation.
Apply this fix to handle URI parsing safely and preserve the original semantics:
private def pathIsDataSourceLocal(pathLiteral: String): Boolean =
pathIsLocal(pathLiteral) && {
- val path = Path.of(new URI(pathLiteral).getPath)
+ val path = try {
+ val uri = new URI(pathLiteral)
+ if (uri.getScheme == null || uri.getScheme == DataVaultService.schemeFile) {
+ // For local paths, use the literal directly to preserve relative paths
+ Path.of(pathLiteral)
+ } else {
+ Path.of(uri.getPath)
+ }
+ } catch {
+ case _: Exception => Path.of(pathLiteral) // Fallback to literal for non-URI paths
+ }
val workingDir = Path.of(".").toAbsolutePath.normalize
val inWorkingDir = workingDir.resolve(path).toAbsolutePath.normalize
!path.isAbsolute && inWorkingDir.startsWith(workingDir)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
val path = Path.of(new URI(pathLiteral).getPath) | |
private def pathIsDataSourceLocal(pathLiteral: String): Boolean = | |
pathIsLocal(pathLiteral) && { | |
val path = try { | |
val uri = new URI(pathLiteral) | |
if (uri.getScheme == null || uri.getScheme == DataVaultService.schemeFile) { | |
// For local paths, use the literal directly to preserve relative paths | |
Path.of(pathLiteral) | |
} else { | |
Path.of(uri.getPath) | |
} | |
} catch { | |
case _: Exception => | |
// Fallback to literal for non-URI or malformed-URI inputs | |
Path.of(pathLiteral) | |
} | |
val workingDir = Path.of(".").toAbsolutePath.normalize | |
val inWorkingDir = workingDir.resolve(path).toAbsolutePath.normalize | |
!path.isAbsolute && inWorkingDir.startsWith(workingDir) | |
} |
🤖 Prompt for AI Agents
In
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala
around line 164, the change to Path.of(new URI(pathLiteral).getPath) can throw
URISyntaxException and can change semantics by turning inputs into absolute
paths; instead, attempt to parse pathLiteral as a URI inside a try/catch, and
only use URI.getPath when the parsed URI has a non-null scheme (or is clearly a
file:// style URI); if parsing fails or the URI has no scheme, fall back to
Path.of(pathLiteral) so existing local-path validation behavior is preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For pathLiteral: "file://home/felix/scm/webknossos/binaryData/sample_organization/l4_sample/color", with the new code:
path = /felix/scm/webknossos/binaryData/sample_organization/l4_sample/color
workingDir = /home/felix/scm/webknossos
inWorkingDir = /felix/scm/webknossos/binaryData/sample_organization/l4_sample/color
inWorkingDir.startsWith(workingDir) = false
path.isAbsolute = true
--> Path gets rejected correctly as an absolute path
Old code:
path = file:/home/felix/scm/webknossos/binaryData/sample_organization/l4_sample/color
workingDir = /home/felix/scm/webknossos
inWorkingDir = /home/felix/scm/webknossos/file:/home/felix/scm/webknossos/binaryData/sample_organization/l4_sample/color
inWorkingDir.startsWith(workingDir) = false
path.isAbsolute = false
--> Path gets wrongly accepted even though it should be absoute path, because the file:// scheme is not handled.
In workingDir does not seem to work correctly in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@frcroth Thank you for the detailed analysis! Your example actually demonstrates that the new code is working correctly and fixes a security issue:
The change is actually a bug fix:
- Old code bug:
file://
URIs were incorrectly treated as relative paths becausePath.of("file://...")
doesn't parse URI schemes properly - New code fix: Correctly extracts the path component (
/felix/scm/...
) and properly identifies it as absolute, leading to correct rejection
Your example shows the new code is behaving as intended - absolute paths should be rejected for security reasons.
However, there's still the potential exception risk I mentioned. Here's a safer approach that preserves the fix while handling edge cases:
private def pathIsDataSourceLocal(pathLiteral: String): Boolean =
pathIsLocal(pathLiteral) && {
val path = try {
val uri = new URI(pathLiteral)
if (uri.getScheme != null) {
// For URIs with schemes (like file://), extract the path component
Path.of(uri.getPath)
} else {
// For plain paths without schemes, use as-is
Path.of(pathLiteral)
}
} catch {
case _: URISyntaxException => Path.of(pathLiteral) // Fallback for malformed URIs
}
val workingDir = Path.of(".").toAbsolutePath.normalize
val inWorkingDir = workingDir.resolve(path).toAbsolutePath.normalize
!path.isAbsolute && inWorkingDir.startsWith(workingDir)
}
Regarding the inWorkingDir
logic issue you mentioned - that might be a separate concern worth investigating if the working directory resolution isn't behaving as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For relative paths (e.g. pathLiteral "./color"), new URI("./color").getPath is ./color, so we do not change the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@frcroth You're absolutely correct! Thank you for testing that specific case.
For relative paths like "./color"
:
new URI("./color").getPath
returns"./color"
Path.of("./color").isAbsolute
returnsfalse
- The validation logic remains unchanged for relative paths
This confirms that your change is a precise fix that:
- ✅ Preserves existing behavior for relative paths
- ✅ Correctly handles URI schemes (like
file://
) by extracting the path component - ✅ Fixes the security issue where
file://
URIs were incorrectly treated as relative
The change is working exactly as intended - no additional complexity needed. My earlier suggestion was overly cautious given that relative paths work correctly with the new implementation.
🐰
∩___∩
( ◕ ◕ )
\ ___ /
| |
_____/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, the weird handling of inWorkingDir was caused by using two instead of three slashes (file:///), which made the path weird. With the actually correct URI, the inWorkingDir check works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@frcroth That makes perfect sense! You're absolutely right - using file://
(two slashes) instead of the proper file:///
(three slashes) would definitely cause weird URI parsing behavior. The malformed URI format would lead to unexpected path extraction, which explains the strange inWorkingDir
resolution you observed.
With properly formatted file URIs, your code change works exactly as intended:
- ✅ Correctly handles relative paths like
"./color"
- ✅ Properly extracts path components from valid URIs like
file:///home/...
- ✅ Fixes the security issue where malformed
file://
URIs were incorrectly treated as relative paths
The implementation is solid! 🎯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
app/controllers/DatasetController.scala (3)
74-78
: Use ObjectId for folderId in DTO for type-safety and consistencyEverywhere else in this controller, folderId is modeled as ObjectId. Keeping the same type here avoids accidental invalid IDs slipping through and keeps JSON formats consistent (there’s already an implicit Format[ObjectId] in scope in this codebase). If the frontend sends a string, Play JSON can still parse it into ObjectId via the implicit.
Apply this diff:
-case class DataSourceRegistrationInfo(dataSource: DataSource, folderId: Option[String], dataStoreName: String) +case class DataSourceRegistrationInfo(dataSource: DataSource, folderId: Option[ObjectId], dataStoreName: String)Follow-ups:
- Ensure datasetService.createVirtualDataset expects Option[ObjectId] (or convert at the boundary).
- Adjust the frontend to continue sending the folderId as a string; Play will parse it into ObjectId.
162-182
: Solid: permissions + layer guard + path validation. Two small improvements suggestedThe flow looks correct: permission gate, non-empty layers check, datastore path validation, then create. Two minor suggestions:
- Trim dataStoreName before lookup to avoid subtle mismatches.
- Preserve detailed path validation errors (or map to a 403) instead of overriding with a generic message, so the UI can surface which path failed. If you prefer i18n-only messages, at least attach a 403 to reflect an authorization-like failure.
Apply this diff to trim the dataStoreName:
- dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages( + dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName.trim) ?~> Messages( "datastore.notFound", request.body.dataStoreName) ~> NOT_FOUNDOption A (preserve detailed error): remove the custom message mapping so the specific failure from validatePaths bubbles up.
- _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" + _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore)Option B (keep i18n but map to 403):
- _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" + _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" ~> FORBIDDEN
522-531
: Admin check fix looks good; map “delete disabled” to 403Great job switching to isAdminOf(dataset._organization) and keeping editability enforcement. One small improvement: when deletion is globally disabled, return FORBIDDEN to reflect a permission-like failure.
Apply this diff:
- _ <- Fox.fromBool(conf.Features.allowDeleteDatasets) ?~> "dataset.delete.disabled" + _ <- Fox.fromBool(conf.Features.allowDeleteDatasets) ?~> "dataset.delete.disabled" ~> FORBIDDEN
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
app/controllers/DatasetController.scala
(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/controllers/DatasetController.scala (7)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
validateJson
(204-209)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3)
Fox
(30-223)Fox
(225-289)fromBool
(32-36)app/models/dataset/DatasetService.scala (2)
validatePaths
(661-672)deleteVirtualOrDiskDataset
(672-686)app/models/dataset/WKRemoteDataStoreClient.scala (2)
validatePaths
(102-107)deleteOnDisk
(114-121)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
allExplicitPaths
(68-68)toString
(11-11)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
deleteOnDisk
(33-80)app/models/user/User.scala (2)
isAdminOf
(55-58)isAdminOf
(58-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (2)
app/controllers/DatasetController.scala (2)
10-10
: LGTM: Correct import of DataSource/ElementClassImporting DataSource and ElementClass here is appropriate and used below.
162-182
: Signatures verified—no mismatches detectedThe
createVirtualDataset
method takes anOption[String]
forfolderId
, which aligns withDataSourceRegistrationInfo.folderId: Option[String]
.
ThedeleteOnDisk
method onWKRemoteDataStoreClient
accepts anObjectId
and is invoked withdataset._id: ObjectId
.No changes are required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
app/controllers/WKRemoteDataStoreController.scala (2)
236-244
: Blocking bug: delete not executed due to val assignment instead of monadic bindInside the for-comprehension, the delete+refresh chain is assigned (
_ = ...
) rather than bound (_ <- ...
). As written, the Fox action is constructed but never executed, so the dataset won’t be marked deleted nor will storage be refreshed.Fix by binding the effect:
- _ = datasetDAO - .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0) - .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset)) + _ <- datasetDAO + .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0) + .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
303-316
: Enforce organization restriction for organization-scoped datastoresFor consistency with reserveDatasetUpload and to avoid bypassing datastore organization scoping, add a check against dataStore.onlyAllowedOrganization before creating a virtual dataset via registerDataSource.
for { user <- bearerTokenService.userForToken(token) organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext) ?~> Messages( "organization.notFound", organizationId) ~> NOT_FOUND _ <- Fox.fromBool(organization._id == user._organization) ?~> "notAllowed" ~> FORBIDDEN + _ <- Fox.fromBool(dataStore.onlyAllowedOrganization.forall(_ == organization._id)) ?~> "dataset.upload.Datastore.restricted" ~> FORBIDDEN dataset <- datasetService.createVirtualDataset( directoryName, dataStore, request.body.dataSource, request.body.folderId, user )
🧹 Nitpick comments (3)
app/models/dataset/DatasetService.scala (2)
660-670
: Nit: remove no-op Fox step and tighten validation foldingThe
_ <- Fox.successful(())
line is redundant. Also, you can simplify the serial check with a fold if you prefer. Minimal change below removes the no-op:def validatePaths(paths: Seq[String], dataStore: DataStore): Fox[Unit] = for { - _ <- Fox.successful(()) client = new WKRemoteDataStoreClient(dataStore, rpc) pathValidationResults <- client.validatePaths(paths) _ <- Fox.serialCombined(pathValidationResults)({ case PathValidationResult(_, true) => Fox.successful(()) case PathValidationResult(path, false) => Fox.failure(s"Path validation failed for path: $path") }) } yield ()
104-125
: Centralize datastore organization scoping (optional)Currently, the “organization-scoped datastore” check happens in controllers (or not at all, depending on the entry point). Consider guarding
createVirtualDataset
itself (e.g., with adataStore.onlyAllowedOrganization.forall(_ == user._organization)
check) to prevent accidental bypass from future call sites.app/controllers/DatasetController.scala (1)
521-526
: Optional: map feature flag failure to explicit statusFor consistency with other guarded routes, consider mapping
dataset.delete.disabled
to FORBIDDEN (or 404 if you want to hide capability).- _ <- Fox.fromBool(conf.Features.allowDeleteDatasets) ?~> "dataset.delete.disabled" + _ <- Fox.fromBool(conf.Features.allowDeleteDatasets) ?~> "dataset.delete.disabled" ~> FORBIDDEN
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
app/controllers/DatasetController.scala
(4 hunks)app/controllers/WKRemoteDataStoreController.scala
(2 hunks)app/models/dataset/ComposeService.scala
(0 hunks)app/models/dataset/DatasetService.scala
(4 hunks)app/models/dataset/explore/WKExploreRemoteLayerService.scala
(0 hunks)conf/webknossos.latest.routes
(2 hunks)
💤 Files with no reviewable changes (2)
- app/models/dataset/ComposeService.scala
- app/models/dataset/explore/WKExploreRemoteLayerService.scala
🚧 Files skipped from review as they are similar to previous changes (1)
- conf/webknossos.latest.routes
🧰 Additional context used
🧬 Code Graph Analysis (3)
app/models/dataset/DatasetService.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4)
PathValidationResult
(47-50)PathValidationResult
(52-54)validatePaths
(686-699)deleteOnDisk
(440-464)app/models/dataset/Dataset.scala (2)
doesDatasetDirectoryExistInOrganization
(437-453)findOne
(207-214)util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (1)
GlobalAccessContext
(25-29)app/models/organization/Organization.scala (1)
findOne
(93-102)app/models/dataset/WKRemoteDataStoreClient.scala (2)
validatePaths
(102-107)deleteOnDisk
(114-121)
app/controllers/DatasetController.scala (5)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
validateJson
(204-209)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4)
Fox
(30-223)Fox
(225-298)fromBool
(32-36)assertTrue
(187-193)app/models/dataset/DatasetService.scala (2)
validatePaths
(660-671)deleteVirtualOrDiskDataset
(671-685)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (1)
allExplicitPaths
(68-68)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
deleteOnDisk
(33-80)
app/controllers/WKRemoteDataStoreController.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
DataSourcePathInfo
(41-41)DataSourcePathInfo
(43-45)DataStoreStatus
(31-31)DataStoreStatus
(32-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
🔇 Additional comments (5)
app/controllers/WKRemoteDataStoreController.scala (1)
229-233
: Endpoint semantics: OK; relies on DS callback for non-virtual deletionsThe new deleteDataset endpoint correctly models the "on-disk deletion done by DS, finalize in WK" flow and handles the annotation-count-based soft delete. Once the binding bug is fixed, the behavior aligns with the PR objectives.
app/models/dataset/DatasetService.scala (3)
104-125
: createVirtualDataset: signature change and usage look correctUsing user._organization instead of passing organizationId is sensible, and uniqueness checks + folder access enforcement are intact. Good use of AbstractDataLayer.from when constructing the abstract data source.
158-159
: Improved error message on organization lookupSwitching to
?~> "organization.notFound"
makes failure reasons clearer when creating datasets from reported data sources.
671-684
: Deletion flow split (virtual vs. on-disk): OKThe new deleteVirtualOrDiskDataset correctly defers on-disk deletion to DS and soft-deletes virtual datasets locally. The centralized implementation matches the PR’s architecture changes.
app/controllers/DatasetController.scala (1)
521-530
: Deletion permissions: looks correct nowThe org-scoped admin check via
isAdminOf(dataset._organization)
and the preceding editability assertion align with intended authorization semantics. The unified delete method in DatasetService keeps virtual vs. non-virtual logic in one place.
def addVirtualDataset(name: String): Action[DataSourceRegistrationInfo] = | ||
sil.SecuredAction.async(validateJson[DataSourceRegistrationInfo]) { implicit request => | ||
for { | ||
dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages( | ||
"datastore.notFound", | ||
request.body.dataStoreName) ~> NOT_FOUND | ||
user = request.identity | ||
isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(user, user._organization) | ||
_ <- Fox.fromBool(isTeamManagerOrAdmin || user.isDatasetManager) ~> FORBIDDEN | ||
_ <- Fox.fromBool(request.body.dataSource.dataLayers.nonEmpty) ?~> "dataset.explore.zeroLayers" | ||
_ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" | ||
dataset <- datasetService.createVirtualDataset( | ||
name, | ||
dataStore, | ||
request.body.dataSource, | ||
request.body.folderId, | ||
user | ||
) | ||
} yield Ok(Json.obj("newDatasetId" -> dataset._id)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed enforcement: honor dataStore.onlyAllowedOrganization for virtual adds
Add the same restriction applied in upload flows so users cannot use a datastore restricted to a different organization to create virtual datasets.
Also, map path validation failures to a client error.
def addVirtualDataset(name: String): Action[DataSourceRegistrationInfo] =
sil.SecuredAction.async(validateJson[DataSourceRegistrationInfo]) { implicit request =>
for {
dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages(
"datastore.notFound",
request.body.dataStoreName) ~> NOT_FOUND
user = request.identity
isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(user, user._organization)
_ <- Fox.fromBool(isTeamManagerOrAdmin || user.isDatasetManager) ~> FORBIDDEN
_ <- Fox.fromBool(request.body.dataSource.dataLayers.nonEmpty) ?~> "dataset.explore.zeroLayers"
- _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed"
+ _ <- Fox.fromBool(dataStore.onlyAllowedOrganization.forall(_ == user._organization)) ?~> "dataset.upload.Datastore.restricted" ~> FORBIDDEN
+ _ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" ~> BAD_REQUEST
dataset <- datasetService.createVirtualDataset(
name,
dataStore,
request.body.dataSource,
request.body.folderId,
user
)
} yield Ok(Json.obj("newDatasetId" -> dataset._id))
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def addVirtualDataset(name: String): Action[DataSourceRegistrationInfo] = | |
sil.SecuredAction.async(validateJson[DataSourceRegistrationInfo]) { implicit request => | |
for { | |
dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages( | |
"datastore.notFound", | |
request.body.dataStoreName) ~> NOT_FOUND | |
user = request.identity | |
isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(user, user._organization) | |
_ <- Fox.fromBool(isTeamManagerOrAdmin || user.isDatasetManager) ~> FORBIDDEN | |
_ <- Fox.fromBool(request.body.dataSource.dataLayers.nonEmpty) ?~> "dataset.explore.zeroLayers" | |
_ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" | |
dataset <- datasetService.createVirtualDataset( | |
name, | |
dataStore, | |
request.body.dataSource, | |
request.body.folderId, | |
user | |
) | |
} yield Ok(Json.obj("newDatasetId" -> dataset._id)) | |
} | |
def addVirtualDataset(name: String): Action[DataSourceRegistrationInfo] = | |
sil.SecuredAction.async(validateJson[DataSourceRegistrationInfo]) { implicit request => | |
for { | |
dataStore <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> Messages( | |
"datastore.notFound", | |
request.body.dataStoreName) ~> NOT_FOUND | |
user = request.identity | |
isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(user, user._organization) | |
_ <- Fox.fromBool(isTeamManagerOrAdmin || user.isDatasetManager) ~> FORBIDDEN | |
_ <- Fox.fromBool(request.body.dataSource.dataLayers.nonEmpty) ?~> "dataset.explore.zeroLayers" | |
_ <- Fox.fromBool(dataStore.onlyAllowedOrganization.forall(_ == user._organization)) ?~> "dataset.upload.Datastore.restricted" ~> FORBIDDEN | |
_ <- datasetService.validatePaths(request.body.dataSource.allExplicitPaths, dataStore) ?~> "dataSource.add.pathsNotAllowed" ~> BAD_REQUEST | |
dataset <- datasetService.createVirtualDataset( | |
name, | |
dataStore, | |
request.body.dataSource, | |
request.body.folderId, | |
user | |
) | |
} yield Ok(Json.obj("newDatasetId" -> dataset._id)) | |
} |
🤖 Prompt for AI Agents
In app/controllers/DatasetController.scala around lines 162–181, enforce
dataStore.onlyAllowedOrganization for virtual dataset creation and map path
validation failures to a client error: after loading dataStore and resolving
user/team manager check, add a guard that if
dataStore.onlyAllowedOrganization.isDefined and != user._organization then fail
with FORBIDDEN; change the datasetService.validatePaths line to map failures to
a BAD_REQUEST (use the same pattern as other checks: ?~>
"dataSource.add.pathsNotAllowed" ~> BAD_REQUEST) so path validation returns a
4xx instead of an internal error.
Creation of virtual remote datasets still involves the datastore, because it does the exploring (to handle local file paths).
Steps to test:
Issues: