New reserveUploadToPaths Protocol; Refactor DataLayer Classes; UPath #8844
Conversation
📝 Walkthrough

Refactors datastore/layer models to UsableDataSource/StaticLayer with UPath paths; adds upload-to-paths and attachment reservation flows, DB evolutions, many controller/service/DAO signature updates, threads MessagesProvider through annotation/task APIs, and bumps API to v11.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~180 minutes
Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (4 passed)
Actionable comments posted: 0
♻️ Duplicate comments (4)
app/models/dataset/DatasetService.scala (4)
280-289: Consider clearing thumbnail cache for consistency

Other dataset update flows clear the thumbnail cache before updating. This path should do the same to ensure thumbnails are regenerated with the new data.

Apply this diff to clear thumbnails before updating:

```diff
 _ <- if (isChanged) {
   logger.info(s"Updating dataSource of $datasetId")
   for {
+    _ <- thumbnailCachingService.removeFromCache(datasetId)
     _ <- Fox.runIf(!dataset.isVirtual)(dataStoreClient.updateDataSourceOnDisk(datasetId, updatedDataSource))
     _ <- dataStoreClient.invalidateDatasetInDSCache(datasetId)
     _ <- datasetDAO.updateDataSource(datasetId,
                                      dataset._dataStore,
                                      updatedDataSource.hashCode(),
                                      updatedDataSource,
                                      isUsable = true)(GlobalAccessContext)
   } yield ()
```
407-416: Optimize database query for unusable datasets

The current implementation always queries `datasetDataLayerDAO.findAllForDataset` even when the dataset is not usable, which wastes a database query.

Apply this optimization to avoid unnecessary DB queries:

```diff
 def dataSourceFor(dataset: Dataset): Fox[DataSource] = {
   val dataSourceId = DataSourceId(dataset.directoryName, dataset._organization)
-  if (dataset.isUsable)
+  if (!dataset.isUsable)
+    Fox.successful(UnusableDataSource(dataSourceId, None, dataset.status, dataset.voxelSize))
+  else
     for {
       voxelSize <- dataset.voxelSize.toFox ?~> "dataset.source.usableButNoVoxelSize"
       dataLayers <- datasetDataLayerDAO.findAllForDataset(dataset._id)
     } yield UsableDataSource(dataSourceId, dataLayers, voxelSize)
-  else
-    Fox.successful(UnusableDataSource(dataSourceId, None, dataset.status, dataset.voxelSize))
 }
```
295-301: Critical: Layers are dropped when not present in updates

The `flatMap` in `applyDataSourceUpdates` will remove any existing layer not found in the updates. This is dangerous for partial updates where the client might only send the layers they want to modify.

Apply this diff to preserve layers not included in the update:

```diff
-val updatedLayers = existingDataSource.dataLayers.flatMap { existingLayer =>
-  val layerUpdatesOpt = updates.dataLayers.find(_.name == existingLayer.name)
-  layerUpdatesOpt match {
-    case Some(layerUpdates) => Some(applyLayerUpdates(existingLayer, layerUpdates))
-    case None => None
-  }
-}
+val updatedLayers = existingDataSource.dataLayers.map { existingLayer =>
+  updates.dataLayers.find(_.name == existingLayer.name) match {
+    case Some(layerUpdates) => applyLayerUpdates(existingLayer, layerUpdates)
+    case None => existingLayer
+  }
+}
```
270-290: Unnecessary `f` interpolator

Line 290 uses the `f` interpolator although the string contains no format specifiers, so it adds nothing over the `s` interpolator, which is the idiomatic choice here.

Apply this diff:

```diff
-} else Fox.successful(logger.info(f"DataSource $datasetId not updated as the hashCode is the same"))
+} else Fox.successful(logger.info(s"DataSource $datasetId not updated as the hashCode is the same"))
```
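For reference, a minimal illustration of the two interpolators (note that `f` also compiles without format specifiers; it simply behaves like `s` in that case):

```scala
val id = "ds1"
val elapsedSeconds = 1.2345
println(s"DataSource $id not updated")        // s-interpolator: plain substitution
println(f"Update took $elapsedSeconds%.2f s") // f-interpolator: checked printf-style format specifiers
```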
🧹 Nitpick comments (2)
app/models/dataset/DataStore.scala (1)
152-159: Fix inconsistent error message in finder method

The `findOneWithUploadsToPathsAllowed` method returns "find one with uploads allowed" as the error message, which doesn't accurately describe what this method does. It should indicate that it's looking for uploads to paths specifically.

Apply this diff to fix the error message:

```diff
-parsed <- parseFirst(r, "find one with uploads allowed")
+parsed <- parseFirst(r, "find one with uploads to paths allowed")
```

app/models/dataset/DatasetService.scala (1)
69-73: Redundant dataset name uniqueness check

The method `assertNewDatasetNameUnique` appears to duplicate the functionality of `checkNameAvailable` (lines 75-79). Both methods check if a dataset name exists in an organization and return an error if it does.

Consider removing `assertNewDatasetNameUnique` and using `checkNameAvailable` consistently throughout the codebase to avoid duplication:

```diff
-  def assertNewDatasetNameUnique(name: String, organizationId: String): Fox[Unit] =
-    for {
-      exists <- datasetDAO.doesDatasetNameExistInOrganization(name, organizationId)
-      _ <- Fox.fromBool(!exists) ?~> "dataset.name.taken"
-    } yield ()
-
   def checkNameAvailable(organizationId: String, datasetName: String): Fox[Unit] =
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)

- app/controllers/DataStoreController.scala (2 hunks)
- app/models/dataset/DataStore.scala (7 hunks)
- app/models/dataset/DatasetService.scala (18 hunks)
- app/models/dataset/DatasetUploadToPathsService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/models/dataset/DatasetUploadToPathsService.scala
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
Applied to files:
app/models/dataset/DatasetService.scala
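To make this learning concrete, a minimal sketch, assuming the project's `Fox` and its `FoxImplicits` (for `?~>`) are in scope; `isSuperUser` and `doPrivilegedAction` are illustrative placeholders, not code from this PR:

```scala
import com.scalableminds.util.tools.Fox
import scala.concurrent.ExecutionContext

// Correct: <- sequences the check inside the for-comprehension, so a failure propagates.
def guarded(isSuperUser: Boolean, doPrivilegedAction: () => Fox[Unit])(
    implicit ec: ExecutionContext): Fox[Unit] =
  for {
    _ <- Fox.fromBool(isSuperUser) ?~> "notAllowed"
    result <- doPrivilegedAction()
  } yield result

// Anti-pattern from the learning: with `=`, the Fox value is built but never run,
// so a failed check is silently dropped:
//   _ = Fox.fromBool(isSuperUser) ?~> "notAllowed"
```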
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
app/models/dataset/DatasetService.scala
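A self-contained sketch of the three branches described in that learning; the `Option[Seq[String]]` parameters are simplified stand-ins for the real mag/resolution models:

```scala
// Sketch only: mirrors the branching in updateMags as described above.
def updateMagsBranch(magsOpt: Option[Seq[String]], wkwResolutionsOpt: Option[Seq[String]]): String =
  (magsOpt, wkwResolutionsOpt) match {
    case (Some(_), _)    => "non-WKW: persist axisOrder, channelIndex, credentialId"
    case (None, Some(_)) => "WKW: persist cubeLength"
    case (None, None)    => "legacy fallback"
  }
```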
🧬 Code graph analysis (3)
app/models/dataset/DataStore.scala (2)
- app/utils/sql/SecuredSQLDAO.scala (2): readAccessQuery (23-36), existingCollectionName (16-16)
- app/utils/sql/SqlInterpolation.scala (2): q (20-39), asUpdate (74-74)
app/controllers/DataStoreController.scala (3)
- app/models/dataset/DataStore.scala (3): DataStore (20-31), DataStore (33-63), fromForm (36-56)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3): Fox (30-223), Fox (225-298), fromBool (32-36)
- app/models/analytics/AnalyticsService.scala (1): isSuperUser (103-108)
app/models/dataset/DatasetService.scala (7)
- util/src/main/scala/com/scalableminds/util/tools/JsonHelper.scala (2): JsonHelper (16-112), removeKeyRecursively (100-112)
- util/src/main/scala/com/scalableminds/util/tools/TextUtils.scala (2): TextUtils (6-75), normalizeStrong (50-56)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (4): UPath (54-96), fromString (59-59), toString (106-109), toString (174-174)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (6): DataSource (31-43), UnusableDataSource (45-62), UnusableDataSource (64-66), UsableDataSource (68-107), UsableDataSource (109-113), isUsable (20-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceId.scala (3): DataSourceId (5-7), DataSourceId (9-26), toString (6-6)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1): DataSourceStatus (3-11)
- app/models/dataset/WKRemoteDataStoreClient.scala (3): updateDataSourceOnDisk (116-123), invalidateDatasetInDSCache (109-116), validatePaths (104-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
🔇 Additional comments (8)
app/models/dataset/DatasetService.scala (7)
283-283: Cache invalidation effect is executed

The datastore cache invalidation uses the `<-` binding correctly, so the effect will be executed.
559-563: Directory name generation is robust

The new `generateDirectoryName` method properly handles edge cases where normalization returns `None` by falling back to the dataset ID. This ensures every dataset gets a unique directory name.
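A minimal sketch of the fallback described here (the real signature in DatasetService.scala may differ; that `TextUtils.normalizeStrong` returns an `Option[String]` is inferred from the behavior described above):

```scala
import com.scalableminds.util.tools.TextUtils

// Sketch only: appends the dataset ID for uniqueness, falls back to it entirely
// when nothing normalizable remains of the dataset name.
def generateDirectoryName(datasetName: String, datasetId: String): String =
  TextUtils.normalizeStrong(datasetName) match {
    case Some(normalized) => s"$normalized-$datasetId"
    case None             => datasetId
  }
```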
593-593: Good security practice: credentials properly removed from public JSON

The use of `JsonHelper.removeKeyRecursively` to strip out `credentialId` and `credentials` from the dataSource JSON is a good security practice. This ensures sensitive authentication information is never exposed in API responses.
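For illustration, a hedged usage sketch; the exact parameter shape of `removeKeyRecursively` is an assumption here:

```scala
import com.scalableminds.util.tools.JsonHelper
import play.api.libs.json.{JsValue, Json}

// Sketch only: strip credential keys recursively before exposing the JSON publicly.
val dataSourceJson: JsValue = Json.obj(
  "dataLayers" -> Json.arr(Json.obj("name" -> "color", "credentialId" -> "secret-id"))
)
val publicJson: JsValue =
  JsonHelper.removeKeyRecursively(dataSourceJson, Set("credentialId", "credentials"))
```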
401-405: Clear error messaging for unusable datasets

The `usableDataSourceFor` method provides clear error messages when datasets are not imported, helping users understand why a dataset cannot be used.
81-89: Consistent use of DataSourceStatus constants

Good use of the new `DataSourceStatus.notYetUploaded` constant instead of hardcoded strings, improving maintainability.
102-103: Verify unique directory names across the system

Appending the dataset ID makes names unique but breaks the previous "plain name" behavior. Occurrences of directoryName found in:

- ./webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/LegacyController.scala
- ./webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
- ./test/backend/DataSourceTestSuite.scala
- ./app/models/dataset/DatasetService.scala
- ./app/models/job/JobService.scala
- ./app/models/job/Job.scala
- ./app/models/annotation/AnnotationService.scala
- ./app/controllers/UserTokenController.scala
- ./app/controllers/WKRemoteDataStoreController.scala
- ./app/controllers/DatasetController.scala

Inspect/update these callers and tests to accept the new directory format and run the full test suite.
308-369: Add unit tests for applyLayerUpdates (Color ↔ Segmentation conversions)

File: app/models/dataset/DatasetService.scala (lines 308-369)

applyLayerUpdates performs type conversions and selective field updates. Unit-test the conversion matrix and edge cases:

- Color -> Color: assert only boundingBox, coordinateTransformations, defaultViewConfiguration, adminViewConfiguration are replaced; name, dataFormat, elementClass, mags, additionalAxes, attachments remain unchanged.
- Color -> Segmentation: assert the resulting StaticSegmentationLayer uses the existing layer's name, dataFormat, elementClass, mags, additionalAxes, attachments and takes boundingBox, coordinateTransformations, defaultViewConfiguration, adminViewConfiguration, largestSegmentId from the update; verify the final segmentation-only constructor field is set to None.
- Segmentation -> Segmentation: assert boundingBox, coordinateTransformations, defaultViewConfiguration, adminViewConfiguration, largestSegmentId are updated.
- Segmentation -> Color: assert the resulting StaticColorLayer preserves the existing name/dataFormat/elementClass/mags/additionalAxes/attachments and takes boundingBox/default/adminViewConfiguration/coordinateTransformations from the update; largestSegmentId must be dropped.
- Edge cases: test largestSegmentId = None vs Some(value), empty vs non-empty attachments/additionalAxes, and null/absent coordinate transformations.

A repo search produced no matching tests for these conversions. Add unit tests (and an integration test exercising the public API path that calls applyLayerUpdates, if available); a sketch of one such test follows this list.
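As a starting point, a self-contained ScalaTest sketch for one cell of the matrix. The `Mini*` types are simplified stand-ins (assumptions, not the real classes); the actual test should construct `StaticColorLayer` and the real update payload from DatasetService.scala:

```scala
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Stand-in types so the test-matrix idea is runnable in isolation.
case class MiniLayer(name: String, elementClass: String, boundingBox: String)
case class MiniUpdate(name: String, boundingBox: String)

object MiniApply {
  // Mirrors the described Color -> Color rule: only boundingBox comes from the update.
  def applyLayerUpdates(existing: MiniLayer, update: MiniUpdate): MiniLayer =
    existing.copy(boundingBox = update.boundingBox)
}

class ApplyLayerUpdatesSpec extends AnyFlatSpec with Matchers {
  "applyLayerUpdates" should "replace only updatable fields (Color -> Color cell)" in {
    val existing = MiniLayer("color", "uint8", "small")
    val update   = MiniUpdate("color", "large")
    val result   = MiniApply.applyLayerUpdates(existing, update)
    result.name shouldBe "color"         // preserved
    result.elementClass shouldBe "uint8" // preserved
    result.boundingBox shouldBe "large"  // replaced from the update
  }
}
```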
app/controllers/DataStoreController.scala (1)
48-49: Authorization flow with `Fox.fromBool` is correct

The code binds the check with `<-`, so the effect is properly sequenced and the request fails with FORBIDDEN when the user is not a super user:

```scala
multiUser <- multiUserDAO.findOne(request.identity._multiUser)
_ <- Fox.fromBool(multiUser.isSuperUser) ?~> "notAllowed" ~> FORBIDDEN
```

No change needed.
Is there an open TODO for me? You need an approval from my side, correct? Is there anything that needs re-reviewing or testing, as you made some changes?
Yes, the latest changes are:
I’d be happy if we could merge this in the coming week. Maybe you could have another quick look at the latest changes.
Ok, I reviewed the new commits again:

- Some messages still need to be added to `conf/messages`
- I do not fully understand the new workflow for uploading a dataset to S3 🙈
Actionable comments posted: 0
🧹 Nitpick comments (2)
frontend/javascripts/admin/rest_api.ts (2)
993-1000: Avoid emitting a trailing "?" when no params are set

If both `sharingToken` and `includePaths` are unset, the URL becomes `/api/datasets/{id}?`. Small but noisy for logs/caches.

Apply:

```diff
 const params = new URLSearchParams();
 if (sharingToken != null) {
   params.set("sharingToken", String(sharingToken));
 }
 if (includePaths != null) {
   params.set("includePaths", String(includePaths));
 }
-return Request.receiveJSON(`/api/datasets/${datasetId}?${params}`, options);
+const qs = params.toString();
+const url = qs ? `/api/datasets/${datasetId}?${qs}` : `/api/datasets/${datasetId}`;
+return Request.receiveJSON(url, options);
```
1018-1027: Scope `dataSource` updates to server-allowed fields (or confirm backend ignores others)

The type allows a full `APIDataSource`. If the backend only permits a subset, consider a narrowed `AllowedDataSourceUpdates` type to prevent accidental over-posting; otherwise confirm server-side whitelisting.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- frontend/javascripts/admin/rest_api.ts (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
frontend/javascripts/admin/rest_api.ts (2)
- frontend/javascripts/libs/request.ts (1): RequestOptions (31-31)
- frontend/javascripts/types/api_types.ts (2): APIDataset (243-246), APIDataSource (152-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
🔇 Additional comments (2)
frontend/javascripts/admin/rest_api.ts (2)
1015-1015: Delegation LGTM

`getDatasetLegacy` forcing `includePaths = true` matches the legacy behavior intent.
987-992: Breaking signature change: verify call sites or add overloads

getDataset's params were reordered to (datasetId, includePaths?, sharingToken?, options?). Callers that pass a string as the 2nd argument (the old sharingToken position) will break. The automated scan couldn't be completed in this environment; run a repo-wide search for getDataset call sites and update callers, or add TypeScript overloads to accept both shapes: (id, sharingToken?, options?) and (id, includePaths?, sharingToken?, options?). File: frontend/javascripts/admin/rest_api.ts (≈lines 987-992).
Awesome! Thanks a lot for your hard work!
Let's give it a go 🟢
Based on and thus blocked by #8844

### Steps to test:

- refresh local database schema
- start up, log in, should see l4_sample_remote in the dashboard dataset list - should be usable.

### Issues

- contributes to #8813 (we can easily add a few more once we have them publicly statically hosted somewhere. Ideal would be something with plenty of attachments and something with a different data type, like n5 or neuroglancerPrecomputed)

------

- [x] Added changelog entry (create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)
- [x] Considered [common edge cases](../blob/master/.github/common_edge_cases.md)

---------

Co-authored-by: valentin-pinkau <[email protected]>
#8897: If the mag has a relative path, it must be resolved in the dataset root path (see the sketch after this description). Based on and thus blocked by #8844

### URL of deployed dev instance (used for testing):

- https://fixexplorepaths.webknossos.xyz

### Steps to test:

- Visit the served zarr-streaming datasource-properties.json at `http://localhost:9000/data/zarr/<datasetId>/datasource-properties.json?token=secretSampleUserToken`
- Should list relative paths
- Explore the dataset in the same webknossos `http://localhost:9000/data/zarr/<datasetId>/` and add it (self-stream)
- Should be explorable and afterwards show data.
- Same for annotation zarr links (note: explore the whole annotation, not just one layer)

### Issues:

- fixes #8811

------

- [x] Added changelog entry (create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)
- [x] Considered [common edge cases](../blob/master/.github/common_edge_cases.md)
- [x] Needs datastore update after deployment

---------

Co-authored-by: valentin-pinkau <[email protected]>
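A hedged sketch of the resolution rule #8897 describes; UPath's actual API may differ, and both `isAbsolute` and the `/` operator are assumptions for illustration:

```scala
import com.scalableminds.webknossos.datastore.helpers.UPath

// Sketch only: resolve a mag's relative UPath against the dataset root path.
def resolveMagPath(datasetRoot: UPath, magPath: UPath): UPath =
  if (magPath.isAbsolute) magPath
  else datasetRoot / magPath
```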
1. This PR removes the advanced dataset settings mode (JSON input). PR #8844 removed the ability to edit JSON directly anyway - this is the frontend follow-up / clean-up. This affects both the views for dataset settings, as well as remote dataset upload/import.
2. PR #8844 added an `includePaths` GET parameter to `/api/datasets`. The matching backend changes were never merged to `master` and were discarded.
3. Small refinements:
   - Added a background color for the upload datasets tabs for better contrast.
   - Added a developer-only button to view the raw API response for datasets to inspect paths, mags etc., now that the JSON view is gone.

<img width="1635" height="737" alt="Screenshot 2025-09-25 at 11 09 01" src="https://github.com/user-attachments/assets/0242b0ca-6764-48ca-bf5f-dca16c19e3fc" />

### Steps to test:

1. Edit an existing dataset. Make changes, save and double-check they save correctly.
2. Upload a new remote dataset, e.g. from https://docs.webknossos.org/webknossos/data/neuroglancer_precomputed.html . Check that you can add one or more layers without issue.

### Issues:

- fixes #8942
- fixes #5571
- fixes #5639

------

(Please delete unneeded items, merge only when none are left open)

- [x] Added changelog entry (create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)
- [ ] Added migration guide entry if applicable (edit the same file as for the changelog)
- [x] Updated [documentation](../blob/master/docs) if applicable
- [ ] Adapted [wk-libs python client](https://github.com/scalableminds/webknossos-libs/tree/master/webknossos/webknossos/client) if relevant API parts change
- [ ] Removed dev-only changes like prints and application.conf edits
- [x] Considered [common edge cases](../blob/master/.github/common_edge_cases.md)
- [ ] Needs datastore update after deployment

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Philipp Otto <[email protected]>
Description
This PR changes how clients with direct access to the underlying data storage can get datasets registered into WK.
Corresponding libs client PR: scalableminds/webknossos-libs#1359
New ReserveDatasetUploadToPaths Protocol
New ReserveAttachmentUploadToPath Protocol
New ReserveUploadToPathForPreliminary Protocol
This extra route supports a new workflow, which will be used only by the convert_to_wkw worker job, which webknossos starts when a user uploads a dataset that is not yet in wkw/zarr format. When starting this upload, a dataset is already inserted in the database (via reserveUpload, not manually). Then, after finishUpload, the conversion worker job is started. Once the worker has converted the dataset to wkw/zarr, it needs to call this new reserveUploadToPathsForPreliminary route, which works only on datasets in this particular state. It returns paths, just like the normal, user-facing reserveUploadToPaths described above, but it differs in that it needs to operate on the datasetId of the already-created, preliminary dataset from the original reserveUpload. A sketch of this worker-side flow follows below.
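A hedged sketch only: the route name matches the PR text, but the request/response shapes (datasetId in, upload paths out) and the client helper are assumptions for illustration:

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical remote client; postJson is an illustrative helper, not the real API.
trait WkRemoteClient {
  def postJson(route: String, body: Map[String, String]): Future[List[String]]
}

def afterConversion(wk: WkRemoteClient, preliminaryDatasetId: String)(
    implicit ec: ExecutionContext): Future[Unit] =
  for {
    // Only valid for the preliminary dataset created by the original reserveUpload:
    paths <- wk.postJson("reserveUploadToPathsForPreliminary",
                         Map("datasetId" -> preliminaryDatasetId))
  } yield
    // The worker would now write the converted wkw/zarr data to the returned paths.
    paths.foreach(p => println(s"would upload converted data to $p"))
```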
Changes to Editing DataSources

Misc Changes
Refactoring: UPath
Refactoring: New DataLayer Class Hierarchy
Steps to test:
Test against regressions
Test new features

- Point `webKnossos.datasets.uploadToPathsPrefixes` to our managed s3 object storage, give both wk and the client credentials, should also work with this.

TODOs:
Backend
Frontend
Follow-Ups
Possible Follow-Up Issues
Issues:
- Added changelog entry (create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)