Skip to content

Conversation

MichaelBuessemeyer
Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer commented Mar 27, 2025

This is a suggestion to help out with vx issue: https://github.com/scalableminds/voxelytics/issues/3894

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • abc

TODOs:

  • Do not allow a dataset id to be passed to /reserveManualUpload
  • instead allow a parameter like allowDuplicateDatasetName to make the request fail in case the organization already has a dataset with that name (and the resulting directoryName would be something like datasetName-datasetId)
  • On success the route should return the new dataset id and directory name.

Issues:


(Please delete unneeded items, merge only when none are left open)

Copy link
Contributor

coderabbitai bot commented Mar 27, 2025

📝 Walkthrough

Walkthrough

This pull request introduces a new parameter, requireUniqueName (also referred to as allow_fail_if_name_taken in some contexts), into various dataset upload and creation flows. The changes affect controllers, services, and models, modifying method signatures and internal logic to enforce unique dataset names, update error messages, and include additional response information such as the new dataset’s ID and directory name. The modifications also extend to JSON structures for upload reservation, ensuring that uniqueness checks are consistently applied throughout the application.

Changes

File(s) Change Summary
CHANGELOG.unreleased.md, conf/messages Updated documentation and added a new error message dataset.upload.creation.failed to indicate dataset creation failures.
app/controllers/WKRemoteDataStoreController.scala,
webknossos-datastore/.../controllers/DataSourceController.scala
Updated reserve and add methods to include the new requireUniqueName parameter; variable renaming to capture upload response details (new dataset ID and directory name) and updated error messages.
app/models/dataset/DatasetService.scala Modified the createPreliminaryDataset method to accept the requireUniqueName parameter and updated its internal logic and error messaging for handling name conflicts.
webknossos-datastore/.../services/uploading/ComposeService.scala,
webknossos-datastore/.../services/uploading/UploadService.scala
Introduced the requireUniqueName parameter in object instantiation and case classes (ReserveUploadInformation and ReserveManualUploadInformation), ensuring proper JSON serialization and upload reservation behavior.

Possibly related PRs

Suggested labels

bug

Poem

I'm a bunny with lines of code so bright,
Hopping through functions day and night.
With new parameters to keep names unique,
I nibble at errors, sleek and mystique.
Cheering in ASCII hops and joyful bytes,
Celebrating changes with delightful rabbit insights!
🐇✨


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@MichaelBuessemeyer MichaelBuessemeyer changed the title Allow passing a manual dataset Id when triggering a manual upload Make manual reserveUpload route return datasetId and directoryName Apr 7, 2025
@MichaelBuessemeyer MichaelBuessemeyer requested a review from fm3 April 7, 2025 11:34
@MichaelBuessemeyer MichaelBuessemeyer marked this pull request as ready for review April 7, 2025 11:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
CHANGELOG.unreleased.md (1)

20-20: Great changelog entry!

The entry clearly communicates that you've added a parameter to the reserve manual upload route and enhanced the response with additional information.

Consider changing "returned in the response" to "returned to the response" which is a more common collocation in English.

🧰 Tools
🪛 LanguageTool

[grammar] ~20-~20: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bed5824 and 1213fbb.

📒 Files selected for processing (7)
  • CHANGELOG.unreleased.md (1 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/DatasetService.scala (1 hunks)
  • conf/messages (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/ComposeService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
app/controllers/WKRemoteDataStoreController.scala (1)
app/models/dataset/DatasetService.scala (1)
  • createPreliminaryDataset (68-82)
🪛 LanguageTool
CHANGELOG.unreleased.md

[grammar] ~20-~20: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (10)
conf/messages (1)

116-116: Good addition of error message

This new error message provides appropriate feedback when dataset creation fails due to name conflicts.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/ComposeService.scala (1)

66-67: Parameter addition is consistent with other changes

Setting requireUniqueName = false as the default for the compose dataset operation maintains backward compatibility while supporting the new functionality.

app/models/dataset/DatasetService.scala (2)

68-71: Method signature update correctly implements new parameter

The addition of the requireUniqueName parameter to the method signature properly supports the new functionality requested in the PR.


74-77: Good implementation of name uniqueness logic

The implementation correctly:

  1. Checks if a dataset with the same name already exists
  2. Fails with an appropriate error message when requireUniqueName is true and the name is taken
  3. Generates a unique directory name when needed by appending the dataset ID
app/controllers/WKRemoteDataStoreController.scala (1)

86-90: Modified createPreliminaryDataset call to enforce unique dataset names

The method now passes the uploadInfo.requireUniqueName parameter to the dataset service, allowing the API to fail predictably when attempting to create a dataset with a name that already exists in the organization. The error message has also been updated to be more general.

This change aligns with the PR objective to make the route fail when an organization already has a dataset with the same name.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

42-42: Added requireUniqueName parameter to ReserveUploadInformation

This parameter allows clients to specify whether dataset creation should fail if a dataset with the same name already exists in the organization.


47-52: Added requireUniqueName parameter to ReserveManualUploadInformation

The requireUniqueName parameter now has a default value of false, which means manual uploads will, by default, allow duplicate dataset names but can be configured to require unique names when needed.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)

116-129: Enhanced reserveManualUpload method to return dataset ID and directory name

The method now captures the reservedDatasetInfo instead of discarding it, and passes request.body.requireUniqueName to enforce unique dataset names when requested. This change aligns with the PR objective to return both the dataset ID and directory name upon successful execution.


131-133: Updated response to include dataset ID and directory name

The response now includes both the new dataset ID and directory name in a JSON object, providing clients with the necessary information to reference the created dataset.


426-426: Set default value for requireUniqueName in add method

The add method now explicitly sets requireUniqueName to false, ensuring consistent behavior with the default value in ReserveManualUploadInformation.

Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Could you explain what use case the requireUniqueName serves? My assumption was that duplicate names are no problem, and the returned directoryName will be used by vx to place the dataset on disk.

@MichaelBuessemeyer
Copy link
Contributor Author

MichaelBuessemeyer commented Apr 8, 2025

Could you explain what use case the requireUniqueName serves? My assumption was that duplicate names are no problem, and the returned directoryName will be used by vx to place the dataset on disk.

100'% correct imo. This feature to make the publication task fail in case of a not unique name was requested by @philippotto. Could you please elaborate your reasoning for this @philippotto 🤔 🙏 ?

@MichaelBuessemeyer MichaelBuessemeyer changed the title Make manual reserveUpload route return datasetId and directoryName Make manual reserveManualUpload route return datasetId and directoryName Apr 8, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
CHANGELOG.unreleased.md (1)

20-20: Revise Changelog Wording for Clarity.
The changelog entry currently states "the new dataset's id and directory name are returned in the response." According to common English collocations (as noted by the static analysis hint), consider rephrasing this to "returned to the client" or "returned by the API" for improved clarity and consistency.

🧰 Tools
🪛 LanguageTool

[grammar] ~20-~20: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1213fbb and 0f471fa.

📒 Files selected for processing (5)
  • CHANGELOG.unreleased.md (1 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/ComposeService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/ComposeService.scala
  • app/controllers/WKRemoteDataStoreController.scala
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.unreleased.md

[grammar] ~20-~20: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build

@philippotto
Copy link
Member

This feature to make the publication task fail in case of a not unique name was requested by @philippotto. Could you please elaborate your reasoning for this @philippotto 🤔 🙏 ?

When we use voxelytics internally for projects, it is good practice to give a new dataset that is about to be published to WK a unique name (so that it can easily be distinguished from other experiments). This is why I voted for being able to have this asserted, even though WK supports non-unique names.
Related to that: personally, I'm not very fond of the idea that the same folder can have two datasets with the same name. I understand that we had to lift this restriction because of cross-orga and multiple-folder use cases. But I think, we went a bit too far... 🙈

@philippotto
Copy link
Member

philippotto commented Apr 8, 2025

Related to that: personally, I'm not very fond of the idea that the same folder can have two datasets with the same name. I understand that we had to lift this restriction because of cross-orga and multiple-folder use cases. But I think, we went a bit too far... 🙈

google drive supports non-unique file names in the same folder, too. maybe I'm in the minority with my opinion 🤷

Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. So this should possibly be an option in the PublishTask in vx as well?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
CHANGELOG.unreleased.md (1)

20-22: Clarify changelog language and add parameter specificity.
The changelog entry currently reads:

  • Added a parameter to the reserve manual upload route allowing to make the request fail if the name is already taken. Moreover, the new dataset's id and directory name are returned in the response. #8476

To improve clarity and precision:

  • Parameter Naming: Consider explicitly stating the name of the new parameter (e.g., allowDuplicateDatasetName or the actual parameter name used) so that users can immediately identify it in their configurations.
  • Grammar Refinement: According to a static analysis hint, the usual collocation for “returned” is “to”, not “in”. You might rephrase the sentence to something like: “... and the new dataset’s id and directory name are returned to the client” or “... are included in the response.”
🧰 Tools
🪛 LanguageTool

[grammar] ~21-~21: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f471fa and a68ef77.

📒 Files selected for processing (2)
  • CHANGELOG.unreleased.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.unreleased.md

[grammar] ~21-~21: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build

@MichaelBuessemeyer MichaelBuessemeyer enabled auto-merge (squash) April 9, 2025 07:56
@MichaelBuessemeyer
Copy link
Contributor Author

So this should possibly be an option in the PublishTask in vx as well?

Yeah, I think that's the idea :D

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
CHANGELOG.unreleased.md (1)

22-22: Changelog wording clarity improvement: Adjust phrase "returned in the response".

The changelog entry accurately documents the update; however, consider rephrasing the clause for clarity. Instead of "returned in the response," which may sound non-standard grammatically, "included in the response" provides a clearer description.

- ... and the new dataset's id and directory name are returned in the response.
+ ... and the new dataset's id and directory name are included in the response.
🧰 Tools
🪛 LanguageTool

[grammar] ~22-~22: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a68ef77 and cd99cdf.

📒 Files selected for processing (1)
  • CHANGELOG.unreleased.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.unreleased.md

[grammar] ~22-~22: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
CHANGELOG.unreleased.md (1)

23-23: Enhance Clarity in the Changelog Entry

The changelog entry clearly documents the new parameter and updated response information. To improve readability and adhere to conventional phrasing, consider rephrasing "returned in the response" to "returned as part of the response".

- Added a parameter to the reserve manual upload route allowing to make the request fail if the name is already taken. Moreover, the new dataset's id and directory name are returned in the response. [#8476](https://github.com/scalableminds/webknossos/pull/8476)
+ Added a parameter to the reserve manual upload route that causes the request to fail if the name is already taken. Additionally, the new dataset's id and directory name are returned as part of the response. [#8476](https://github.com/scalableminds/webknossos/pull/8476)
🧰 Tools
🪛 LanguageTool

[grammar] ~23-~23: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd99cdf and 9c7a77c.

📒 Files selected for processing (1)
  • CHANGELOG.unreleased.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.unreleased.md

[grammar] ~23-~23: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build

@MichaelBuessemeyer MichaelBuessemeyer enabled auto-merge (squash) April 10, 2025 08:02
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
CHANGELOG.unreleased.md (1)

23-23: Nitpick: Rephrase for improved clarity and natural collocation.
The changelog entry currently states, "Moreover, the new dataset's id and directory name are returned in the response." Consider rewording this line to something like:
"Moreover, the response now includes the new dataset's ID and directory name."
This change not only improves clarity but also aligns better with common usage as noted by LanguageTool.

🧰 Tools
🪛 LanguageTool

[grammar] ~23-~23: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c7a77c and 7e2ef8c.

📒 Files selected for processing (1)
  • CHANGELOG.unreleased.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.unreleased.md

[grammar] ~23-~23: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build

@MichaelBuessemeyer MichaelBuessemeyer merged commit 772ae17 into master Apr 10, 2025
3 checks passed
@MichaelBuessemeyer MichaelBuessemeyer deleted the allow-passing-ds_id-on-manual-upload branch April 10, 2025 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants