
Conversation

frcroth
Contributor

@frcroth frcroth commented May 12, 2025

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

This PR aims to have all properties that may occur in a dataset's datasource-properties.json file mirrored in the DB. The following properties were previously missing from the DB:

  • layer/numChannels
  • layer/dataFormat
  • wkwResolution/cubeLength -> stored in mag table
  • mag/axisOrder
  • mag/channelIndex
  • mag/credentialId (mags also support legacy credentials, which I did not move to the DB here; we already have credentials in the DB that are referenced via credentialId, so it would not make sense to create another credential type)

AxisOrder is serialized as a string of the form "x:4,y:3,z:2"; it would also be possible to store this in a new table and avoid the string serialization.
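
For illustration, here is a minimal sketch of that round-trip, assuming a simplified class with only x, y, and z indices (the actual AxisOrder additionally handles optional axes):

case class SimpleAxisOrder(x: Int, y: Int, z: Int) {
  // Serialize to the "x:4,y:3,z:2" form used for the DB column
  override def toString: String = s"x:$x,y:$y,z:$z"
}

object SimpleAxisOrder {
  // Parse "x:4,y:3,z:2" back into an axis order
  def fromString(serialized: String): SimpleAxisOrder = {
    val indices = serialized.split(',').toList.map { entry =>
      val Array(axis, index) = entry.split(':')
      axis -> index.toInt
    }.toMap
    SimpleAxisOrder(indices("x"), indices("y"), indices("z"))
  }
}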

Steps to test:

  • Check out these changes with existing datasets -> no errors, and the new properties show up in the DB


Contributor

coderabbitai bot commented May 12, 2025

📝 Walkthrough


This update introduces new metadata fields to dataset layers and magnifications, reflected in both the database schema and the application logic. The migration scripts add and remove these fields as needed. The Scala models and DAOs are extended to support and persist the new properties. A duplicate route is removed from the routing configuration, and migration documentation is updated accordingly.

Changes

  • conf/evolutions/133-datasource-properties-in-db.sql, conf/evolutions/reversions/133-datasource-properties-in-db.sql, tools/postgres/schema.sql: Database schema updated to version 133: new enum type DATASET_LAYER_DATAFORMAT added; new columns added to dataset_layers (numChannels, dataFormat) and dataset_mags (credentialId, axisOrder with check constraint, channelIndex, cubeLength). Migration and reversion scripts handle these schema changes and schema versioning.
  • app/models/dataset/Dataset.scala: DAO methods updated to handle new metadata fields for dataset magnifications and layers. Insert/update queries now support dataFormat, numChannels, and additional mag attributes (credentialId, axisOrder, channelIndex, cubeLength). Logic extended to conditionally insert richer metadata depending on presence of magsOpt or wkwResolutionsOpt.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala: Data layer traits and case classes extended with optional fields: mags, dataFormat, numChannels, and wkwResolutions. New accessor methods added to DataLayerLike and DataLayerWithMagLocators traits. Companion objects updated to populate new fields from input layers.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala: Removed trailing blank line inside AxisOrder case class; no functional changes.
  • conf/webknossos.latest.routes: Duplicate POST route for /maintenances removed from routing configuration.
  • MIGRATIONS.unreleased.md: Migration documentation updated to reference new Postgres evolution script for datasource properties in the database.
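
For orientation, a rough sketch of what the forward evolution does, pieced together from the summary above and the schema lines quoted later in the thread (column types and the version-bookkeeping statement are assumptions; conf/evolutions/133-datasource-properties-in-db.sql is authoritative):

START TRANSACTION;

-- New enum for the layer data format
CREATE TYPE webknossos.DATASET_LAYER_DATAFORMAT AS ENUM ('wkw','zarr','zarr3','n5','neuroglancerPrecomputed','tracing');

-- Layer-level metadata
ALTER TABLE webknossos.dataset_layers
  ADD COLUMN numChannels INTEGER,
  ADD COLUMN dataFormat webknossos.DATASET_LAYER_DATAFORMAT;

-- Mag-level metadata, including the axisOrder format check
ALTER TABLE webknossos.dataset_mags
  ADD COLUMN credentialId TEXT,  -- actual type/reference may differ
  ADD COLUMN axisOrder TEXT CONSTRAINT axisOrder_format CHECK (axisOrder ~ '^[xyzc]:[0-9]+(,[xyzc]:[0-9]+)+$'),
  ADD COLUMN channelIndex INTEGER,
  ADD COLUMN cubeLength INTEGER;

-- Assumed bookkeeping: bump the schema version to 133
UPDATE webknossos.releaseInformation SET schemaVersion = 133;

COMMIT TRANSACTION;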

Suggested labels

refactoring

Suggested reviewers

  • fm3

Poem

🥕
New fields sprout in tables, like carrots in spring,
Axis orders now sing with a stringy new ring.
Mags and formats, channels galore—
Our data’s more detailed than ever before!
With routes trimmed neat and docs up to date,
This bunny hops on—database looking great!


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef841ea and 636aa8a.

📒 Files selected for processing (5)
  • app/models/dataset/Dataset.scala (2 hunks)
  • conf/evolutions/133-datasource-properties-in-db.sql (1 hunks)
  • conf/webknossos.latest.routes (0 hunks)
  • tools/postgres/schema.sql (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala (0 hunks)
💤 Files with no reviewable changes (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala
  • conf/webknossos.latest.routes
🚧 Files skipped from review as they are similar to previous changes (3)
  • tools/postgres/schema.sql
  • conf/evolutions/133-datasource-properties-in-db.sql
  • app/models/dataset/Dataset.scala
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push


@frcroth frcroth force-pushed the dataset-properties-in-db branch from 3344f87 to a86896b on May 12, 2025 12:26
@frcroth frcroth force-pushed the dataset-properties-in-db branch from a86896b to d6231e4 on May 12, 2025 12:39
@frcroth frcroth marked this pull request as ready for review May 12, 2025 12:49
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (3)
app/models/dataset/Dataset.scala (1)

936-950: Keep INSERT / UPDATE column ordering consistent

The two branches diverge in column order (dataFormat, numChannels), which invites copy-paste errors and makes diffs harder to read.
Align both statements (or extract a helper) so future changes touch only one place.
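
One library-agnostic way to express the "single place" idea, as a hedged sketch (names are illustrative; this is not the DAO's actual query-building code):

// Illustrative only: keep the shared column order in one value so the
// INSERT and UPDATE statements cannot silently drift apart.
object LayerMetadataColumns {
  val ordered: Seq[String] = Seq("dataFormat", "numChannels")

  // "dataFormat, numChannels" – for the INSERT column list
  def columnList: String = ordered.mkString(", ")

  // "dataFormat = ?, numChannels = ?" – for the UPDATE SET clause
  def setClause: String = ordered.map(column => s"$column = ?").mkString(", ")
}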

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (2)

453-464: getMags duplicates logic already expressed in magsOpt

getMags hard-codes the same exhaustive match list and throws at runtime. Prefer:

def getMags: List[MagLocator] =
  magsOpt.getOrElse(
    throw new IllegalStateException(s"Layer $name does not expose mags")
  )

This keeps the enumeration in a single location.


482-487: Case-class field proliferation risks hitting the 22-field limit

AbstractDataLayer now sits at 13 fields, AbstractSegmentationLayer at 15.
Both are still below the 22-field product limit, but planned future extensions (e.g. compression, tiling strategy, provenance) could eventually push them past it and break compilation.

Start thinking about grouping related settings into small value objects (e.g. StorageInfo, DisplayInfo) instead of adding more primitive fields.
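
A minimal sketch of that grouping, assuming the existing DataFormat, MagLocator, and WKWResolution types (the StorageInfo name and field selection are hypothetical, not part of this PR):

// Hypothetical value object bundling storage-related layer settings
case class StorageInfo(
    dataFormat: Option[DataFormat.Value] = None,
    numChannels: Option[Int] = None,
    mags: Option[List[MagLocator]] = None,
    wkwResolutions: Option[List[WKWResolution]] = None
)

// AbstractDataLayer could then carry a single `storage: StorageInfo` field
// instead of four separate optional fields, keeping the case class well
// below the 22-field product limit.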

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd43621 and 28144e2.

⛔ Files ignored due to path filters (2)
  • test/db/dataSet_layers.csv is excluded by !**/*.csv
  • test/db/dataSet_mags.csv is excluded by !**/*.csv
📒 Files selected for processing (8)
  • MIGRATIONS.unreleased.md (1 hunks)
  • app/models/dataset/Dataset.scala (2 hunks)
  • conf/evolutions/133-datasource-properties-in-db.sql (1 hunks)
  • conf/evolutions/reversions/133-datasource-properties-in-db.sql (1 hunks)
  • conf/webknossos.latest.routes (0 hunks)
  • tools/postgres/schema.sql (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (7 hunks)
💤 Files with no reviewable changes (1)
  • conf/webknossos.latest.routes
🧰 Additional context used
🧬 Code Graph Analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/WKWDataLayers.scala (3)
  • WKWResolution (12-12)
  • WKWResolution (14-16)
  • mags (29-29)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2)
  • dataFormat (85-85)
  • additionalAxes (105-105)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (1)
  • numChannels (25-25)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/PrecomputedDataLayers.scala (1)
  • numChannels (25-25)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/Zarr3DataLayers.scala (1)
  • numChannels (25-25)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/ZarrDataLayers.scala (1)
  • numChannels (23-23)
🔇 Additional comments (7)
MIGRATIONS.unreleased.md (1)

15-15: LGTM: Added new migration script to the unreleased list.

The newly added entry for the datasource properties migration script follows the proper format and is correctly placed in the list.

tools/postgres/schema.sql (3)

24-24: Correct schema version increment.

Schema version is properly incremented to 133 to match the new migration script.


139-140: LGTM: Added new columns to dataset_layers table.

The added columns match the PR objective, adding numChannels and dataFormat to the dataset layers table.


177-180: LGTM: Added new columns to dataset_mags table.

The added columns match the PR objective, adding axisOrder, channelIndex, cubeLength, and credentialId to store additional metadata in the database.

conf/evolutions/reversions/133-datasource-properties-in-db.sql (1)

1-17: LGTM: Proper rollback script for the migration.

The rollback script correctly:

  1. Verifies the current schema version
  2. Drops the newly added columns from both tables
  3. Downgrades the schema version
  4. Wraps operations in a transaction

Using DROP COLUMN IF EXISTS is good practice for maintaining idempotence.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala (2)

24-37: Well-implemented toString method for serialization.

The toString implementation appropriately handles optional fields and creates a clean, consistent string representation of the axis order.


63-70: LGTM: Added proper deserialization logic.

The fromString method correctly parses the string representation back into an AxisOrder instance, handling optional fields appropriately.

Comment on lines +231 to +275
// Datasets that are not in the WKW format use mags
def magsOpt: Option[List[MagLocator]] = this match {
  case layer: AbstractDataLayer => layer.mags
  case layer: AbstractSegmentationLayer => layer.mags
  case layer: DataLayerWithMagLocators => Some(layer.getMags)
  case _ => None
}

def dataFormatOpt: Option[DataFormat.Value] = this match {
  case layer: WKWDataLayer => Some(layer.dataFormat)
  case layer: WKWSegmentationLayer => Some(layer.dataFormat)
  case layer: ZarrDataLayer => Some(layer.dataFormat)
  case layer: ZarrSegmentationLayer => Some(layer.dataFormat)
  case layer: N5DataLayer => Some(layer.dataFormat)
  case layer: N5SegmentationLayer => Some(layer.dataFormat)
  case layer: PrecomputedDataLayer => Some(layer.dataFormat)
  case layer: PrecomputedSegmentationLayer => Some(layer.dataFormat)
  case layer: Zarr3DataLayer => Some(layer.dataFormat)
  case layer: Zarr3SegmentationLayer => Some(layer.dataFormat)
  // Abstract layers
  case _ => None
}

def numChannelsOpt: Option[Int] = this match {
  case layer: AbstractDataLayer => layer.numChannels
  case layer: AbstractSegmentationLayer => layer.numChannels
  case layer: ZarrDataLayer => layer.numChannels
  case layer: ZarrSegmentationLayer => layer.numChannels
  case layer: N5DataLayer => layer.numChannels
  case layer: N5SegmentationLayer => layer.numChannels
  case layer: PrecomputedDataLayer => layer.numChannels
  case layer: PrecomputedSegmentationLayer => layer.numChannels
  case layer: Zarr3DataLayer => layer.numChannels
  case layer: Zarr3SegmentationLayer => layer.numChannels
  case _ => None
}

def wkwResolutionsOpt: Option[List[WKWResolution]] = this match {
  case layer: AbstractDataLayer => layer.wkwResolutions
  case layer: AbstractSegmentationLayer => layer.wkwResolutions
  case layer: WKWDataLayer => Some(layer.wkwResolutions)
  case layer: WKWSegmentationLayer => Some(layer.wkwResolutions)
  case _ => None
}

Contributor

🛠️ Refactor suggestion

Pattern-matching helper grows maintenance overhead

magsOpt, dataFormatOpt, numChannelsOpt, and wkwResolutionsOpt individually enumerate every concrete layer type.
Each new layer implementation will require four additions here – easy to forget, and an omission will still compile but silently return None, causing subtle bugs.

Two alternatives:

  1. Push the responsibility down: add abstract def magsOpt: Option[List[MagLocator]] etc. to DataLayerLike with sensible default None, and override in the relevant sub-types.
  2. Keep the helper but replace exhaustive matching with a structural test, e.g.
this match {
  case l: { def mags: List[MagLocator] } => Some(l.mags)
  case _                                 => None
}

(uses structural types / asInstanceOf – trade-offs apply.)

Reducing the repetition makes the codebase safer and easier to extend.
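
A minimal sketch of the first alternative, reusing the trait names from this PR (the exact override placement is an assumption, not the actual implementation):

trait DataLayerLike {
  // Default for layer types that do not expose mags
  def magsOpt: Option[List[MagLocator]] = None
}

trait DataLayerWithMagLocators extends DataLayerLike {
  def mags: List[MagLocator]
  // Layers that do expose mags override the default exactly once, here
  override def magsOpt: Option[List[MagLocator]] = Some(mags)
}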

Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Well, couldn't find any issue here. Well done 🎉 (testing also worked out well 👍)

Note: I did not double check whether your list regarding the missing properties in the DB that needed to be added is complete.

IMO this should be mergeable. @fm3 What do you think?

mappings = $mappings,
defaultViewConfiguration = ${s.defaultViewConfiguration.map(Json.toJson(_))}""".asUpdate
defaultViewConfiguration = ${s.defaultViewConfiguration.map(Json.toJson(_))},
adminViewConfiguration = ${s.adminViewConfiguration.map(Json.toJson(_))},
Contributor

thanks for also adding the missing adminViewConfiguration update

Member

@fm3 fm3 left a comment

Looking pretty good! I added a few small comments. If you had a specific reason against using json for the AxisOrder, let me know :)


CREATE TYPE webknossos.DATASET_LAYER_CATEGORY AS ENUM ('color', 'mask', 'segmentation');
CREATE TYPE webknossos.DATASET_LAYER_ELEMENT_CLASS AS ENUM ('uint8', 'uint16', 'uint24', 'uint32', 'uint64', 'float', 'double', 'int8', 'int16', 'int32', 'int64');
CREATE TYPE webknossos.DATASET_LAYER_DATAFORMAT AS ENUM ('wkw','zarr','zarr3','n5','neuroglancerPrecomputed','tracing');
Member

I think tracing should not happen for dataset layers, can be removed here

path TEXT,
realPath TEXT,
hasLocalData BOOLEAN NOT NULL DEFAULT FALSE,
axisOrder TEXT CONSTRAINT axisOrder_format CHECK (axisOrder ~ '^[xyzc]:[0-9]+(,[xyzc]:[0-9]+)+$'),
Member

TBH I’m not a huge fan of the custom axisOrder literal. How about using a jsonb column?
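
For comparison, the jsonb variant might look roughly like this (hypothetical sketch, not part of this PR):

-- Hypothetical alternative: store the axis order as a JSON object instead of a custom literal
ALTER TABLE webknossos.dataset_mags ADD COLUMN axisOrder JSONB;
-- Example stored value: {"x": 4, "y": 3, "z": 2, "c": 0}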

case layer: WKWDataLayer => Some(layer.wkwResolutions)
case layer: WKWSegmentationLayer => Some(layer.wkwResolutions)
case _ => None
}
Member

The woes of case classes… Unfortunately I don’t know how to compact this further.

Contributor

This could be compacted further by having a common trait to match on, but I'm not sure whether introducing another trait into the whole data layer hierarchy would be ideal 🤔

Contributor

Side note: that would be one advantage of Scala 3 (still not voting for migrating to Scala 3, though).

@frcroth frcroth requested a review from fm3 May 19, 2025 07:59
@frcroth frcroth merged commit 3d041ae into master May 19, 2025
5 checks passed
@frcroth frcroth deleted the dataset-properties-in-db branch May 19, 2025 08:15
@coderabbitai coderabbitai bot mentioned this pull request May 26, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jun 23, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jul 21, 2025