Conversation

Contributor
@frcroth frcroth commented Aug 4, 2025

#8708 caused SQL errors because the production db does not match the schema file: the physical order of columns differed, which broke parsing of the results of an SQL query that selected all columns.
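For context, a minimal sketch of why this breaks, assuming Slick's positional GetResult mapping (the names below are illustrative, not the actual generated code):

```scala
import slick.jdbc.GetResult

// Illustrative stand-in for the auto-generated row mapping: it reads
// result-set columns strictly by position, not by name.
case class MagRow(dataset: String, dataLayerName: String, mag: String)

implicit val getMagRow: GetResult[MagRow] =
  GetResult(r => MagRow(r.<<, r.<<, r.<<))

// With SELECT *, result columns arrive in the table's physical order.
// If the production table was migrated into a different order than a
// fresh schema, r.<< reads the wrong column into the wrong field (or
// fails on a type mismatch). An explicit column list pins the order
// on the SQL side instead.
```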

Contributor

coderabbitai bot commented Aug 4, 2025

📝 Walkthrough

The SQL query in the findAllByDatasetId method of the DatasetMagsDAO class was updated to explicitly specify selected columns rather than using SELECT *. No changes were made to method signatures or the overall logic of the method.

Changes

Cohort / File(s): DatasetMagsDAO Query Update (app/models/dataset/Dataset.scala)
Change Summary: Modified the SQL query in findAllByDatasetId to select specific columns instead of using SELECT *.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

bug, backend

Suggested reviewers

  • fm3

Poem

A hop through the code, a nibble so neat,
SQL made clear with columns discrete.
No wildcards to chase, just fields well defined,
The data retrieved, precisely aligned.
🐇✨
Review with a smile—this change is refined!


@frcroth frcroth marked this pull request as ready for review August 4, 2025 11:40
@frcroth frcroth requested a review from fm3 August 4, 2025 11:42
Member

@fm3 fm3 left a comment


works for me, thanks!

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 664ce02 and ba5e78a.

📒 Files selected for processing (1)
  • app/models/dataset/Dataset.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
📚 Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly...
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • app/models/dataset/Dataset.scala
📚 Learning: the database schema in webknossos has separate tables for dataset layers (`dataset_layers`) and magn...
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.

Applied to files:

  • app/models/dataset/Dataset.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push

Comment on lines +864 to +866
rows <- run(
q"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData, axisOrder, channelIndex, credentialId
FROM webknossos.dataset_mags WHERE _dataset = $datasetId""".as[DatasetMagsRow])
Contributor

@coderabbitai coderabbitai bot Aug 4, 2025


💡 Verification agent

🧩 Analysis chain

Fragile manual column list – consider dataset_mags.* or a dedicated GetResult

The manual projection fixes the SELECT * ambiguity, but it tightly couples this query to both
a) the exact column order expected by DatasetMagsRow’s auto-generated GetResult, and
b) the current schema version.
Any new column added to webknossos.dataset_mags (or even a re-ordering in the DDL) will make the query return fewer or wrongly-ordered columns than DatasetMagsRow expects, resulting in a runtime SQLException: Invalid column index.

Two alternatives that avoid this fragility while still being explicit:

- q"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData, axisOrder, channelIndex, credentialId
-    FROM webknossos.dataset_mags WHERE _dataset = $datasetId""".as[DatasetMagsRow]
+ q"""SELECT dataset_mags.*            -- keeps order in sync with table definition
+    FROM webknossos.dataset_mags
+    WHERE _dataset = $datasetId""".as[DatasetMagsRow]

or keep the reduced projection, but introduce a bespoke GetResult/case-class that lists only the needed columns (_dataset, dataLayerName, mag, path, axisOrder, channelIndex, credentialId).
Either choice removes the silent-break risk and clarifies the contract between SQL and Scala mapping.


🏁 Script executed:

#!/bin/bash
# Locate the definition of DatasetMagsRow to verify expected columns
rg -n "case class DatasetMagsRow" -n app/models/dataset/Dataset.scala

Length of output: 71


🏁 Script executed:

#!/bin/bash
# Search for definition of DatasetMagsRow and its GetResult mapping
rg -n "DatasetMagsRow" -n .
rg -n "GetResult" -n app/models/dataset/Dataset.scala

Length of output: 655


🏁 Script executed:

#!/bin/bash
# Locate the evolution file defining dataset_mags and print its schema
evo_file=$(rg -l "CREATE TABLE.*dataset_mags" -n conf/evolutions)
echo "Evolution file: $evo_file"
sed -n '/CREATE TABLE.*dataset_mags/,/);/p' "$evo_file"

Length of output: 204


🏁 Script executed:

#!/bin/bash
# Show the DatasetMagsRow case class and related parsing context
echo "------ Dataset.scala [700-820] ------"
sed -n '700,820p' app/models/dataset/Dataset.scala

# List SQL evolution files and inspect the one mentioned in retrieved learnings
echo "------ SQL evolution files in conf/evolutions ------"
fd -e sql conf/evolutions
echo "------ conf/evolutions/133-datasource-properties-in-db.sql [1-200] ------"
sed -n '1,200p' conf/evolutions/133-datasource-properties-in-db.sql

Length of output: 7026


🏁 Script executed:

#!/bin/bash
# Show more context around the Slick table definition for dataset_mags
echo "------ Dataset.scala [600-740] ------"
sed -n '600,740p' app/models/dataset/Dataset.scala

Length of output: 6813


Avoid brittle manual column projection in dataset_mags query

The explicit column list here couples the SQL to the exact column order and schema version. Any future addition (e.g. cubeLength) or DDL reordering will silently break the GetResult[DatasetMagsRow] mapping and lead to runtime errors.

Two safer approaches:

  • Switch to selecting all columns in table order:
    - q"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData, axisOrder, channelIndex, credentialId
    -     FROM webknossos.dataset_mags WHERE _dataset = $datasetId""".as[DatasetMagsRow]
    + q"""SELECT dataset_mags.*
    +     FROM webknossos.dataset_mags
    +     WHERE _dataset = $datasetId""".as[DatasetMagsRow]
  • If you only need a subset of fields, introduce a bespoke case class + GetResult listing exactly those columns.

Either option removes the risk of broken column-order mappings and makes the SQL↔Scala contract explicit.
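A sketch of the second option, with hypothetical field names and types inferred from the column list above (not the actual webknossos definitions):

```scala
import slick.jdbc.GetResult

// Hypothetical subset row covering exactly the columns the query selects.
case class DatasetMagSubsetRow(dataset: String,
                               dataLayerName: String,
                               mag: String,
                               path: Option[String],
                               realPath: Option[String],
                               hasLocalData: Boolean,
                               axisOrder: Option[String],
                               channelIndex: Option[Int],
                               credentialId: Option[String])

// Positional mapping: the order of r.<< calls must match the SELECT list
// exactly, which keeps the SQL-to-Scala contract in one visible place.
implicit val getDatasetMagSubsetRow: GetResult[DatasetMagSubsetRow] =
  GetResult(r =>
    DatasetMagSubsetRow(r.<<, r.<<, r.<<, r.<<, r.<<, r.<<, r.<<, r.<<, r.<<))
```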

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rows <- run(
q"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData, axisOrder, channelIndex, credentialId
FROM webknossos.dataset_mags WHERE _dataset = $datasetId""".as[DatasetMagsRow])
rows <- run(
q"""SELECT dataset_mags.*
FROM webknossos.dataset_mags
WHERE _dataset = $datasetId""".as[DatasetMagsRow])
🤖 Prompt for AI Agents
In app/models/dataset/Dataset.scala around lines 864 to 866, the SQL query
explicitly lists columns which risks breaking the mapping if the schema changes.
To fix this, either select all columns from dataset_mags without specifying them
individually, or define a new case class with a corresponding GetResult instance
for the exact subset of columns needed. This ensures the SQL-to-Scala mapping
remains correct and resilient to schema changes.

Contributor Author


You got it backwards. This is exactly what we want to avoid, since the production DB is not guaranteed to have the columns in the correct order. If the column order in the DB does not match the one in the schema, parsing the rows into the generated row objects fails. This was the case here.

Contributor


@frcroth Thank you for the correction! You're absolutely right - I had it backwards.

The explicit column listing ensures that the query returns columns in the exact order expected by the DatasetMagsRow case class, regardless of the physical column order in the production database. Using SELECT * or dataset_mags.* would return columns in whatever order they exist physically in the table, which could break the auto-generated GetResult mapping if the physical order doesn't match the schema definition.

This makes perfect sense as a fix for SQL query issues, especially when dealing with databases where column ordering might vary between environments.
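As a concrete illustration (written with standard Slick sql interpolation so the sketch is self-contained; the codebase's own q interpolator works analogously):

```scala
// Assumes slick.jdbc.PostgresProfile.api._ is in scope, with datasetId
// and DatasetMagsRow as in the surrounding DAO.

// Fragile: column order follows the table's physical layout, which can
// differ between a fresh install and a long-migrated production database.
sql"SELECT * FROM webknossos.dataset_mags WHERE _dataset = $datasetId"
  .as[DatasetMagsRow]

// Robust: the explicit projection pins the column order to exactly what
// the positional DatasetMagsRow mapping expects, on every database.
sql"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData,
             axisOrder, channelIndex, credentialId
      FROM webknossos.dataset_mags WHERE _dataset = $datasetId"""
  .as[DatasetMagsRow]
```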

🐰💡
(\ /)
( .. )
o_(")(")


✏️ Learnings added
Learnt from: frcroth
PR: scalableminds/webknossos#8821
File: app/models/dataset/Dataset.scala:864-866
Timestamp: 2025-08-04T11:49:29.951Z
Learning: In WebKnossos Scala codebase, when querying database tables with Slick, explicit column listing in SELECT statements is preferred over SELECT * to ensure columns are returned in the exact order expected by case class mappings. This prevents parsing failures when the physical column order in the production database doesn't match the schema definition order.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.

Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/proofread_saga.ts:1039-1039
Timestamp: 2024-11-22T17:18:04.217Z
Learning: In `frontend/javascripts/oxalis/model/sagas/proofread_saga.ts`, when calling `getMagInfo`, the use of `volumeTracingLayer.resolutions` is intentional and should not be changed to `volumeTracingLayer.mags`.

@frcroth frcroth merged commit 27948cb into master Aug 4, 2025
5 checks passed
@frcroth frcroth deleted the virtual-datasets-fix-mag-sql-error branch August 4, 2025 11:46