Conversation

Contributor
@frcroth frcroth commented Aug 4, 2025

#8708 caused SQL errors because the production db does not match the schema file: the physical order of columns differed, which broke parsing of the results of an SQL query that selected all columns.
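For context, a minimal sketch of why this breaks, assuming Slick's positional GetResult mapping (the names below are illustrative, not the actual generated code):

```scala
import slick.jdbc.GetResult

// Illustrative stand-in for the auto-generated row mapping: it reads
// result-set columns strictly by position, not by name.
case class MagRow(dataset: String, dataLayerName: String, mag: String)

implicit val getMagRow: GetResult[MagRow] =
  GetResult(r => MagRow(r.<<, r.<<, r.<<))

// With SELECT *, result columns arrive in the table's physical order.
// If the production table was migrated into a different order than a
// fresh schema, r.<< reads the wrong column into the wrong field (or
// fails on a type mismatch). An explicit column list pins the order
// on the SQL side instead.
```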

Contributor

coderabbitai bot commented Aug 4, 2025

📝 Walkthrough

The SQL query in the findAllByDatasetId method of the DatasetMagsDAO class was updated to explicitly specify selected columns rather than using SELECT *. No changes were made to method signatures or the overall logic of the method.

Changes

Cohort / File(s): DatasetMagsDAO Query Update (app/models/dataset/Dataset.scala)
Change Summary: Modified the SQL query in findAllByDatasetId to select specific columns instead of using SELECT *.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

bug, backend

Suggested reviewers

  • fm3

Poem

A hop through the code, a nibble so neat,
SQL made clear with columns discrete.
No wildcards to chase, just fields well defined,
The data retrieved, precisely aligned.
🐇✨
Review with a smile—this change is refined!


@frcroth frcroth marked this pull request as ready for review August 4, 2025 11:40
@frcroth frcroth requested a review from fm3 August 4, 2025 11:42
Member

@fm3 fm3 left a comment


works for me, thanks!

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 664ce02 and ba5e78a.

📒 Files selected for processing (1)
  • app/models/dataset/Dataset.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
📚 Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly...
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • app/models/dataset/Dataset.scala
📚 Learning: the database schema in webknossos has separate tables for dataset layers (`dataset_layers`) and magn...
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.

Applied to files:

  • app/models/dataset/Dataset.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push

Comment on lines +864 to +866
rows <- run(
q"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData, axisOrder, channelIndex, credentialId
FROM webknossos.dataset_mags WHERE _dataset = $datasetId""".as[DatasetMagsRow])
Contributor

@coderabbitai coderabbitai bot Aug 4, 2025


💡 Verification agent

🧩 Analysis chain

Fragile manual column list – consider dataset_mags.* or a dedicated GetResult

The manual projection fixes the SELECT * ambiguity, but it tightly couples this query to both
a) the exact column order expected by DatasetMagsRow’s auto-generated GetResult, and
b) the current schema version.
Any new column added to webknossos.dataset_mags (or even a re-ordering in the DDL) will make the query return fewer or wrongly-ordered columns than DatasetMagsRow expects, resulting in a runtime SQLException: Invalid column index.

Two alternatives that avoid this fragility while still being explicit:

- q"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData, axisOrder, channelIndex, credentialId
-    FROM webknossos.dataset_mags WHERE _dataset = $datasetId""".as[DatasetMagsRow]
+ q"""SELECT dataset_mags.*            -- keeps order in sync with table definition
+    FROM webknossos.dataset_mags
+    WHERE _dataset = $datasetId""".as[DatasetMagsRow]

or keep the reduced projection, but introduce a bespoke GetResult/case-class that lists only the needed columns (_dataset, dataLayerName, mag, path, axisOrder, channelIndex, credentialId).
Either choice removes the silent-break risk and clarifies the contract between SQL and Scala mapping.


🏁 Script executed:

#!/bin/bash
# Locate the definition of DatasetMagsRow to verify expected columns
rg -n "case class DatasetMagsRow" -n app/models/dataset/Dataset.scala

Length of output: 71


🏁 Script executed:

#!/bin/bash
# Search for definition of DatasetMagsRow and its GetResult mapping
rg -n "DatasetMagsRow" -n .
rg -n "GetResult" -n app/models/dataset/Dataset.scala

Length of output: 655


🏁 Script executed:

#!/bin/bash
# Locate the evolution file defining dataset_mags and print its schema
evo_file=$(rg -l "CREATE TABLE.*dataset_mags" -n conf/evolutions)
echo "Evolution file: $evo_file"
sed -n '/CREATE TABLE.*dataset_mags/,/);/p' "$evo_file"

Length of output: 204


🏁 Script executed:

#!/bin/bash
# Show the DatasetMagsRow case class and related parsing context
echo "------ Dataset.scala [700-820] ------"
sed -n '700,820p' app/models/dataset/Dataset.scala

# List SQL evolution files and inspect the one mentioned in retrieved learnings
echo "------ SQL evolution files in conf/evolutions ------"
fd -e sql conf/evolutions
echo "------ conf/evolutions/133-datasource-properties-in-db.sql [1-200] ------"
sed -n '1,200p' conf/evolutions/133-datasource-properties-in-db.sql

Length of output: 7026


🏁 Script executed:

#!/bin/bash
# Show more context around the Slick table definition for dataset_mags
echo "------ Dataset.scala [600-740] ------"
sed -n '600,740p' app/models/dataset/Dataset.scala

Length of output: 6813


Avoid brittle manual column projection in dataset_mags query

The explicit column list here couples the SQL to the exact column order and schema version. Any future addition (e.g. cubeLength) or DDL reordering will silently break the GetResult[DatasetMagsRow] mapping and lead to runtime errors.

Two safer approaches:

  • Switch to selecting all columns in table order:
    - q"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData, axisOrder, channelIndex, credentialId
    -     FROM webknossos.dataset_mags WHERE _dataset = $datasetId""".as[DatasetMagsRow]
    + q"""SELECT dataset_mags.*
    +     FROM webknossos.dataset_mags
    +     WHERE _dataset = $datasetId""".as[DatasetMagsRow]
  • If you only need a subset of fields, introduce a bespoke case class + GetResult listing exactly those columns.

Either option removes the risk of broken column-order mappings and makes the SQL↔Scala contract explicit.
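A sketch of the second option, with hypothetical field names and types inferred from the column list above (not the actual webknossos definitions):

```scala
import slick.jdbc.GetResult

// Hypothetical subset row covering exactly the columns the query selects.
case class DatasetMagSubsetRow(dataset: String,
                               dataLayerName: String,
                               mag: String,
                               path: Option[String],
                               realPath: Option[String],
                               hasLocalData: Boolean,
                               axisOrder: Option[String],
                               channelIndex: Option[Int],
                               credentialId: Option[String])

// Positional mapping: the order of r.<< calls must match the SELECT list
// exactly, which keeps the SQL-to-Scala contract in one visible place.
implicit val getDatasetMagSubsetRow: GetResult[DatasetMagSubsetRow] =
  GetResult(r =>
    DatasetMagSubsetRow(r.<<, r.<<, r.<<, r.<<, r.<<, r.<<, r.<<, r.<<, r.<<))
```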

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rows <- run(
q"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData, axisOrder, channelIndex, credentialId
FROM webknossos.dataset_mags WHERE _dataset = $datasetId""".as[DatasetMagsRow])
rows <- run(
q"""SELECT dataset_mags.*
FROM webknossos.dataset_mags
WHERE _dataset = $datasetId""".as[DatasetMagsRow])
🤖 Prompt for AI Agents
In app/models/dataset/Dataset.scala around lines 864 to 866, the SQL query
explicitly lists columns which risks breaking the mapping if the schema changes.
To fix this, either select all columns from dataset_mags without specifying them
individually, or define a new case class with a corresponding GetResult instance
for the exact subset of columns needed. This ensures the SQL-to-Scala mapping
remains correct and resilient to schema changes.

Contributor Author


You got it backwards. This is exactly what we want to avoid, since the production DB is not guaranteed to have the columns in the correct order. If the column order in the DB does not match the one in the schema, parsing the rows into the generated row objects fails. This was the case here.

Contributor


@frcroth Thank you for the correction! You're absolutely right - I had it backwards.

The explicit column listing ensures that the query returns columns in the exact order expected by the DatasetMagsRow case class, regardless of the physical column order in the production database. Using SELECT * or dataset_mags.* would return columns in whatever order they exist physically in the table, which could break the auto-generated GetResult mapping if the physical order doesn't match the schema definition.

This makes perfect sense as a fix for SQL query issues, especially when dealing with databases where column ordering might vary between environments.
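As a concrete illustration (written with standard Slick sql interpolation so the sketch is self-contained; the codebase's own q interpolator works analogously):

```scala
// Assumes slick.jdbc.PostgresProfile.api._ is in scope, with datasetId
// and DatasetMagsRow as in the surrounding DAO.

// Fragile: column order follows the table's physical layout, which can
// differ between a fresh install and a long-migrated production database.
sql"SELECT * FROM webknossos.dataset_mags WHERE _dataset = $datasetId"
  .as[DatasetMagsRow]

// Robust: the explicit projection pins the column order to exactly what
// the positional DatasetMagsRow mapping expects, on every database.
sql"""SELECT _dataset, dataLayerName, mag, path, realPath, hasLocalData,
             axisOrder, channelIndex, credentialId
      FROM webknossos.dataset_mags WHERE _dataset = $datasetId"""
  .as[DatasetMagsRow]
```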

🐰💡
(\ /)
( .. )
o_(")(")


✏️ Learnings added
Learnt from: frcroth
PR: scalableminds/webknossos#8821
File: app/models/dataset/Dataset.scala:864-866
Timestamp: 2025-08-04T11:49:29.951Z
Learning: In WebKnossos Scala codebase, when querying database tables with Slick, explicit column listing in SELECT statements is preferred over SELECT * to ensure columns are returned in the exact order expected by case class mappings. This prevents parsing failures when the physical column order in the production database doesn't match the schema definition order.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.

Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/proofread_saga.ts:1039-1039
Timestamp: 2024-11-22T17:18:04.217Z
Learning: In `frontend/javascripts/oxalis/model/sagas/proofread_saga.ts`, when calling `getMagInfo`, the use of `volumeTracingLayer.resolutions` is intentional and should not be changed to `volumeTracingLayer.mags`.

@frcroth frcroth merged commit 27948cb into master Aug 4, 2025
5 checks passed
@frcroth frcroth deleted the virtual-datasets-fix-mag-sql-error branch August 4, 2025 11:46