
Conversation

frcroth
Contributor

@frcroth frcroth commented Jan 22, 2025

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • Create a dataset with symlinks in a mag, a dataset with symlinks in a layer, remote datasets, ...
  • Check DB or GET /datasets/:datasetId/layers/:layer/linkedMags

Overview

In the datastore we now scan for mags and symlinks. Symlinks are resolved for layer directories and mag directories. The result is sent to WK via a new route (subject to discussion).
In WK, the database stores per mag:

  • path: The path (stored as a URI) that WK uses to read the data
  • realPath: The path (stored as a URI) where the data actually resides
  • hasLocalData: Whether the data at the realPath is local to the dataset

This means that:

  • For normal datasets, realPath = path and hasLocalData = true
  • For remote datasets, realPath = path and hasLocalData = false, with the URIs also indicating the remote scheme
  • For symlinked mags and layers, realPath != path and hasLocalData = false
  • For locally explored datasets, realPath = path (as there is no layer directory and WK directly uses the path property in the mag) and hasLocalData = false

Using hasLocalData and realPath we will be able to handle deletions.
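
For illustration, the stored values for the cases above could look roughly like this (a hedged sketch only: the paths are made up, datasetId is a placeholder, and the shape follows the DatasetMagInfo case class added in this PR):

  // Hypothetical example values, not actual data:
  val localMag     = DatasetMagInfo(datasetId, "color", Vec3Int(1, 1, 1),
                                    path = Some("file:///srv/binaryData/orga/ds1/color/1"),
                                    realPath = Some("file:///srv/binaryData/orga/ds1/color/1"),
                                    hasLocalData = true)
  val symlinkedMag = DatasetMagInfo(datasetId, "color", Vec3Int(1, 1, 1),
                                    path = Some("file:///srv/binaryData/orga/ds2/color/1"),
                                    realPath = Some("file:///srv/binaryData/orga/ds1/color/1"),
                                    hasLocalData = false)
  val remoteMag    = DatasetMagInfo(datasetId, "color", Vec3Int(1, 1, 1),
                                    path = Some("s3://bucket/ds/color/1"),
                                    realPath = Some("s3://bucket/ds/color/1"),
                                    hasLocalData = false)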

I added a route to get a list of mags that use a given mag's path; this is for testing purposes and can be deleted before merge.

TODOs:

  • Test symlinked mags
  • Test mags with path as file://
  • Store in DB / find a way to differentiate a dataset with a mag that has a file path from the dataset that actually has the data
  • Use toRealPath?
  • When removing a directory from the local directory whitelist, an error occurs and no more paths are added to the DB

Issues:


(Please delete unneeded items, merge only when none are left open)

@frcroth frcroth self-assigned this Jan 22, 2025
Contributor

coderabbitai bot commented Jan 22, 2025

📝 Walkthrough

Walkthrough

This pull request introduces comprehensive changes to dataset path management and metadata handling in the WEBKNOSSOS application. The modifications span multiple components, including controllers, services, and database schemas. Key additions include new methods for retrieving and updating dataset paths, introducing case classes to represent dataset magnification information, and extending the database schema to support real paths and local data tracking. The changes enhance the system's ability to manage and track dataset metadata more precisely.

Changes

  • app/controllers/DatasetController.scala: Added MagLinkInfo case class and implicit JSON format for serialization.
  • app/controllers/WKRemoteDataStoreController.scala: Added updatePaths method to update data source paths and updated import statements.
  • app/models/dataset/Dataset.scala: Introduced DatasetMagInfo and MagWithPaths case classes, updated DatasetMagsDAO methods for managing dataset magnification paths, and changed class inheritance from SimpleSQLDAO to SQLDAO.
  • app/models/dataset/DatasetService.scala: Added methods updateRealPath, updateRealPaths, and getPathsForDataLayer, and updated the constructor to include datasetMagsDAO.
  • conf/evolutions/126-mag-real-paths.sql: Added columns realPath, path, and hasLocalData to the dataset_mags table.
  • conf/evolutions/reversions/126-mag-real-paths.sql: Dropped columns realPath, path, and hasLocalData from the dataset_mags table.
  • conf/webknossos.latest.routes: Added a new route for updating paths in the datastore.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala: Added DataSourcePathInfo and MagPathInfo case classes and a reportRealPaths method.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala: Added publishRealPaths method for reporting paths.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala: Added methods for determining and reporting real mag paths.
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala: Added resolveMagPath method for URI resolution and updated existing methods.

Suggested labels

backend, enhancement, new feature

Poem

🐰 Paths of data, now so clear,
Magnifications dancing near,
Real paths mapped with gentle care,
Dataset secrets we now share,
A rabbit's code, precise and bright! 🔍



@frcroth frcroth changed the title from "wip resolve symbolic links to mags and store them in db" to "Resolve symbolic links to mags and store them in db" Jan 29, 2025
@frcroth frcroth marked this pull request as ready for review January 29, 2025 09:17
@frcroth frcroth requested a review from fm3 January 29, 2025 09:17
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🔭 Outside diff range comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1)

Line range hint 49-64: Consider using canonical paths and typed exceptions for security and clarity.

While this logic properly checks remote vs. local schemes, relying on startsWith for whitelisting can be susceptible to path traversal (e.g., embedded "../"). Additionally, throwing raw Exception can obscure the root cause. Use a canonical path check (e.g., toRealPath) and a specialized exception type to enhance security and clarity.
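
A minimal sketch of the canonical-path variant suggested here (localDirectoryWhitelist and the exception type are placeholders, not names from the codebase):

  import java.nio.file.{Path, Paths}

  // Resolve symlinks and ".." segments before comparing against the whitelist
  def assertWhitelisted(uri: java.net.URI, localDirectoryWhitelist: List[String]): Unit = {
    val canonical: Path = Paths.get(uri).toRealPath()
    val isAllowed = localDirectoryWhitelist.exists(dir => canonical.startsWith(Paths.get(dir).toRealPath()))
    if (!isAllowed)
      throw new SecurityException(s"Path $canonical is outside the whitelisted local directories")
  }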

🧹 Nitpick comments (13)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3)

91-105: Enhance error logging for better debugging.

While the error handling is good, the error messages could be more descriptive to aid in debugging.

Consider adding more context to error messages:

-          logger.error(s"Failed to determine real paths for mags of $ds: $msg")
+          logger.error(s"Failed to determine real paths for mags of dataset ${ds.id}: $msg. This may affect path resolution.")
-          logger.error(s"Failed to determine real paths for mags of $ds")
+          logger.error(s"Failed to determine real paths for mags of dataset ${ds.id}. This may affect path resolution.")

153-158: Add input validation and documentation.

The method would benefit from input validation and clear documentation of its purpose and behavior.

Consider these improvements:

+  /**
+   * Resolves a potentially relative path against a base path.
+   * If the relative path is already absolute, returns it unchanged.
+   * Otherwise, resolves it against the base path and normalizes the result.
+   *
+   * @param basePath The base path to resolve against
+   * @param relativePath The path to resolve
+   * @return The resolved absolute path
+   * @throws NullPointerException if either path is null
+   */
   private def resolveRelativePath(basePath: Path, relativePath: Path): Path = {
+    require(basePath != null, "basePath cannot be null")
+    require(relativePath != null, "relativePath cannot be null")
     if (relativePath.isAbsolute) {
       relativePath
     } else {
       basePath.resolve(relativePath).normalize().toAbsolutePath
     }
   }

91-158: Consider implementing a dedicated PathResolver service.

The new path resolution functionality is well-implemented but could benefit from being extracted into a dedicated service. This would:

  1. Improve separation of concerns
  2. Make the path resolution logic more testable
  3. Reduce the complexity of the DataSourceService class
  4. Make it easier to modify path resolution strategies in the future

Consider creating a new PathResolverService class to handle all path resolution logic, including symlink resolution and URI generation. This would make the codebase more maintainable and modular.
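
Purely as an illustration of this refactoring idea (not code from the PR; the method set is a guess), such a service could expose an interface like:

  import java.net.URI
  import java.nio.file.Path

  trait PathResolverService {
    // Resolve a layer directory, following a symlink if present
    def resolveLayerPath(datasetPath: Path, layerName: String): Path
    // Resolve a mag locator to the URI WEBKNOSSOS should read from
    def resolveMagPath(datasetPath: Path, layerPath: Path, mag: MagLocator): URI
    // Combine both into the MagPathInfo reported to WK
    def magPathInfo(datasetPath: Path, layerPath: Path, dataLayer: DataLayer, mag: MagLocator): MagPathInfo
  }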

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (2)

77-83: Check for invalid local directories and handle them explicitly.

If neither localDirWithScalarMag nor localDirWithVec3Mag exists, the code silently defaults to localDirWithVec3Mag. Updating the logic or adding a warning/error for missing directories can avoid unintentional fallbacks and help debug user misconfiguration.
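
A hedged sketch of the explicit handling suggested here (localDirWithScalarMag and localDirWithVec3Mag are the names used in the comment above; the surrounding method and logger are assumed from context):

  val localMagDir =
    if (Files.exists(localDirWithScalarMag)) localDirWithScalarMag
    else {
      if (!Files.exists(localDirWithVec3Mag))
        logger.warn(s"Neither $localDirWithScalarMag nor $localDirWithVec3Mag exists, falling back to the latter")
      localDirWithVec3Mag
    }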


86-98: Consolidate path resolution logic in a single place.

uriForMagLocator calls resolveMagPath and then applies an additional toAbsolutePath.toUri if not remote. Consider directly returning an absolute path from resolveMagPath for consistency, or unifying their logic to avoid duplication and reduce potential path resolution discrepancies.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala (1)

49-53: Add error handling and logging around real path reporting.

This method simply calls reportRealPaths without any additional handling for potential failures or partial updates. Consider adding logging or structured error handling to ensure visibility into path reporting issues.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)

40-50: Document or validate new path-related fields.

The new case classes DataSourcePathInfo and MagPathInfo introduce important fields (e.g., layerName, mag, path, realPath, hasLocalData). Brief documentation or validation logic would help ensure correct usage across the codebase, especially for mag or path fields.
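
A hedged sketch of what such documentation and validation could look like (the field names appear in this PR; Vec3Int and DataSourceId are existing codebase types, and the require checks are only an example):

  /** Path information for a single mag of a data layer.
    * path:         URI that WEBKNOSSOS uses to read the data (may go through a symlink)
    * realPath:     URI after resolving symlinks
    * hasLocalData: whether realPath lies inside the dataset directory
    */
  case class MagPathInfo(layerName: String, mag: Vec3Int, path: String, realPath: String, hasLocalData: Boolean) {
    require(path.nonEmpty && realPath.nonEmpty, "mag paths must not be empty")
  }

  case class DataSourcePathInfo(dataSourceId: DataSourceId, magPathInfos: List[MagPathInfo])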

app/controllers/WKRemoteDataStoreController.scala (1)

224-238: Consider adding request logging and error details.

The method implementation is correct but could benefit from additional logging to aid in debugging:

  1. Log the number of path infos being processed
  2. Include more details in the warning message for invalid JSON

Apply this diff to enhance logging:

 def updatePaths(name: String, key: String): Action[JsValue] = Action.async(parse.json) { implicit request =>
   dataStoreService.validateAccess(name, key) { _ =>
     request.body.validate[List[DataSourcePathInfo]] match {
       case JsSuccess(infos, _) =>
+        logger.info(s"Updating paths for ${infos.length} data sources from data store '$name'")
         for {
           _ <- datasetService.updateRealPaths(infos)(GlobalAccessContext)
         } yield {
           JsonOk
         }
       case e: JsError =>
-        logger.warn("Data store reported invalid json for data source paths.")
+        logger.warn(s"Data store '$name' reported invalid json for data source paths: ${JsError.toJson(e)}")
         Fox.successful(JsonBadRequest(JsError.toJson(e)))
     }
   }
 }
app/models/dataset/DatasetService.scala (1)

343-356: Consider adding error logging for dataset not found case.

The error handling is correct, but adding logging would help track issues in production.

Apply this diff to enhance error logging:

 private def updateRealPath(pathInfo: DataSourcePathInfo)(implicit ctx: DBAccessContext): Fox[Unit] =
   if (pathInfo.magPathInfos.isEmpty) {
     Fox.successful(())
   } else {
     val dataset = datasetDAO.findOneByDataSourceId(pathInfo.dataSourceId).futureBox
     dataset.flatMap {
       case Full(dataset) => datasetMagsDAO.updateMagPathsForDataset(dataset._id, pathInfo.magPathInfos)
       case Empty => // Dataset reported but ignored (non-existing/forbidden org)
+        logger.debug(s"Ignoring path update for non-existing dataset: ${pathInfo.dataSourceId}")
         Fox.successful(())
       case e: EmptyBox =>
+        logger.warn(s"Failed to find dataset: ${pathInfo.dataSourceId}", e)
         Fox.failure("dataset.notFound", e)
     }
   }
app/controllers/DatasetController.scala (1)

137-149: Consider adding pagination for large datasets.

The method correctly implements the required functionality but might need pagination for datasets with many mags.

Apply this diff to add pagination:

-def getLinkedMags(datasetId: String, dataLayerName: String): Action[AnyContent] =
+def getLinkedMags(datasetId: String, dataLayerName: String, limit: Option[Int], offset: Option[Int]): Action[AnyContent] =
   sil.SecuredAction.async { implicit request =>
     for {
       datasetIdValidated <- ObjectId.fromString(datasetId)
       _ <- datasetDAO.findOne(datasetIdValidated) ?~> notFoundMessage(datasetId) ~> NOT_FOUND
       _ <- Fox.bool2Fox(request.identity.isAdmin) ?~> "notAllowed" ~> FORBIDDEN
       magsAndLinkedMags <- datasetService.getPathsForDatalayer(datasetIdValidated, dataLayerName)
-      returnValues = magsAndLinkedMags.map {
+      returnValues = magsAndLinkedMags.drop(offset.getOrElse(0)).take(limit.getOrElse(Int.MaxValue)).map {
         case (mag, linkedMags) => MagLinkInfo(mag, linkedMags)
       }
     } yield Ok(Json.toJson(returnValues))
   }
conf/webknossos.latest.routes (1)

94-94: Update route documentation.

The routes are correctly defined but should be documented in the API documentation.

Add comments above each route to describe their purpose:

+# Get linked magnifications for a dataset layer
 GET           /datasets/:datasetId/layers/:layer/linkedMags                                     controllers.DatasetController.getLinkedMags(datasetId: String, layer: String)

+# Update data source paths in the datastore
 PUT           /datastores/:name/datasources/paths                                               controllers.WKRemoteDataStoreController.updatePaths(name: String, key: String)

Also applies to: 112-112

app/models/dataset/Dataset.scala (1)

767-784: Consider improving the handling of uninitialized paths.

Instead of using a magic string "uninitialized" for null paths, consider:

  1. Using an Option type to represent potentially uninitialized paths
  2. Handling the null case at the application level where the context is clearer
-                         row.path.getOrElse("uninitialized"),
-                         row.realpath.getOrElse("uninitialized"),
+                         row.path.getOrElse(""),
+                         row.realpath.getOrElse(""),
conf/evolutions/126-mag-real-paths.sql (1)

1-12: Well-structured SQL evolution with proper safeguards!

The SQL evolution:

  • Uses transaction for atomicity
  • Includes schema version check
  • Adds columns with appropriate types
  • Sets sensible default for hasLocalData

However, consider adding NOT NULL constraints to path and realPath columns since they are required fields in the DatasetMagInfo case class.

-ALTER TABLE webknossos.dataset_mags ADD COLUMN realPath TEXT;
-ALTER TABLE webknossos.dataset_mags ADD COLUMN path TEXT;
+ALTER TABLE webknossos.dataset_mags ADD COLUMN realPath TEXT NOT NULL DEFAULT '';
+ALTER TABLE webknossos.dataset_mags ADD COLUMN path TEXT NOT NULL DEFAULT '';
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd97cd5 and d839176.

📒 Files selected for processing (13)
  • MIGRATIONS.unreleased.md (1 hunks)
  • app/controllers/DatasetController.scala (2 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (3 hunks)
  • conf/evolutions/126-mag-real-paths.sql (1 hunks)
  • conf/evolutions/reversions/126-mag-real-paths.sql (1 hunks)
  • conf/webknossos.latest.routes (2 hunks)
  • tools/postgres/schema.sql (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (16)
conf/evolutions/reversions/126-mag-real-paths.sql (3)

1-3: LGTM! Transaction and schema version check are properly implemented.

The transaction block ensures atomicity, and the schema version assertion provides a critical safety check against incorrect migrations.


9-11: LGTM! Schema version update and transaction commit are correct.

The schema version is properly decremented to 125, and the transaction is committed to ensure all changes are applied atomically.


5-7: Verify the impact of dropping these columns.

The column drops align with the PR objectives for managing symlinked mags. However, ensure that:

  1. Any existing data in these columns has been properly migrated or is safe to discard
  2. All application code references to these columns are removed
  3. The corresponding forward migration (126-mag-real-paths.sql) correctly creates these columns
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3)

11-11: LGTM: Required imports added for new functionality.

The new imports for MagLocator and DataVaultService align with the PR's objective to handle mag symlinks.

Also applies to: 15-15


81-81: LGTM: Path resolution integrated into inbox scanning.

The addition of sendRealPaths call ensures that real paths are processed and published whenever inbox sources are scanned.


143-151: LGTM: Clean and focused implementation.

The method follows the single responsibility principle and clearly handles mag URI resolution.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1)

65-70: Raise an error if neither dataset nor layer path exists.

Currently, if magPathRelativeToDataset does not exist, the code falls back to magPathRelativeToLayer. Consider raising an error or logging a warning if neither path is found, ensuring graceful handling of invalid file structures.

app/controllers/WKRemoteDataStoreController.scala (1)

10-10: LGTM!

The import statement correctly includes the new DataSourcePathInfo type needed for the path update functionality.

app/models/dataset/DatasetService.scala (2)

17-17: LGTM!

The imports are correctly organized and include all necessary types for the new functionality.

Also applies to: 23-23


357-361: LGTM!

The updateRealPaths method correctly serializes updates to maintain consistency.

app/controllers/DatasetController.scala (1)

74-78: LGTM!

The case class and its JSON format are correctly defined.

app/models/dataset/Dataset.scala (3)

24-24: LGTM!

The import for MagPathInfo is correctly added to support the new mag path handling functionality.


712-721: Well-structured data model for mag information!

The DatasetMagInfo case class and its companion object are well-designed:

  • Encapsulates all necessary fields for mag path information
  • Provides JSON serialization support
  • Aligns with the PR objectives for storing symbolic link information

748-765: Robust implementation with proper transaction handling!

The implementation includes (a hedged sketch follows this list):

  • Proper SQL escaping and type handling
  • Transaction with serializable isolation
  • Retry logic for handling serialization failures
  • Batch updates for efficiency
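
For reference, a hedged sketch of this pattern (helper names such as run, asUpdate, Serializable and transactionSerializationError follow conventions used elsewhere in the SQLDAO classes and may differ in detail; the mag encoding is assumed to match the existing column):

  def updateMagPathsForDataset(datasetId: ObjectId, magPathInfos: List[MagPathInfo]): Fox[Unit] = {
    val updateQueries = magPathInfos.map(m => q"""UPDATE webknossos.dataset_mags
                                                  SET path = ${m.path}, realPath = ${m.realPath}, hasLocalData = ${m.hasLocalData}
                                                  WHERE _dataset = $datasetId
                                                    AND dataLayerName = ${m.layerName}
                                                    AND mag = ${m.mag}""".asUpdate)
    for {
      _ <- run(DBIO.sequence(updateQueries).transactionally.withTransactionIsolation(Serializable),
               retryCount = 50,
               retryIfErrorContains = List(transactionSerializationError))
    } yield ()
  }
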
MIGRATIONS.unreleased.md (1)

21-22: LGTM!

The migration guide is correctly updated to include the new SQL evolution.

tools/postgres/schema.sql (1)

23-23: LGTM!

The schema changes:

  • Correctly update the schema version
  • Add the new columns matching the SQL evolution
  • Set appropriate default for hasLocalData

Also applies to: 181-183

Comment on lines 107 to 141
  private def determineMagRealPathsForDataSource(dataSource: InboxDataSource) = tryo {
    val organizationPath = dataBaseDir.resolve(dataSource.id.organizationId)
    val datasetPath = organizationPath.resolve(dataSource.id.directoryName)
    dataSource.toUsable match {
      case Some(usableDataSource) => {
        usableDataSource.dataLayers.flatMap { dataLayer =>
          val rawLayerPath = datasetPath.resolve(dataLayer.name)
          val absoluteLayerPath = if (Files.isSymbolicLink(rawLayerPath)) {
            resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
          } else {
            rawLayerPath.toAbsolutePath
          }
          dataLayer.mags.map { mag =>
            val (magURI, isRemote) = getMagURI(datasetPath, absoluteLayerPath, mag)
            if (isRemote) {
              MagPathInfo(dataLayer.name, mag.mag, magURI.toString, magURI.toString, hasLocalData = false)
            } else {
              val magPath = Paths.get(magURI)
              val realPath = magPath.toRealPath()
              // Does this dataset have local data, i.e. the data that is referenced by the mag path is within the dataset directory
              val isLocal = realPath.startsWith(datasetPath.toAbsolutePath)
              val unresolvedPath =
                rawLayerPath.toAbsolutePath.resolve(absoluteLayerPath.relativize(magPath)).normalize()
              MagPathInfo(dataLayer.name,
                          mag.mag,
                          unresolvedPath.toUri.toString,
                          realPath.toUri.toString,
                          hasLocalData = isLocal)
            }
          }
        }
      }
      case None => List()
    }
  }
Contributor

🛠️ Refactor suggestion

Add error handling for symlink resolution and improve code structure.

The method handles complex path resolution logic but has some areas for improvement:

  1. Missing error handling for symlink resolution
  2. Complex nested structure could be simplified
  3. Missing documentation for the business logic

Consider these improvements:

   private def determineMagRealPathsForDataSource(dataSource: InboxDataSource) = tryo {
+    // Resolves paths for each mag in the data source, handling both local and remote data
+    // Returns a list of MagPathInfo containing the original and resolved paths
     val organizationPath = dataBaseDir.resolve(dataSource.id.organizationId)
     val datasetPath = organizationPath.resolve(dataSource.id.directoryName)
     dataSource.toUsable match {
       case Some(usableDataSource) => {
         usableDataSource.dataLayers.flatMap { dataLayer =>
           val rawLayerPath = datasetPath.resolve(dataLayer.name)
-          val absoluteLayerPath = if (Files.isSymbolicLink(rawLayerPath)) {
-            resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
-          } else {
-            rawLayerPath.toAbsolutePath
-          }
+          val absoluteLayerPath = try {
+            if (Files.isSymbolicLink(rawLayerPath)) {
+              resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
+            } else {
+              rawLayerPath.toAbsolutePath
+            }
+          } catch {
+            case e: Exception =>
+              logger.error(s"Failed to resolve layer path for ${dataLayer.name}: ${e.getMessage}")
+              rawLayerPath.toAbsolutePath
+          }

Comment on lines +117 to +122
  def reportRealPaths(dataSourcePaths: List[DataSourcePathInfo]): Fox[_] =
    rpc(s"$webknossosUri/api/datastores/$dataStoreName/datasources/paths")
      .addQueryString("key" -> dataStoreKey)
      .silent
      .putJson(dataSourcePaths)

Contributor

🛠️ Refactor suggestion

Consider explicit result type and user feedback.

reportRealPaths returns Fox[_] and uses .silent. In case of partial or complete failure, there is no user-facing feedback or logs. Specifying a clearer return type (e.g., Fox[ReportPathsResult]) and/or logging errors can facilitate troubleshooting.

Comment on lines 362 to 371
  def getPathsForDatalayer(datasetId: ObjectId, layerName: String): Fox[List[(DatasetMagInfo, List[DatasetMagInfo])]] =
    for {
      magInfos <- datasetMagsDAO.findPathsForDatasetAndDatalayer(datasetId, layerName)
      magInfosAndLinkedMags <- Fox.serialCombined(magInfos)(magInfo =>
        for {
          pathInfos <- datasetMagsDAO.findAllByRealPath(magInfo.realPath)
        } yield (magInfo, pathInfos.filter(!_.equals(magInfo))))

    } yield magInfosAndLinkedMags

Contributor

🛠️ Refactor suggestion

Consider adding input validation for layer name.

The method should validate the layer name before querying the database.

Apply this diff to add validation:

 def getPathsForDatalayer(datasetId: ObjectId, layerName: String): Fox[List[(DatasetMagInfo, List[DatasetMagInfo])]] =
   for {
+    _ <- assertValidLayerNameLax(layerName)
     magInfos <- datasetMagsDAO.findPathsForDatasetAndDatalayer(datasetId, layerName)
     magInfosAndLinkedMags <- Fox.serialCombined(magInfos)(magInfo =>
       for {
         pathInfos <- datasetMagsDAO.findAllByRealPath(magInfo.realPath)
       } yield (magInfo, pathInfos.filter(!_.equals(magInfo))))
   } yield magInfosAndLinkedMags

Committable suggestion skipped: line range outside the PR's diff.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)

765-782: Remove unnecessary empty Fox.successful.

The empty Fox.successful(()) on line 767 is unnecessary and can be removed.

  def updateMagPathsForDataset(datasetId: ObjectId, magPaths: List[MagPathInfo]): Fox[Unit] =
    for {
-     _ <- Fox.successful(())
      updateQueries = magPaths.map(pathInfo => {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d839176 and 3553612.

📒 Files selected for processing (1)
  • app/models/dataset/Dataset.scala (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (4)
app/models/dataset/Dataset.scala (4)

712-721: LGTM! Well-structured case class implementation.

The DatasetMagInfo case class and its companion object are well-designed, providing a clean representation of mag path metadata with proper JSON serialization support. The implementation aligns perfectly with the PR objectives for storing mag paths.


736-738: Replace "uninitialized" with empty string as default value.

Using "uninitialized" as a default value for paths is not ideal. Consider using an empty string instead, as suggested in previous reviews.

Apply this diff:

-                  row.path.getOrElse("uninitialized"),
-                  row.realpath.getOrElse("uninitialized"),
+                  row.path.getOrElse(""),
+                  row.realpath.getOrElse(""),

784-794: LGTM! Well-implemented query and mapping logic.

The method correctly retrieves and maps mag paths for a dataset and layer, with proper error handling using Fox combinators.


795-810: Replace "uninitialized" with empty string and consider adding an index.

  1. Replace "uninitialized" with empty string as default value.
  2. Consider adding an index on the realPath column since we're querying by it.

Apply this diff for the default values:

-                         row.path.getOrElse("uninitialized"),
-                         row.realpath.getOrElse("uninitialized"),
+                         row.path.getOrElse(""),
+                         row.realpath.getOrElse(""),

Consider adding the following index to improve query performance:

CREATE INDEX IF NOT EXISTS dataset_mags_realpath_idx ON webknossos.dataset_mags(realPath);

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
app/controllers/DatasetController.scala (1)

135-145: Consider production readiness improvements.

While the implementation is functionally correct, consider the following improvements for production use:

  1. The admin-only access might be too restrictive for testing purposes. Consider allowing dataset owners or managers to access this endpoint.
  2. Add pagination support for potentially large result sets.
  3. Consider implementing caching to improve performance for frequently accessed datasets.

Here's a suggested implementation with pagination:

-def getLinkedMags(datasetId: ObjectId, dataLayerName: String): Action[AnyContent] =
+def getLinkedMags(
+    datasetId: ObjectId,
+    dataLayerName: String,
+    offset: Option[Int] = Some(0),
+    limit: Option[Int] = Some(100)
+): Action[AnyContent] =
   sil.SecuredAction.async { implicit request =>
     for {
       _ <- datasetDAO.findOne(datasetId) ?~> notFoundMessage(datasetId) ~> NOT_FOUND
-      _ <- Fox.bool2Fox(request.identity.isAdmin) ?~> "notAllowed" ~> FORBIDDEN
+      _ <- Fox.bool2Fox(
+        request.identity.isAdmin ||
+        datasetService.isEditableBy(dataset, Some(request.identity))
+      ) ?~> "notAllowed" ~> FORBIDDEN
       magsAndLinkedMags <- datasetService.getPathsForDatalayer(datasetId, dataLayerName)
-      returnValues = magsAndLinkedMags.map {
+      paginatedMags = magsAndLinkedMags
+        .drop(offset.getOrElse(0))
+        .take(limit.getOrElse(100))
+      returnValues = paginatedMags.map {
         case (mag, linkedMags) => MagLinkInfo(mag, linkedMags)
       }
-    } yield Ok(Json.toJson(returnValues))
+    } yield Ok(Json.obj(
+      "items" -> returnValues,
+      "total" -> magsAndLinkedMags.size,
+      "offset" -> offset.getOrElse(0),
+      "limit" -> limit.getOrElse(100)
+    ))
   }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3553612 and e001ca1.

⛔ Files ignored due to path filters (1)
  • test/db/dataSet_mags.csv is excluded by !**/*.csv
📒 Files selected for processing (3)
  • app/controllers/DatasetController.scala (2 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • conf/webknossos.latest.routes (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • conf/webknossos.latest.routes
  • app/controllers/WKRemoteDataStoreController.scala
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (1)
app/controllers/DatasetController.scala (1)

74-78: LGTM!

The case class and companion object are well-structured and follow Scala best practices.

@normanrz normanrz mentioned this pull request Feb 6, 2025
Member

@fm3 fm3 left a comment

Looking good! My first tests worked fine :) I think the new data structures make sense.

I’m not sure if we should maybe combine the two report requests into one, to avoid this uninitialized state for new datasets, but I guess having two is also fine for the moment.

I added a couple of code suggestions.

Comment on lines +50 to +53
for {
_ <- Fox.successful(())
_ <- remoteWebknossosClient.reportRealPaths(infos)
} yield ()
Member

Suggested change
for {
_ <- Fox.successful(())
_ <- remoteWebknossosClient.reportRealPaths(infos)
} yield ()
remoteWebknossosClient.reportRealPaths(infos)

(I would assume that this is identical?)

Member

Also, if these infos don’t live in the DataSourceRepository, I would skip this forwarding function entirely and rather call remoteWebknossosClient.reportRealPaths(infos) directly from the DataSourceService.

foundInboxSources = organizationDirs.flatMap(teamAwareInboxSources)
_ = logFoundDatasources(foundInboxSources, verbose)
_ <- dataSourceRepository.updateDataSources(foundInboxSources)
_ <- sendRealPaths(foundInboxSources)
Member

I think I saw report, send and publish now. I’d suggest to stick with “report”

Comment on lines +233 to +235
case e: JsError =>
logger.warn("Data store reported invalid json for data source paths.")
Fox.successful(JsonBadRequest(JsError.toJson(e)))
Member

Is there a specific reason to validate explicitly and log errors here, rather than using a playBodyParser to build an Action[List[DataSourcePathInfo]]?
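
For reference, a hedged sketch of the body-parser variant fm3 refers to (validateJson is the JSON-validating body parser used by other controller actions; with it, malformed JSON is rejected by the framework instead of being logged in the controller):

  def updatePaths(name: String, key: String): Action[List[DataSourcePathInfo]] =
    Action.async(validateJson[List[DataSourcePathInfo]]) { implicit request =>
      dataStoreService.validateAccess(name, key) { _ =>
        for {
          _ <- datasetService.updateRealPaths(request.body)(GlobalAccessContext)
        } yield Ok
      }
    }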

Contributor Author

I think that was just copied from the function above.

Contributor Author

So should this be changed for all methods in this controller?

Member

@fm3 fm3 Feb 24, 2025

I did it for updateAll in #8402 – but I’d say it’s not super important to update this now. Both is valid. The older variant has the benefit of controller-side logging (but I don’t think I ever saw this)

Contributor Author

Created #8406 for that.

def updateMagPathsForDataset(datasetId: ObjectId, magPaths: List[MagPathInfo]): Fox[Unit] =
for {
_ <- Fox.successful(())
updateQueries = magPaths.map(pathInfo => {
Member

Suggested change
updateQueries = magPaths.map(pathInfo => {
updateQueries = magPathInfos.map(magPathInfo => {

(Seems cleaner to me if the variable names match, and also match the type. Or, for the inner-most, use an obvious abbreviation like m. That way the reader doesn’t have to understand that these things are all the same)

q"""UPDATE webknossos.dataset_mags
SET path = ${pathInfo.path}, realPath = ${pathInfo.realPath}, haslocaldata = ${pathInfo.hasLocalData}
WHERE _dataset = $datasetId
AND datalayername = ${pathInfo.layerName}
Member

please follow capitalization from schema.sql here, dataLayerName and hasLocalData

Comment on lines 806 to 807
row.path.getOrElse("uninitialized"),
row.realpath.getOrElse("uninitialized"),
Member

Should we use None for this case? I feel like having a custom string literal in the json could cause headaches down the line. What do you think?

Contributor Author

I did this to avoid having options in the DatasetMagInfo, since this is only a theoretical case either way. The path and realPath are set when the dataset is reported, so it would only be empty between the two report functions that are separate at the moment.

Member

I wouldn’t call it a theoretical case, since there is a real time difference between the two report functions. That means that this value will, at least for short durations, occur in the database, and downstream functions will have to be able to handle it. If a function lists all datasets that use the same layer and does string comparison, this would return all that have "uninitialized", which could become a nonsensical list. I’d rather have the Option if we cannot prevent this value from ever being inserted.
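
To illustrate the Option-based variant (a hedged sketch; field names are from this PR, and Play's default Json.format simply omits a None field when writing):

  case class DatasetMagInfo(datasetId: ObjectId,
                            dataLayerName: String,
                            mag: Vec3Int,
                            path: Option[String],
                            realPath: Option[String],
                            hasLocalData: Boolean)

  object DatasetMagInfo {
    implicit val jsonFormat: OFormat[DatasetMagInfo] = Json.format[DatasetMagInfo]
  }

  // Json.toJson(info.copy(realPath = None)) then contains no "realPath" key at all,
  // so downstream string comparisons never see a sentinel like "uninitialized".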

case Full(dataset) => datasetMagsDAO.updateMagPathsForDataset(dataset._id, pathInfo.magPathInfos)
case Empty => // Dataset reported but ignored (non-existing/forbidden org)
Fox.successful(())
case e: EmptyBox =>
Member

Huh, TIL about the EmptyBox trait. Looks like we should have used this in a lot of places where we match against Failure (but miss ParamFailure?). But I guess we instantiate so few ParamFailures that this hasn’t caused any real problems.

_ <- Fox.serialCombined(pathInfos)(updateRealPath)
} yield ()

def getPathsForDatalayer(datasetId: ObjectId, layerName: String): Fox[List[(DatasetMagInfo, List[DatasetMagInfo])]] =
Member

Please use capital L in DataLayer/dataLayer for consistency. There are a couple of usages.

pathInfos = magPathBoxes.map {
case (ds, Full(magPaths)) => DataSourcePathInfo(ds.id, magPaths)
case (ds, Failure(msg, _, _)) =>
logger.error(s"Failed to determine real paths for mags of $ds: $msg")
Member

This should use formatFailureChain from the Formatter trait. Have a look at the other usages.

}
case None => List()
}
}
Member

This function has grown pretty complex, maybe some of its logic could be extracted to subfunctions? That could make it easier to read. A candidate might be the block handling one layer, and also the block handling one mag.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
app/models/dataset/DatasetService.scala (1)

343-355: Consider improving error handling and logging.

The error handling in updateRealPath could be enhanced:

  1. The empty case could benefit from debug logging
  2. The failure case could include more context in the error message

Apply this diff to improve error handling:

 private def updateRealPath(pathInfo: DataSourcePathInfo)(implicit ctx: DBAccessContext): Fox[Unit] =
   if (pathInfo.magPathInfos.isEmpty) {
+    logger.debug(s"No mag paths to update for dataset ${pathInfo.dataSourceId}")
     Fox.successful(())
   } else {
     val dataset = datasetDAO.findOneByDataSourceId(pathInfo.dataSourceId).futureBox
     dataset.flatMap {
       case Full(dataset) => datasetMagsDAO.updateMagPathsForDataset(dataset._id, pathInfo.magPathInfos)
       case Empty => // Dataset reported but ignored (non-existing/forbidden org)
+        logger.debug(s"Skipping dataset ${pathInfo.dataSourceId} (non-existing/forbidden org)")
         Fox.successful(())
       case e: EmptyBox =>
-        Fox.failure("dataset.notFound", e)
+        Fox.failure(s"Failed to find dataset ${pathInfo.dataSourceId}", e)
     }
   }
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)

94-108: Consider improving error handling and logging for path reporting.

The error handling in reportRealPaths could be enhanced:

  1. Add debug logging for successful path reporting
  2. Improve error message formatting

Apply this diff to improve error handling:

 private def reportRealPaths(dataSources: List[InboxDataSource]) =
   for {
     _ <- Fox.successful(())
     magPathBoxes = dataSources.map(ds => (ds, determineMagRealPathsForDataSource(ds)))
     pathInfos = magPathBoxes.map {
       case (ds, Full(magPaths)) =>
+        logger.debug(s"Successfully determined paths for ${ds.id}")
         DataSourcePathInfo(ds.id, magPaths)
       case (ds, failure: Failure) =>
-        logger.error(formatFailureChain(failure))
+        logger.error(s"Failed to determine paths for ${ds.id}: ${formatFailureChain(failure)}")
         DataSourcePathInfo(ds.id, List())
       case (ds, Empty) =>
-        logger.error(s"Failed to determine real paths for mags of $ds")
+        logger.error(s"Empty result while determining paths for ${ds.id}")
         DataSourcePathInfo(ds.id, List())
     }
     _ <- remoteWebknossosClient.reportRealPaths(pathInfos)
   } yield ()

110-128: Extract complex path resolution logic into subfunctions.

The method has grown complex and could benefit from being split into smaller, focused functions.

Apply this diff to improve code organization:

+  private def resolveLayerPath(datasetPath: Path, dataLayer: DataLayer): Path = {
+    val rawLayerPath = datasetPath.resolve(dataLayer.name)
+    if (Files.isSymbolicLink(rawLayerPath)) {
+      resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
+    } else {
+      rawLayerPath.toAbsolutePath
+    }
+  }
+
 private def determineMagRealPathsForDataSource(dataSource: InboxDataSource) = tryo {
   val organizationPath = dataBaseDir.resolve(dataSource.id.organizationId)
   val datasetPath = organizationPath.resolve(dataSource.id.directoryName)
   dataSource.toUsable match {
     case Some(usableDataSource) =>
       usableDataSource.dataLayers.flatMap { dataLayer =>
-        val rawLayerPath = datasetPath.resolve(dataLayer.name)
-        val absoluteLayerPath = if (Files.isSymbolicLink(rawLayerPath)) {
-          resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
-        } else {
-          rawLayerPath.toAbsolutePath
-        }
+        val absoluteLayerPath = resolveLayerPath(datasetPath, dataLayer)
         dataLayer.mags.map { mag =>
           getMagPathInfo(datasetPath, absoluteLayerPath, rawLayerPath, dataLayer, mag)
         }
       }
     case None => List()
   }
 }
app/models/dataset/Dataset.scala (1)

722-739: Consider using None for uninitialized paths.

Using string literals for uninitialized paths could cause issues. Consider using Option[String] instead.

Apply this diff to improve path handling:

-                         row.path.getOrElse("uninitialized"),
-                         row.realpath.getOrElse("uninitialized"),
+                         row.path.getOrElse(""),
+                         row.realpath.getOrElse(""),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a47472f and 592e39d.

📒 Files selected for processing (5)
  • MIGRATIONS.unreleased.md (1 hunks)
  • app/controllers/DatasetController.scala (2 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • MIGRATIONS.unreleased.md
  • app/controllers/DatasetController.scala
🔇 Additional comments (10)
app/models/dataset/DatasetService.scala (4)

17-17: LGTM!

The imports are correctly organized and necessary for the new functionality.

Also applies to: 23-23


37-37: LGTM!

The constructor dependency is correctly added and follows the existing pattern.


357-360: LGTM!

The updateRealPaths method correctly handles batch processing of path updates using Fox.serialCombined.


362-370: Add input validation for layer name.

The method should validate the layer name before querying the database.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3)

8-8: LGTM!

The imports and trait mixins are correctly organized and necessary for the new functionality.

Also applies to: 12-12, 40-41


34-34: LGTM!

The constructor dependency is correctly added and follows the existing pattern.


153-161: LGTM!

The utility methods getMagURI and resolveRelativePath are well-implemented and handle their specific tasks correctly.

Also applies to: 163-168

app/models/dataset/Dataset.scala (3)

712-721: LGTM!

The DatasetMagInfo case class is well-designed and includes all necessary fields with proper JSON formatting.


765-782: LGTM!

The updateMagPathsForDataset method correctly handles path updates with proper transaction isolation and retry logic.


784-794: LGTM!

The path query methods are well-implemented with proper error handling and data transformation.

Also applies to: 795-811

Comment on lines +130 to +151
  private def getMagPathInfo(datasetPath: Path,
                             absoluteLayerPath: Path,
                             rawLayerPath: Path,
                             dataLayer: DataLayer,
                             mag: MagLocator) = {
    val (magURI, isRemote) = getMagURI(datasetPath, absoluteLayerPath, mag)
    if (isRemote) {
      MagPathInfo(dataLayer.name, mag.mag, magURI.toString, magURI.toString, hasLocalData = false)
    } else {
      val magPath = Paths.get(magURI)
      val realPath = magPath.toRealPath()
      // Does this dataset have local data, i.e. the data that is referenced by the mag path is within the dataset directory
      val isLocal = realPath.startsWith(datasetPath.toAbsolutePath)
      val unresolvedPath =
        rawLayerPath.toAbsolutePath.resolve(absoluteLayerPath.relativize(magPath)).normalize()
      MagPathInfo(dataLayer.name,
                  mag.mag,
                  unresolvedPath.toUri.toString,
                  realPath.toUri.toString,
                  hasLocalData = isLocal)
    }
  }
Contributor

🛠️ Refactor suggestion

Add error handling for path resolution failures.

The getMagPathInfo method should handle potential failures in path resolution.

Apply this diff to add error handling:

 private def getMagPathInfo(datasetPath: Path,
                          absoluteLayerPath: Path,
                          rawLayerPath: Path,
                          dataLayer: DataLayer,
                          mag: MagLocator) = {
-  val (magURI, isRemote) = getMagURI(datasetPath, absoluteLayerPath, mag)
-  if (isRemote) {
-    MagPathInfo(dataLayer.name, mag.mag, magURI.toString, magURI.toString, hasLocalData = false)
-  } else {
-    val magPath = Paths.get(magURI)
-    val realPath = magPath.toRealPath()
-    val isLocal = realPath.startsWith(datasetPath.toAbsolutePath)
-    val unresolvedPath =
-      rawLayerPath.toAbsolutePath.resolve(absoluteLayerPath.relativize(magPath)).normalize()
-    MagPathInfo(dataLayer.name,
-                mag.mag,
-                unresolvedPath.toUri.toString,
-                realPath.toUri.toString,
-                hasLocalData = isLocal)
+  try {
+    val (magURI, isRemote) = getMagURI(datasetPath, absoluteLayerPath, mag)
+    if (isRemote) {
+      MagPathInfo(dataLayer.name, mag.mag, magURI.toString, magURI.toString, hasLocalData = false)
+    } else {
+      val magPath = Paths.get(magURI)
+      val realPath = magPath.toRealPath()
+      val isLocal = realPath.startsWith(datasetPath.toAbsolutePath)
+      val unresolvedPath =
+        rawLayerPath.toAbsolutePath.resolve(absoluteLayerPath.relativize(magPath)).normalize()
+      MagPathInfo(dataLayer.name,
+                  mag.mag,
+                  unresolvedPath.toUri.toString,
+                  realPath.toUri.toString,
+                  hasLocalData = isLocal)
+    }
+  } catch {
+    case e: Exception =>
+      logger.error(s"Failed to resolve path for mag ${mag.mag} in layer ${dataLayer.name}: ${e.getMessage}")
+      throw e
   }
 }

@frcroth frcroth requested a review from fm3 February 24, 2025 10:22
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
app/models/dataset/DatasetService.scala (1)

357-360: Consider adding batch size limit for path updates.

While the implementation is correct, processing a large number of path updates sequentially could impact performance.

 def updateRealPaths(pathInfos: List[DataSourcePathInfo])(implicit ctx: DBAccessContext): Fox[Unit] =
   for {
-    _ <- Fox.serialCombined(pathInfos)(updateRealPath)
+    _ <- Fox.serialCombined(pathInfos.grouped(100).toList)(batch =>
+      Fox.serialCombined(batch)(updateRealPath))
   } yield ()
app/models/dataset/Dataset.scala (1)

766-783: Consider adding validation for mag values.

While the implementation handles the database operations correctly, it might be good to add validation for mag values before updating.

 def updateMagPathsForDataset(datasetId: ObjectId, magPathInfos: List[MagPathInfo]): Fox[Unit] =
   for {
-    _ <- Fox.successful(())
+    _ <- Fox.serialCombined(magPathInfos)(magPathInfo =>
+      bool2Fox(magPathInfo.mag.x >= 0 && magPathInfo.mag.y >= 0 && magPathInfo.mag.z >= 0) ?~> "Invalid mag values")
     updateQueries = magPathInfos.map(magPathInfo => {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 592e39d and 9f87a5e.

📒 Files selected for processing (2)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (4)
app/models/dataset/DatasetService.scala (2)

343-355: LGTM! Good error handling for dataset path updates.

The implementation properly handles different cases:

  • Empty magPathInfos
  • Non-existing datasets
  • Error conditions with EmptyBox

362-373: LGTM! Robust implementation of path retrieval.

The implementation correctly:

  • Handles both linked and unlinked mags
  • Filters out self-references in linked mags
  • Properly handles None case for realPath
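
As a rough illustration of that filtering (using simplified stand-in types, not the repo's actual SQL-backed implementation): linked mags are all mags that share a realPath, minus the mag being queried.

object LinkedMagsSketch {
  final case class MagRef(datasetId: String, layerName: String, mag: String, realPath: Option[String])

  def linkedMags(allMags: Seq[MagRef], self: MagRef): Seq[MagRef] =
    self.realPath match {
      case None           => Seq.empty // no realPath stored, so nothing can be linked to it
      case Some(realPath) => allMags.filter(other => other.realPath.contains(realPath) && other != self)
    }
}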
app/models/dataset/Dataset.scala (2)

712-727: LGTM! Well-structured data models for mag information.

The case classes are well-designed with:

  • Clear separation of concerns
  • Proper type safety
  • JSON format support

785-806: LGTM! Efficient implementation of path queries.

The implementation:

  • Uses proper SQL joins
  • Handles path resolution correctly
  • Maintains type safety

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
app/models/dataset/Dataset.scala (1)

712-717: 🛠️ Refactor suggestion

Follow schema.sql field naming convention.

The field names should match the capitalization from schema.sql:

  • dataLayerName should be dataLayerName
  • realPath should be realPath
  • hasLocalData should be hasLocalData

Apply this diff to fix the field names:

 case class DatasetMagInfo(datasetId: ObjectId,
-                          dataLayerName: String,
+                          dataLayerName: String,
                           mag: Vec3Int,
                           path: Option[String],
-                          realPath: Option[String],
+                          realPath: Option[String],
-                          hasLocalData: Boolean)
+                          hasLocalData: Boolean)
🧹 Nitpick comments (2)
app/models/dataset/Dataset.scala (2)

767-769: Remove unnecessary empty Fox.successful.

The empty Fox.successful(()) is not needed here as it doesn't contribute to the logic.

Apply this diff to remove the unnecessary line:

   def updateMagPathsForDataset(datasetId: ObjectId, magPathInfos: List[MagPathInfo]): Fox[Unit] =
     for {
-      _ <- Fox.successful(())
       updateQueries = magPathInfos.map(magPathInfo => {

769-776: Use a consistent name for the lambda parameter.

For better readability, rename the lambda parameter once and use it for every reference inside the interpolated query, including the SET clause.

Apply this diff to improve variable naming:

-      updateQueries = magPathInfos.map(magPathInfo => {
+      updateQueries = magPathInfos.map(m => {
-        val magLiteral = s"(${magPathInfo.mag.x}, ${magPathInfo.mag.y}, ${magPathInfo.mag.z})"
+        val magLiteral = s"(${m.mag.x}, ${m.mag.y}, ${m.mag.z})"
         q"""UPDATE webknossos.dataset_mags
-                 SET path = ${magPathInfo.path}, realPath = ${magPathInfo.realPath}, hasLocalData = ${magPathInfo.hasLocalData}
+                 SET path = ${m.path}, realPath = ${m.realPath}, hasLocalData = ${m.hasLocalData}
                  WHERE _dataset = $datasetId
-                  AND dataLayerName = ${magPathInfo.layerName}
+                  AND dataLayerName = ${m.layerName}
                  AND mag = CAST($magLiteral AS webknossos.vector3)""".asUpdate
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f87a5e and 331b1b2.

📒 Files selected for processing (1)
  • app/models/dataset/Dataset.scala (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build

Copy link
Member

@fm3 fm3 left a comment

Nice! 😎 Backend LGTM.
Please remove the getLinkedMags route before merging to avoid leaking non-public information.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)

40-50: Consider adding unit tests for these newly introduced case classes.

It would be beneficial to have tests covering JSON (de)serialization and the logical interactions of these data classes. That way, any regressions in their structure or usage can be caught early.

Would you like me to generate a basic test suite for these case classes?
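
For orientation only, a JSON round-trip test could look roughly like the sketch below. The spec name, the field names (layerName, mag, path, realPath, hasLocalData), the example URIs, and the availability of an implicit play-json format are assumptions based on this PR; the project's actual test setup may differ.

import com.scalableminds.util.geometry.Vec3Int
import com.scalableminds.webknossos.datastore.services.MagPathInfo // location per this PR
import org.scalatestplus.play.PlaySpec
import play.api.libs.json.Json

class MagPathInfoSpec extends PlaySpec {
  "MagPathInfo" should {
    "survive a JSON round trip" in {
      // Field names and values are illustrative; adjust if the case class differs.
      val info = MagPathInfo(
        layerName = "color",
        mag = Vec3Int(2, 2, 1),
        path = "file:///datasets/org/ds/color/2-2-1",
        realPath = "file:///mnt/storage/ds/color/2-2-1",
        hasLocalData = false
      )
      Json.toJson(info).validate[MagPathInfo].asOpt mustBe Some(info)
    }
  }
}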

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 331b1b2 and 1edeee3.

📒 Files selected for processing (7)
  • MIGRATIONS.unreleased.md (1 hunks)
  • app/controllers/DatasetController.scala (1 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • MIGRATIONS.unreleased.md
  • conf/webknossos.latest.routes
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala
  • app/controllers/DatasetController.scala
  • app/controllers/WKRemoteDataStoreController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)

8-8: Import looks good.

This import from com.scalableminds.util.geometry.Vec3Int properly facilitates the new mag field in MagPathInfo.


117-122: Revisit explicit result type and logging strategy.

Currently, reportRealPaths returns Fox[_] and uses .silent, providing minimal visibility into errors or partial failures. Please consider returning a more explicit type (e.g., Fox[ReportPathsResult]) and/or adding logs on failures to facilitate troubleshooting.
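
One possible shape for that, sketched with plain Futures in place of Fox and a made-up ReportPathsResult type (neither is part of the actual client), is to return a typed result and log failures instead of discarding them via .silent:

import scala.concurrent.{ExecutionContext, Future}
import com.scalableminds.webknossos.datastore.services.DataSourcePathInfo // per this PR
import com.typesafe.scalalogging.LazyLogging

final case class ReportPathsResult(acceptedMagCount: Int) // hypothetical response type

class PathReportingSketch(send: List[DataSourcePathInfo] => Future[ReportPathsResult])(
    implicit ec: ExecutionContext)
    extends LazyLogging {

  // Surface the server's response as a typed value and log when reporting fails,
  // rather than returning Fox[_] and silencing errors.
  def reportRealPaths(pathInfos: List[DataSourcePathInfo]): Future[ReportPathsResult] =
    send(pathInfos).recoverWith {
      case e: Exception =>
        logger.error(s"Reporting real paths for ${pathInfos.size} data sources failed", e)
        Future.failed(e)
    }
}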

@frcroth frcroth merged commit 98d4c01 into master Mar 3, 2025
3 checks passed
@frcroth frcroth deleted the realpaths branch March 3, 2025 15:26
@coderabbitai coderabbitai bot mentioned this pull request Jun 19, 2025
28 tasks
@coderabbitai coderabbitai bot mentioned this pull request Sep 12, 2025
7 tasks
Development

Successfully merging this pull request may close these issues.

Find all datasets that share a layer
2 participants