Add further debug logging for ds listing #8247

MichaelBuessemeyer · 2024-11-28T14:25:12Z

The method listGrouped needs to be investigated further.

Answering 400 at /api/datasets?isActive=false for user <id>: {"messages":[{"error":"Grouping datasets failed"},{"chain":"[Server Time 2024-11-28 13:54] <~ Failed to retrieve list of data sets."}]}

Follow up to #8246

Steps to test:

Same as in Add debug logging for dataset list route #8246

coderabbitai · 2024-11-28T14:25:21Z

📝 Walkthrough

The pull request introduces enhancements to the DatasetController class, specifically improving logging and error handling in the listGrouped and segmentAnythingMask methods. Modifications include additional logging for dataset processing and user details, refined error messages, and strengthened input validation. These updates aim to enhance the observability and robustness of dataset operations.

Changes

| File | Change Summary |
|---|---|
| app/controllers/DatasetController.scala | - Enhanced logging in `listGrouped` method, including dataset and user details. - Improved error handling with descriptive messages. - Strengthened input validation in `segmentAnythingMask` method with new checks and logging. |

Possibly related PRs

Fix accessing public datasets #8242: Modifies the segmentAnythingMask method to enhance validation and error handling, aligning with the main PR's improvements.
Add debug logging for dataset list route #8246: Updates the segmentAnythingMask method by adding validation checks and improving logging, directly relating to the changes in the main PR.

Suggested labels

bug, urgent

Suggested reviewers

fm3

Poem

🐇 In the code where datasets play,
Logging brightens the data way.
Errors caught with messages clear,
Observability brings us cheer!
With each new check, we hop along,
Enhancing our code, making it strong! 🌟

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fe47da7 and 2179355.

📒 Files selected for processing (1)

app/controllers/DatasetController.scala (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

app/controllers/DatasetController.scala


coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)

app/controllers/DatasetController.scala (1)
179-262: Consider implementing structured logging

To improve the debugging capabilities while maintaining security and performance, consider implementing structured logging:

- Define a logging schema for dataset operations
- Use MDC (Mapped Diagnostic Context) to track request context
- Implement log sanitization to prevent PII exposure
- Add request tracing for better correlation of logs

Example implementation:
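// Assumes SLF4J's MDC, which is not named explicitly in the suggestion.
import org.slf4j.MDC

object DatasetLogging {
  case class DatasetOperationContext(
    datasetCount: Int,
    organizationId: String,
    requestId: String,
    userId: Option[String]
  )

  // Runs `block` with the context fields attached to every log line it emits.
  def withContext[T](context: DatasetOperationContext)(block: => T): T = {
    MDC.put("request.id", context.requestId)
    MDC.put("dataset.count", context.datasetCount.toString)
    MDC.put("organization.id", context.organizationId)
    context.userId.foreach(id => MDC.put("user.id", id))
    try {
      block
    } finally {
      // Clears the whole MDC for this thread, including keys set elsewhere.
      MDC.clear()
    }
  }
}
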
app/models/dataset/Dataset.scala (1)
223-227: Consider enhancing the second log message.

While the first log message provides valuable information about selection predicates, the second log message "Requesting datasets with query" could be more informative.

Consider this improvement:
-      _ = logger.info("Requesting datasets with query")
+      _ = logger.info(s"Executing dataset query with limit ${limitOpt.getOrElse("none")}")

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 781f687 and 82690ac.

📒 Files selected for processing (2)

app/controllers/DatasetController.scala (1 hunks)
app/models/dataset/Dataset.scala (2 hunks)

🔇 Additional comments (1)

app/models/dataset/Dataset.scala (1)

118-125: LGTM! Improved error messages for better debugging.

The error messages are now more specific and descriptive, which will help in identifying parsing issues more effectively.

coderabbitai · 2024-11-28T14:27:53Z

app/controllers/DatasetController.scala

      m: MessagesProvider): Fox[List[JsObject]] =
    for {
+      _ <- Fox.successful(())
+      _ = logger.info(s"datasets: $datasets, requestingUser: $requestingUser")


⚠️ Potential issue

Security & Performance: Revise logging strategy

The current logging implementation has several concerns:

- Security: Logging entire objects (datasets, requestingUser, requestingUserTeamManagerMemberships) may expose sensitive information
- Performance: Excessive logging at multiple levels
- Log levels: Using INFO level for debug information

-      _ = logger.info(s"datasets: $datasets, requestingUser: $requestingUser")
+      _ = logger.debug(s"Processing ${datasets.size} datasets for user ${requestingUser.map(_._id).getOrElse("anonymous")}")
-      _ = logger.info(s"requestingUserTeamManagerMemberships: $requestingUserTeamManagerMemberships")
+      _ = logger.debug(s"User ${requestingUser.map(_._id).getOrElse("anonymous")} has ${requestingUserTeamManagerMemberships.size} team manager memberships")
-      _ = logger.info(s"byOrgaTuple: $byOrgaTuple")
+      _ = logger.debug(s"Processing organization: ${byOrgaTuple._1} with ${byOrgaTuple._2.size} datasets")
-      logger.info(s"datastore: ${byDataStoreTuple._1}, datasets: ${byDataStoreTuple._2}")
+      logger.debug(s"Processing datastore: ${byDataStoreTuple._1} with ${byDataStoreTuple._2.size} datasets")

Also applies to: 233-233, 238-238, 243-243
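
A minimal sketch of the idea behind this suggestion: log counts and identifiers rather than whole objects. The `User` and `Dataset` case classes, their fields, and the `LogSummaries` helper are hypothetical stand-ins for illustration, not the webknossos models.

```scala
// Hypothetical stand-ins for the real models; field names are assumptions.
final case class User(id: String, email: String)
final case class Dataset(name: String, organizationId: String)

object LogSummaries {
  // Describe the requesting user by id only, never by the full object.
  def userLabel(user: Option[User]): String =
    user.map(_.id).getOrElse("anonymous")

  // Summarize a dataset list as a total count plus a per-organization count.
  def summarize(datasets: List[Dataset], user: Option[User]): String = {
    val perOrga = datasets
      .groupBy(_.organizationId)
      .toList
      .sortBy(_._1)
      .map { case (orga, ds) => s"$orga=${ds.size}" }
      .mkString(", ")
    s"Processing ${datasets.size} datasets for user ${userLabel(user)} [$perOrga]"
  }
}
```

The resulting string could be passed to `logger.debug` in place of the interpolated objects. Whether that is enough detail for this particular investigation is a judgment call; the summary keeps log volume down but drops the per-dataset fields.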

coderabbitai · 2024-11-28T14:27:54Z

app/controllers/DatasetController.scala

+        userService
+          .teamManagerMembershipsFor(user._id)) ?~> s"Could not find team manager memberships for user ${requestingUser
+        .map(_._id)}"


🛠️ Refactor suggestion

Enhance error messages for better debugging

The error messages should provide more context about what failed and why.

         userService
           .teamManagerMembershipsFor(user._id)) ?~> s"Could not find team manager memberships for user ${requestingUser
-        .map(_._id)}"
+        .map(_._id)}. Verify user permissions and team associations."
-      } ?~> s"Could not group by datastore for datasets ${byOrgaTuple._2.map(_.name)}"
+      } ?~> s"Failed to process datasets by datastore. Organization: ${organization._id}, Datastores: ${groupedByDataStore.map(_._1).mkString(", ")}"
-      } ?~> s"Could not group by organization for datasets ${datasets.map(_.name)}"
+      } ?~> s"Failed to process datasets by organization. Dataset count: ${datasets.size}, Organizations: ${groupedByOrga.map(_._1).mkString(", ")}"

Also applies to: 260-260, 262-262
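
To illustrate how chained context messages can end up as an outer message followed by a chain of inner causes, here is a toy model built on `Either`. It is only a sketch under stated assumptions: `Failure`, `Result`, `fail`, `render`, and this `?~>` are hypothetical and are not the `Fox` implementation used in webknossos.

```scala
object ErrorContext {
  // Toy failure type: the outermost message comes first, inner causes follow.
  final case class Failure(messages: List[String])

  type Result[A] = Either[Failure, A]

  def fail[A](msg: String): Result[A] = Left(Failure(List(msg)))

  // Hypothetical operator: on failure, prepend a context message;
  // on success, return the value unchanged without evaluating the message.
  implicit class ResultOps[A](private val self: Result[A]) {
    def ?~>(context: => String): Result[A] = self match {
      case Left(f) => Left(Failure(context :: f.messages))
      case ok      => ok
    }
  }

  // Render the messages as "outer <~ inner".
  def render(f: Failure): String = f.messages.mkString(" <~ ")
}
```

In this model each `?~>` adds one layer, so a specific message at every step narrows down where a failure originated, while two messages attached at the same step add little beyond the first.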

coderabbitai · 2024-11-28T14:27:54Z

app/controllers/DatasetController.scala

+                                                     limit) ?~> "dataset.list.failed" ?~> "Dataset listing failed"
+            _ = logger.info(s"Found ${datasets.size} datasets successfully")
+            js <- listGrouped(datasets, request.identity) ?~> "dataset.list.failed" ?~> "Grouping datasets failed"


🛠️ Refactor suggestion

Improve error handling and messages

The error handling in these lines could be enhanced:

- Remove the redundant error message "Dataset listing failed" as it duplicates "dataset.list.failed"
- Enhance the error message for grouping failure to include more context

-                                                     limit) ?~> "dataset.list.failed" ?~> "Dataset listing failed"
+                                                     limit) ?~> "dataset.list.failed"
             _ = logger.info(s"Found ${datasets.size} datasets successfully")
-            js <- listGrouped(datasets, request.identity) ?~> "dataset.list.failed" ?~> "Grouping datasets failed"
+            js <- listGrouped(datasets, request.identity) ?~> "dataset.list.grouping.failed" ?~> s"Failed to group ${datasets.size} datasets"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-                                                     limit) ?~> "dataset.list.failed" ?~> "Dataset listing failed"
-            _ = logger.info(s"Found ${datasets.size} datasets successfully")
-            js <- listGrouped(datasets, request.identity) ?~> "dataset.list.failed" ?~> "Grouping datasets failed"
+                                                     limit) ?~> "dataset.list.failed"
+            _ = logger.info(s"Found ${datasets.size} datasets successfully")
+            js <- listGrouped(datasets, request.identity) ?~> "dataset.list.grouping.failed" ?~> s"Failed to group ${datasets.size} datasets"

fm3

let’s avoid printing entire user objects, other than that LGTM

fm3 · 2024-11-28T14:29:31Z

app/controllers/DatasetController.scala

      m: MessagesProvider): Fox[List[JsObject]] =
    for {
+      _ <- Fox.successful(())
+      _ = logger.info(s"datasets: $datasets, requestingUser: $requestingUser")


Suggested change

-      _ = logger.info(s"datasets: $datasets, requestingUser: $requestingUser")
+      _ = logger.info(s"datasets: $datasets, requestingUser: ${requestingUser._id}")

let’s avoid printing entire user objects,

only user objects?

I just reduced the logging to only id or such and not whole object. Do you think whole object is ok (besides users)? This might make debugging easier 🤔

Yes, should be ok. This is only temporary, this isn’t super sensitive data and the logs are of course not public.

Michael Büßemeyer added 2 commits November 28, 2024 14:08

add debug logging for dataset list route

f0b1b2f

add extensive logging to listGrouped method of DatasetController

82690ac

MichaelBuessemeyer added the backend label Nov 28, 2024

MichaelBuessemeyer requested a review from fm3 November 28, 2024 14:25

MichaelBuessemeyer self-assigned this Nov 28, 2024

coderabbitai bot reviewed Nov 28, 2024

Merge branch 'master' into add-debug-logging-for-ds-listing

ed69e48

fm3 approved these changes Nov 28, 2024

Michael Büßemeyer and others added 5 commits November 28, 2024 15:36

dont log full objects

4c75c97

Merge branch 'add-debug-logging-for-ds-listing' of github.com:scalabl…

fe47da7

…eminds/webknossos into add-debug-logging-for-ds-listing

Merge branch 'master' into add-debug-logging-for-ds-listing

b6e432b

restore logging full object except for user objects

6295857

Merge branch 'add-debug-logging-for-ds-listing' of github.com:scalabl…

2179355

…eminds/webknossos into add-debug-logging-for-ds-listing

MichaelBuessemeyer enabled auto-merge (squash) November 28, 2024 15:07

MichaelBuessemeyer merged commit 24e4981 into master Nov 28, 2024
3 checks passed

MichaelBuessemeyer deleted the add-debug-logging-for-ds-listing branch November 28, 2024 15:20

MichaelBuessemeyer mentioned this pull request Nov 28, 2024

Remove logging to debug list dataset routes #8248

Merged

This was referenced Nov 28, 2024

Fix permisson error in listGrouped in list datasets route #8249

Merged

Re-Add debug logging for ds listing #8251

Merged

Remove debug logging for dataset listing #8256

Merged

Fix legacy support for outdated displayName field of datasets #8263

Merged
