
Conversation

@fm3 fm3 commented Mar 20, 2025

  • Bugfixes
    • The property volumeBucketDataHasChanged is now set correctly again.
    • When updating volume buckets, segmentIndex now correctly queries the fallback layer.
    • The data requests for volume and editable-mapping buckets used in segment statistics now use the newest tracing version.
  • Perf improvements
    • Use the new fossilDB multi-get (added in fossildb#52, “Add new api call GetMultipleKeysByList”) and multi-put when handling bucket updates, both for bucketData and segmentIndex. Introduces the new VolumeBucketBuffer.
    • When loading multiple volume buckets with a fallbackLayer at once (e.g. for updating the segment index), the fallback buckets are also loaded in a single batched request.
    • Scan the segment ids contained in a bucket using a C++ function.
    • Use a Set for bucketPositions to avoid repeated conversions.
    • Reuse the emptyArrayForElementClass instance.
  • Refactoring
    • The segmentIndexBuffer read functions and other segmentIndex read functions were deduplicated and cleaned up.
    • Some renamings.
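The buffering and batching changes above follow a common write-back pattern: prefill a buffer with one multi-get, serve reads from memory, and flush only the dirty entries with one multi-put. A minimal, language-agnostic sketch (all names here are illustrative stand-ins, not the actual Scala `VolumeBucketBuffer` or FossilDB client API):

```python
class DictStore:
    """Stand-in for a key-value backend (e.g. fossilDB) with multi-get/put."""

    def __init__(self):
        self.data = {}
        self.put_calls = 0

    def get(self, key):
        return self.data.get(key)

    def get_multiple(self, keys):
        return {k: self.data[k] for k in keys if k in self.data}

    def put_multiple(self, entries):
        self.put_calls += 1
        self.data.update(entries)


class BucketBuffer:
    """Write-back buffer: prefill with one multi-get, serve reads from
    memory, flush only dirty entries with one multi-put."""

    def __init__(self, store):
        self.store = store
        self.buffer = {}  # key -> (data, dirty)

    def prefill(self, keys):
        # One batched read instead of one request per bucket.
        for key, data in self.store.get_multiple(keys).items():
            self.buffer[key] = (data, False)

    def get(self, key):
        if key not in self.buffer:
            # Cache miss: fall back to a single read.
            self.buffer[key] = (self.store.get(key), False)
        return self.buffer[key][0]

    def put(self, key, data):
        self.buffer[key] = (data, True)  # mark dirty; persisted on flush

    def flush(self):
        # One batched write containing only the modified buckets.
        dirty = {k: d for k, (d, is_dirty) in self.buffer.items() if is_dirty}
        if dirty:
            self.store.put_multiple(dirty)
        self.buffer = {k: (d, False) for k, (d, _) in self.buffer.items()}
```

Repeated edits to the same bucket then cost one backend write instead of many, which is where the warm-case speedups in the measurements below largely come from.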

Measurements

| Scenario | old | new | factor |
| --- | --- | --- | --- |
| without fallback layer, single segment, 300 buckets, cold | 698 ms | 373 ms | 1.7 |
| without fallback layer, single segment, 300 buckets, warm | 686 ms | 145 ms | 4.7 |
| with fallback layer, 800 buckets, many segments changed, cold | 12.85 s | 4.03 s | 3.2 |
| with fallback layer, 800 buckets, many segments changed, warm | 3.48 s | 1.32 s | 2.6 |
| with fallback layer, 30 buckets, small diff, cold | 281 ms | 139 ms | 2.0 |
| with fallback layer, 30 buckets, small diff, warm | 141 ms | 54 ms | 2.6 |

Steps to test:

  • Create annotations with and without fallback layer (try with fallback layer that has a segment index file)
  • Brush, save, reload; everything should work, and segment stats should show plausible numbers
  • Segment stats in proofreading annotations should also show plausible values

TODOs:

  • resolve code todos
  • multi-put batching
  • multi fallback bucket loading
  • build new multi-get for bucketBuffer prefill
  • collect timing measurements
  • remove logging
  • fix for proofreading (seems broken on master too)
  • bump fossildb version once released

Issues:

Follow-Up PR


@fm3 fm3 self-assigned this Mar 20, 2025

coderabbitai bot commented Mar 20, 2025

📝 Walkthrough

Walkthrough

This pull request introduces a variety of updates across configuration, backend, and tracing modules. Key changes include the addition of a new C++ indent configuration, updates to the changelog and migration requirements, and a revised Docker image version. The changes also convert bucket position collections from lists to sets, add new JNI methods for native bucket scanning, expand FossilDB API messages and RPC methods, and revise numerous method signatures and internal logic in tracing components to enhance consistency and efficiency.

Changes

| File(s) | Change Summary |
| --- | --- |
| .editorconfig, CHANGELOG.unreleased.md, MIGRATIONS.unreleased.md, docker-compose.yml, fossildb/version | Added new C++ indent configuration, updated changelog entries and migration requirements, revised the Docker image version, and bumped FossilDB version. |
| tools/proxy/proxy.js, util/src/main/scala/com/scalableminds/util/tools/BoxImplicits.scala | Reduced error countdown (5 → 3 seconds) in proxy response and introduced the assertNoFailure method in BoxImplicits. |
| webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/NativeBucketScanner.scala, webknossos-jni/src/bucketScanner.cpp, webknossos-jni/src/include/com_scalableminds_webknossos_datastore_helpers_NativeBucketScanner.h | Introduced a new native bucket scanning class and corresponding JNI function and header for segment ID collection. |
| webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala, webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala, webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/TemporaryStore.scala, webknossos-datastore/proto/fossildbapi.proto | Refactored bucket position handling from list to set, updated method signatures, and extended the FossilDB API with new multi-key/version RPC messages. |
| webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/annotation/TSAnnotationService.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/controllers/EditableMappingController.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/controllers/VolumeTracingController.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/controllers/VolumeTracingZarrStreamingController.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/FossilDBClient.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/RemoteFallbackLayer.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/TemporaryTracingService.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingMergeService.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala | Renamed remote fallback methods (from FromVolumeTracing to ForVolumeTracing), updated data request types (list → sequence), and adjusted method signatures to accommodate set-based data structures. |
| webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeBucketBuffer.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexService.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentStatisticsService.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingBucketHelper.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeUpdateActions.scala | Added new classes and methods for bucket buffering, segment index management, and fallback handling; refactored method signatures and internal logic for volume tracing consistency. |

Suggested labels

bug, worker

Suggested reviewers

  • MichaelBuessemeyer

Poem

Hop, hop, I’m a bunny on the run,
Scanning buckets under the coding sun,
With each tiny tweak a hop so light,
Converting lists to sets, oh what a sight!
My whiskers twitch with every change done 🐇✨


@fm3 fm3 changed the title WIP: Improve segment index performance WIP: Improve volume and segment index performance Mar 26, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeBucketBuffer.scala (1)

32-36: Implicit execution context parameter might be redundant

The getWithFallback method takes an implicit ec: ExecutionContext parameter, but the class constructor already provides an execution context. Is there a reason for having both?

-  def getWithFallback(bucketPosition: BucketPosition)(implicit ec: ExecutionContext): Fox[Array[Byte]] =
+  def getWithFallback(bucketPosition: BucketPosition): Fox[Array[Byte]] =
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (2)

90-119: Potential performance improvement
Currently, segment retrieval is handled in multiple steps (buffer, fossil, datastore). If the data store client supports a bulk/multi-get call, it may boost performance for large lists of segment IDs.


136-156: Use of mutable vars
This method accumulates hits and misses via mutable lists. Consider a single-pass approach (e.g. foldLeft) to improve readability and reduce side effects.
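The suggested single-pass accumulation could look like the following (a Python stand-in for the Scala `foldLeft`; function and parameter names are illustrative):

```python
from functools import reduce

def partition_hits_misses(keys, buffer):
    """One-pass split of keys into buffer hits and misses, threading an
    immutable accumulator instead of mutating local variables."""
    def step(acc, key):
        hits, misses = acc
        if key in buffer:
            return (hits + [(key, buffer[key])], misses)
        return (hits, misses + [key])
    return reduce(step, keys, ([], []))
```

Order is preserved in both result lists, which matters when the misses are later fetched and re-zipped with their keys.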

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexService.scala (1)

57-88: Batch processing
Removals and additions are done sequentially. If performance problems arise with large sets of segments, consider a bulk or parallel approach.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1)

111-139: Good use of VolumeBucketBuffer for performance optimization

The introduction of volumeBucketBuffer and the prefill operation aligns with the PR's performance objectives. Batching the bucket operations should significantly improve performance when dealing with multiple buckets, especially with fallback layers.

One observation - the Fox.successful(()) at line 111 seems to be just a placeholder. Consider if this is necessary or if it could be removed for clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b7b4af and 21bfbec.

📒 Files selected for processing (8)
  • util/src/main/scala/com/scalableminds/util/tools/BoxImplicits.scala (1 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/FossilDBClient.scala (3 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeBucketBuffer.scala (1 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexService.scala (3 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingBucketHelper.scala (6 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (4 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (20 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala
🧰 Additional context used
🧬 Code Definitions (4)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeBucketBuffer.scala (2)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/FossilDBClient.scala (4)
  • FossilDBClient (44-246)
  • version (41-41)
  • get (108-117)
  • put (198-213)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingBucketHelper.scala (2)
  • loadBuckets (175-196)
  • saveBuckets (339-354)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingBucketHelper.scala (4)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1)
  • data (492-509)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/TemporaryTracingService.scala (1)
  • isTemporaryTracing (100-103)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (1)
  • lengthOfUnderlyingCubes (96-96)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeBucketBuffer.scala (1)
  • put (60-63)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (3)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexService.scala (2)
  • updateFromBucket (57-89)
  • getSegmentToBucketIndex (138-167)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeBucketBuffer.scala (2)
  • flush (63-72)
  • getWithFallback (32-38)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (7)
  • bucketStreamWithVersion (41-43)
  • bucketStreamWithVersion (61-63)
  • bucketProvider (106-111)
  • bucketProvider (111-111)
  • bucketStream (38-41)
  • bucketStream (58-61)
  • bucketStream (119-119)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexService.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (3)
  • getOne (80-90)
  • put (59-69)
  • getMultiple (90-119)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (49)
util/src/main/scala/com/scalableminds/util/tools/BoxImplicits.scala (1)

27-36: Implementation looks good and follows best practices.

The assertNoFailure method efficiently validates that none of the boxes in the sequence are failures, stopping at the first occurrence. The error message is informative by including the count of boxes and the specific failure encountered. This will be useful for validation and error handling in the broader context of volume and segment index processing mentioned in the PR objectives.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeBucketBuffer.scala (4)

1-21: Well-structured class definition with appropriate dependencies

The class is properly defined with all necessary dependencies for managing volume bucket data. It extends VolumeTracingBucketHelper which provides core bucket functionality and mixes in ProtoGeometryImplicits for geometry-related conversions. The class aligns well with the PR objective of improving performance for volume and segment index handling.


23-25: Consider thread safety for mutable buffer

The implementation uses a mutable HashMap without synchronization, which could lead to race conditions if accessed concurrently. While the execution context suggests asynchronous operations, it's worth ensuring thread safety is managed properly at usage sites.

Also, why is the buffer defined as lazy val rather than a regular val? Is there a specific reason for delaying initialization?


38-59: Efficient implementation of fallback layer handling

The methods for retrieving bucket data are well-designed, with proper error handling through Box types. The batching of requests in getMultipleFromFossilOrFallbackLayer is a good optimization that aligns with the PR's performance improvement objectives. The code also properly handles failures and empty results.


60-72: Good implementation of buffer update and flush functionality

The put and flush methods efficiently handle updating and persisting bucket data. The flush method smartly filters only the dirty buckets with valid data, which minimizes unnecessary operations. This implementation should help achieve the performance improvements mentioned in the PR objectives.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/FossilDBClient.scala (7)

97-97: String interpolation simplified

The error message formatting has been simplified by removing unnecessary curly braces around the messageWithCauses variable.


153-159: Well-structured batch processing implementation

The new getMultipleKeysByList method effectively replaces getMultipleVersions with an improved implementation that processes keys in batches, which should reduce load on FossilDB and improve performance. This aligns well with the PR's performance improvement objectives.
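The batching pattern described here can be sketched as follows (illustrative Python; `get_multiple` and the batch size are stand-ins, not the actual FossilDB client API):

```python
def get_multiple_batched(store, keys, batch_size=1000):
    """Fetch many keys through repeated multi-get calls of bounded size,
    limiting the load any single request puts on the backing store."""
    result = {}
    for i in range(0, len(keys), batch_size):
        # Each slice becomes one multi-get round trip.
        result.update(store.get_multiple(keys[i:i + batch_size]))
    return result
```

With `n` keys this issues `ceil(n / batch_size)` round trips instead of `n`, while keeping each request's payload bounded.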


160-180: Comprehensive error handling in implementation

The private implementation method handles various response cases well, including successful responses, error messages, and empty responses. The parsing logic is thorough and maintains type safety.


210-213: Error message formatting improved

The error message has been clarified and made more consistent with other error messages in the class.


215-219: Efficient multi-put implementation with batching

The new putMultiple method implements batch processing for multiple key-value pairs, which aligns with the PR objective of using "fossilDB multi-put functionalities for bucket updates." The batching approach should significantly improve performance when updating many buckets simultaneously.


220-239: Consistent error handling pattern maintained

The implementation follows the established pattern of separating the RPC call from error handling, allowing for proper slack notification while maintaining the Fox monad chain. This is consistent with the explanation provided in the past review comment.


153-239:

❓ Verification inconclusive

Verify fallback layer functionality with new multi-get/put methods

The PR objectives mention fixing an issue where "segment index did not correctly query the fallback layer during volume bucket updates." Let's verify that the new batch methods properly handle fallback layers.


🏁 Script executed:

#!/bin/bash
# Check if there are tests that verify fallback layer functionality with the new methods

echo "Searching for tests related to fallback layer functionality..."
rg -A 5 -B 5 "fallback.*getMultipleKeysByList|getMultipleKeysByList.*fallback"

echo "Checking for usages of the new methods with fallback layers..."
rg -A 3 "getMultipleKeysByList.*fallback|putMultiple.*fallback"

echo "Checking for volume bucket update implementations that use these methods..."
rg -A 5 "VolumeBucket.*update.*getMultipleKeysByList|VolumeBucketBuffer"

Length of output: 3695


Ensure Fallback Layer Is Properly Queried in Batch Methods

  • The new multi-get (getMultipleKeysByList) and multi-put (putMultiple) methods are meant to support fallback behavior during volume bucket updates, yet no explicit tests or references to fallback scenarios were found in the current code.
  • Searches in the codebase (e.g., in VolumeBucketBuffer and VolumeTracingService) show usage of these methods, but they don’t explicitly verify that fallback querying is triggered when needed.
  • Please add or update test cases that simulate fallback conditions to confirm that the new methods correctly handle the fallback layer—or clarify how the fallback behavior is implicitly managed if that’s the intended design.
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (12)

6-19: Imports look consistent
The newly added imports appear necessary for the code’s functionality and seem properly scoped.


38-50: Consider concurrency or synchronization
These constructor parameters instantiate a mutable.Map in the class body. If multiple threads may access or modify the buffer concurrently, a concurrency risk arises. Unless usage is guaranteed to be single-threaded, consider enforcing thread-safety (e.g., synchronization or a thread-safe collection).

Do you want me to generate a script to search for concurrent usage references to VolumeSegmentIndexBuffer?


56-57: Already covered in previous comment
The concurrency considerations mentioned above also apply here, as this mutable map stores the buffer contents.


62-66: Potential overwrite concern
This method overwrites any existing bucket positions for the given key. If you intend to merge new positions with existing ones, ensure that’s handled before calling put.

Would you like a script to check if calling code might inadvertently discard existing data?
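A merge-before-put variant, as suggested, might look like this (illustrative Python; the `(positions, changed)` tuple mirrors the buffer entries described in this review, but the helper itself is hypothetical):

```python
def put_merged(buffer, key, new_positions):
    """Union new bucket positions with any already-buffered ones instead of
    overwriting the entry; set semantics make the merge cheap."""
    existing, _ = buffer.get(key, (set(), False))
    buffer[key] = (existing | set(new_positions), True)  # True marks the entry as changed
```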


69-78: Batch put logic
Similar overwrite concern as in put. If incremental merges are required, retrieve existing data or confirm overwrites are intended.


80-88: Straightforward single-lookup method
Wrapping getMultiple and extracting the head is concise and handles empty results safely with headOption.map(_._2).toFox.


119-122: Concurrent flush risk
flush() writes buffered updates to storage. If multiple threads call this simultaneously, changes may overwrite each other in the backing store. Validate single-threaded usage or add synchronization.

Do you want me to generate a script to locate concurrent flush() calls?


157-178: Sequential retrieval
Fox.serialCombined fetches data for each segment ID sequentially. A parallel or bulk approach might reduce latency for large segment sets if supported by volumeSegmentIndexClient.


180-194: Temporary store retrieval
Mirrors the fossil retrieval logic. If performance becomes critical, consider a multi-get approach similar to the above.


196-212: Fallback retrieval logic
This structured fallback approach is clearly delineated and consistent with the rest of the code.


213-215: Empty bucket array
Provides a convenient zero-filled array for missing data. Straightforward and clear.


216-222: Handling empty fallback
Mapping Empty to a zero-filled array is a concise solution to store constraints.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingBucketHelper.scala (12)

4-17: New imports
Necessary additions for box handling, decompression, and concurrency utilities.


33-43: Compression logic
Method properly checks whether data is already uncompressed before applying LZ4 compression.


44-60: Robust decompression
Wraps LZ4 decompression in try/catch to handle errors gracefully.


80-90: Refined bucket key generation
Using volumeTracingId in place of dataLayerName clarifies intent and scope.


92-93: Simplified key prefix
Replacing the old prefix approach with volumeTracingId is more direct and cohesive.


175-196: Bulk bucket loading
Collecting fossil data, merging fallback data, and decompressing in one path is well structured.


198-236: Efficient fallback data handling
addFallbackBucketData neatly identifies empty buckets and fills them from the fallback store. The indexing logic is clear.


238-257: splitIntoBuckets length check
Properly safeguards against partial data returns. Throws an exception for mismatches, which prevents silent truncation.


259-289: Single bucket load with fallback
Nicely falls back if the primary store returns empty. This keeps the code DRY compared to re-implementing loading logic.


310-322: saveBucket compression
Reuses the compression step consistently when saving to temporary or fossil store.


323-354: Bulk bucket saving
Parallel to loadBuckets, enabling multiple buckets to be stored at once.


356-383: Iterators for streaming
Provides efficient streaming in multiple scenarios (with and without version, from temporary store).

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexService.scala (6)

10-13: Imports
These new imports for Vec3IntProto, NativeBucketScanner, and BucketPosition support the updated segment index logic.


23-23: Liftweb Box usage
Already used throughout the code; consistent approach to handling optional data and errors.


95-107: Removing bucket references
Fetching existing positions, subtracting this bucket, and storing the result is straightforward.


114-128: Adding new bucket references
Retrieves multiple prior positions in one call, then updates them to include the new bucket. Good for avoiding repeated lookups.


130-137: Native bucket scanning
Lazy loading the NativeBucketScanner and transforming the results into a Set[Long] is concise.
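What the native scan computes boils down to collecting distinct ids from a typed view of the raw bucket bytes. A Python stand-in for the C++/JNI function (the supported element widths and the skip-zero rule are assumptions for illustration):

```python
import struct

def collect_segment_ids(bucket_bytes, bytes_per_element):
    """Return the distinct non-zero segment ids in a raw bucket,
    interpreting the bytes as little-endian unsigned integers."""
    fmt = {1: "B", 2: "H", 4: "I", 8: "Q"}[bytes_per_element]
    count = len(bucket_bytes) // bytes_per_element
    values = struct.unpack(f"<{count}{fmt}", bucket_bytes[:count * bytes_per_element])
    return {v for v in values if v != 0}
```

Doing this scan natively avoids boxing every voxel value on the JVM side, which is the point of the JNI offload.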


138-166: Segment-to-bucket index retrieval
Constructing a read-only buffer to fetch bucket positions ensures no accidental modifications.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (7)

90-101: Improved method signature by using volumeLayer parameter

The method now takes a volumeLayer parameter instead of separate mappingName and elementClass parameters. This is a good refactoring as it encapsulates related data, simplifies the interface, and promotes consistency with other methods.


158-181: Streamlined bucket update process

The method now properly uses the volumeBucketBuffer to manage bucket data and check prior state, which aligns with the performance improvement goals.

The change from direct storage to using volumeBucketBuffer.put at line 180 enables the buffer to accumulate changes and optimize them (likely using fossilDB multi-put functionality mentioned in the PR objectives).


199-222: Improved segment index handling with Sets

The change to process bucket positions as a Set (line 222) rather than a List aligns with the PR objective of "employing a set for bucket positions to reduce conversions."


127-135: Ensure buffer is properly flushed

Good implementation of the buffer flush operation at line 153. This ensures that all changes accumulated in the volumeBucketBuffer are properly committed at the end of the operation.

Also applies to: 153-153


287-287: Consider batching loadBucket operations

The loadBucket operations could potentially be batched for additional performance improvements.

As noted in the previous review comment by MichaelBuessemeyer, batching the loadBucket operations into a single request for all missing buckets could further improve performance, though I understand this is not a high priority since reverts are relatively rare.


136-139: Effective prefilling strategy for performance

The prefill operation selectively extracts bucket positions from UpdateBucketVolumeAction instances, which is an efficient approach to prepare the buffer before processing individual actions.


762-767: Consistent use of volumeLayer throughout merge operations

The refactoring to use volumeLayer consistently across merge operations improves code clarity and maintainability while enabling the performance optimizations targeted by this PR.

Also applies to: 795-802


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (1)

90-117: Efficient implementation for retrieving multiple segment indices

The new getMultiple method properly handles the retrieval sequence (buffer → fossilDB/temporary → datastore) and batches operations for better performance, which is in line with the PR objectives of optimizing loading multiple volume buckets.

However, line 109 could be optimized further by using a Map directly instead of combining collections.

-        allHits = fromBufferHits ++ fromFossilOrTempHits ++ fromDatastoreHits
+        allHits = (fromBufferHits ++ fromFossilOrTempHits ++ fromDatastoreHits).toMap

This would make the lookups in the following block more efficient, as noted in a previous code review.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5292a33 and 2b71386.

📒 Files selected for processing (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (4 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (12)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (12)

38-50: Constructor implementation aligns with performance objectives

The updated constructor with new parameters elementClass, mappingName, and isReadOnly supports the PR's goal of improving performance for volume and segment index handling. The isReadOnly parameter replaces the previously reviewed isReaderOnly parameter.


56-57: Good optimization: Changed data structure for better performance

Converting from ListOfVec3IntProto to (Set[Vec3IntProto], Boolean) allows for:

  1. More efficient set operations (union, intersection)
  2. Tracking changed bucket positions via the boolean flag
  3. Selective flushing of only modified entries

This change aligns with the PR objectives of improving performance for bucket updates.
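As an illustration of the pattern (not the actual implementation), a minimal sketch of a `(Set[...], Boolean)` buffer entry, using a hypothetical `Pos` case class in place of `Vec3IntProto`:

```scala
// Hypothetical stand-in for Vec3IntProto.
final case class Pos(x: Int, y: Int, z: Int)

// Merge incoming positions into a buffer entry; the Boolean marks the entry
// as changed so that a later flush can skip untouched entries.
def merge(entry: (Set[Pos], Boolean), incoming: Set[Pos]): (Set[Pos], Boolean) = {
  val (positions, changed) = entry
  val merged = positions ++ incoming // set union deduplicates for free
  (merged, changed || merged != positions)
}
```

Re-adding positions that are already present leaves the changed flag untouched, which is exactly what makes selective flushing possible.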


59-67: Enhanced safety with isReadOnly check

The updated put method now correctly prevents modifications when the buffer is in read-only mode and accepts sets instead of lists for bucket positions, which aligns with the performance improvements mentioned in the PR objectives.


69-78: New batch operation method improves efficiency

The putMultiple method supports efficient batch operations, which is more performant than calling put multiple times. This aligns with the PR objective of using fossilDB multi-put functionality for bucket updates.


80-88: Clean API design with separate getOne method

The new getOne method provides a clean API for single-item retrieval while reusing the batch operation logic from getMultiple. This promotes code reuse and maintainability.


119-134: Optimized flush operation with selective updates

The improved flush method now:

  1. Checks the isReadOnly flag to prevent invalid operations
  2. Only flushes entries marked as changed
  3. Uses putMultiple for fossilDB, which is more efficient than individual puts

This directly supports the PR objective of using fossilDB multi-put functionality for bucket updates.
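The dirty-flag-plus-batching idea can be sketched as follows; `Entry` and `putMultiple` are hypothetical names standing in for the buffer entries and the fossilDB multi-put:

```scala
// Hypothetical entry: a payload plus a dirty flag, mirroring the buffer's changed marker.
final case class Entry(key: String, payload: String, changed: Boolean)

// Flush only dirty entries, in batches, so one multi-put call replaces many single puts.
// Returns the number of entries actually written.
def flush(buffer: Seq[Entry], batchSize: Int)(putMultiple: Seq[Entry] => Unit): Int = {
  val dirty = buffer.filter(_.changed)
  dirty.grouped(batchSize).foreach(putMultiple)
  dirty.size
}
```

With three dirty entries and a batch size of two, this issues two `putMultiple` calls instead of three individual puts.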


136-155: Efficient buffer lookup implementation

The method now correctly returns a tuple of hits and misses, with an early exit path for read-only mode. The hits are stored in a map for efficient lookup, which aligns with the performance improvements mentioned in the PR objectives.
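The hits/misses shape of such a tiered lookup can be sketched generically (the real method also handles mags and additional coordinates; this is a simplified assumption):

```scala
// Return what the in-memory map has, plus the keys that must be resolved
// by the next tier (fossilDB/temporary store, then datastore).
def getWithMisses[K, V](wanted: Seq[K], buffer: Map[K, V]): (Map[K, V], Seq[K]) = {
  val hits   = wanted.flatMap(k => buffer.get(k).map(k -> _)).toMap
  val misses = wanted.filterNot(buffer.contains)
  (hits, misses)
}
```

Each tier consumes the previous tier's misses, so keys already found are never re-queried downstream.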


157-173: Effective use of batch operations for FossilDB queries

The implementation now uses getMultipleKeysByList with batch size control, which aligns with the PR objective of using fossilDB multi-get functionality for improved performance.


175-189: Consistent implementation for temporary store lookups

This method follows the same pattern as the other retrieval methods, maintaining consistency in the codebase and correctly handling the temporary store case.


191-206: Optimized datastore query for multiple segments

The method now uses querySegmentIndexForMultipleSegments to batch requests, which aligns with the PR objective of optimizing loading multiple volume buckets.


208-210: Memory-efficient empty bucket handling

The lazy val ensures the empty bucket array is created only once and reused, which is more efficient than creating a new array each time it's needed.
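The init-once behavior of a `lazy val` is easy to demonstrate; the bucket size used here is an assumption for illustration:

```scala
// Observe that a lazy val's initializer runs exactly once, however often it is read.
var initCount = 0
lazy val emptyBucketArray: Array[Byte] = {
  initCount += 1                // side effect only to observe initialization
  new Array[Byte](32 * 32 * 32) // edge length 32 is an assumption here
}
```

Every subsequent read returns the same cached instance, so no fresh array is allocated per access.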


211-216: Well-implemented fallback mechanism

The bytesWithEmptyFallback method properly handles all possible states of the input box (Empty, Full, Failure), providing a clean fallback mechanism when no data is available. This helps resolve the issue mentioned in the PR objectives where "segment index did not correctly query the fallback layer".
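A sketch of the three-way handling, using a minimal stand-in for Lift's `Box` (the codebase uses `net.liftweb.common.Box`; the bucket size below is an assumption):

```scala
// Minimal stand-in for Lift's Box.
sealed trait Box[+A]
final case class Full[A](value: A) extends Box[A]
case object Empty extends Box[Nothing]
final case class Failure(msg: String) extends Box[Nothing]

val emptyBucket: Array[Byte] = new Array[Byte](8) // size would come from elementClass

// Empty becomes a Full empty bucket; real failures still propagate.
def bytesWithEmptyFallback(box: Box[Array[Byte]]): Box[Array[Byte]] = box match {
  case Full(bytes) => Full(bytes)
  case Empty       => Full(emptyBucket)
  case f: Failure  => f
}
```

Only the absence of data is papered over with the shared empty array; genuine read errors are still surfaced to the caller.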

@fm3
Member Author

fm3 commented Apr 2, 2025

@MichaelBuessemeyer thanks for your detailed review! I implemented most of the suggestions.
I’m still unsure about the Set vs List discussion, let’s maybe talk about that again.
I also pushed a new fix for the editable mapping versioning problem mentioned above.

I think it’s a good time to do some testing, and if you want you can also already have another look at the code :)

@fm3 fm3 requested a review from MichaelBuessemeyer April 2, 2025 12:28
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment


Awesome, thanks for all these improvements 🎉

I just re-reviewed the latest changes and they lgtm 🧗

Testing also went well 🍏

…ngstore/tracings/volume/VolumeSegmentIndexBuffer.scala

Co-authored-by: MichaelBuessemeyer <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bc345e and ab98fc4.

📒 Files selected for processing (1)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (4)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/TemporaryTracingService.scala (3)
  • TemporaryTracingService (18-114)
  • saveVolumeSegmentIndexBuffer (90-95)
  • getVolumeSegmentIndexBufferForKey (59-62)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeBucketBuffer.scala (2)
  • put (60-63)
  • flush (63-72)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1)
  • editableMappingTracingId (183-186)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/TemporaryStore.scala (2)
  • getMultiple (28-33)
  • get (13-18)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (8)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (8)

109-115: Consider using a Map-based approach to improve performance.

The current implementation iterates over all segment IDs repeatedly to check if they exist in the combined map. Using a map lookup would be more efficient when working with large datasets.

-        allHits = fromBufferHits ++ fromFossilOrTempHits ++ fromDatastoreHits
-        allHitsFilled = segmentIds.map { segmentId =>
-          allHits.get(segmentId) match {
-            case Some(positions) => (segmentId, positions)
-            case None            => (segmentId, Set[Vec3IntProto]())
-          }
-        }
+        allHits = fromBufferHits ++ fromFossilOrTempHits ++ fromDatastoreHits
+        allHitsFilled = segmentIds.map(segmentId => 
+          (segmentId, allHits.getOrElse(segmentId, Set.empty[Vec3IntProto]))
+        )

99-101: Move Fox.successful outside of for-comprehension.

The current code structure unnecessarily wraps the result in a Fox and then extracts it with <-. Moving the conditional logic outside the for-comprehension would be more efficient.

-        (fromFossilOrTempHits, fromFossilOrTempMisses) <- if (toTemporaryStore)
-          Fox.successful(getMultipleFromTemporaryStoreNoteMisses(fromBufferMisses, mag, additionalCoordinates))
-        else getMultipleFromFossilNoteMisses(fromBufferMisses, mag, additionalCoordinates)
+        (fromFossilOrTempHits, fromFossilOrTempMisses) <- {
+          if (toTemporaryStore)
+            Fox.successful(getMultipleFromTemporaryStoreNoteMisses(fromBufferMisses, mag, additionalCoordinates))
+          else 
+            getMultipleFromFossilNoteMisses(fromBufferMisses, mag, additionalCoordinates)
+        }

166-167: Good use of batch processing to optimize performance.

Using getMultipleKeysByList with a batch size parameter is an excellent performance optimization for handling many segment IDs efficiently, helping to reduce the number of database queries.
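The batching idea can be sketched like this; `multiGet` stands in for the fossilDB `getMultipleKeysByList` call, and the wrapper itself is hypothetical:

```scala
// Cap the round-trip size: split keys into batches of at most batchSize
// and merge the per-batch results into one map.
def fetchInBatches[K, V](keys: Seq[K], batchSize: Int)(multiGet: Seq[K] => Map[K, V]): Map[K, V] =
  keys.grouped(batchSize).foldLeft(Map.empty[K, V]) { (acc, batch) =>
    acc ++ multiGet(batch)
  }
```

For five keys and a batch size of two, this issues three round trips instead of five single-key gets.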


208-210: Good optimization using lazy val for empty bucket array.

Lazily initializing the empty bucket array is an efficient approach since it's only created when needed and then reused, reducing unnecessary object creation.


211-216: Well-designed fallback mechanism for empty data.

The bytesWithEmptyFallback method provides a clean way to handle empty data by returning a properly initialized empty array instead of failing, improving robustness.


69-78: Good implementation of batch update capability.

The putMultiple method aligns with the PR objectives of improving performance for bucket updates by enabling batch operations, which reduces redundant processing when multiple segments need updating.


39-50: Constructor changes align well with architecture improvements.

The additional parameters (elementClass, mappingName, isReadOnly) support the performance enhancements mentioned in the PR objectives, particularly for enabling optimized segment index operations and appropriate fallback handling.


56-57: Good data structure change from List to Set.

Changing the collection type to Set for bucket positions offers performance benefits for lookups and eliminates duplicates, which aligns with the PR's goal of performance optimization.

fm3 and others added 2 commits April 2, 2025 18:01
…ngstore/tracings/volume/VolumeSegmentIndexBuffer.scala

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
MIGRATIONS.unreleased.md (1)

11-11: Fix Markdown Unordered List Indentation

The markdownlint tool reports an indentation issue on line 11. The unordered list item is indented by one space while the expected indentation is 0 spaces. Please remove the leading whitespace to comply with MD007.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

11-11: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2bd486 and a1ceb36.

📒 Files selected for processing (2)
  • CHANGELOG.unreleased.md (2 hunks)
  • MIGRATIONS.unreleased.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.unreleased.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
MIGRATIONS.unreleased.md

11-11: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (1)
MIGRATIONS.unreleased.md (1)

11-12: Validate FossilDB Version Requirement Update

The migration guide now states that FossilDB version 0.1.37 (noted as master__525: on dockerhub) is required, as per PR [#8460]. This change is clearly documented here. Please ensure this new requirement is reflected in all related configuration files and deployment scripts.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

11-11: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

