@fm3 fm3 commented Mar 31, 2025

Measured duplication of a volume annotation: duplicating 923 volume buckets took 827 ms on master and 360 ms here (factor 2.2).
Similar speedups are expected for the other usages.

URL of deployed dev instance (used for testing):

Steps to test:

  • Edit a skeleton annotation with multiple trees, reload, should work
  • Duplicate annotation, should work (skeleton, volume, editable mapping)
  • Merge annotations, should work (skeleton, volume, editable mapping)

Issues:


  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

Base automatically changed from segment-index-perf to master April 3, 2025 06:26
@fm3 fm3 marked this pull request as ready for review April 3, 2025 09:11
@fm3 fm3 requested a review from MichaelBuessemeyer April 3, 2025 09:17

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Looks good to me as well. Testing also went well 👍

Just dumping some thoughts here to comment on:
Currently, the buffer blocks while flushing. Keeping a second buffer would let us continue filling it while the first one is being flushed to FossilDB, and then swap the two. This might not be necessary, but it could save some waiting time while a buffer is being flushed. It would also increase the complexity of this feature, though.
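
For illustration, a minimal sketch of that double-buffering idea could look roughly like this. The names (`DoubleBufferedPutBuffer`, `flushFn`, `capacity`) are made up for this sketch, and it is not the buffer implemented in this PR; flushes are simply chained so that at most one runs at a time while new puts go into a fresh buffer:

```scala
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical sketch of double buffering: new puts always go into the active
// buffer; when it is full, it is handed to the flush chain and replaced by an
// empty one, so callers do not wait for the ongoing flush.
class DoubleBufferedPutBuffer[K, V](flushFn: Seq[(K, V)] => Future[Unit], capacity: Int)(
    implicit ec: ExecutionContext) {

  private val lock = new Object
  private var active = mutable.ArrayBuffer.empty[(K, V)]
  private var pendingFlush: Future[Unit] = Future.unit

  def put(key: K, value: V): Future[Unit] = lock.synchronized {
    active += ((key, value))
    if (active.length >= capacity) swapAndFlushLocked() else Future.unit
  }

  // Called while holding the lock: hand the full buffer to the flush chain and
  // start a fresh, empty buffer. Chaining on pendingFlush ensures at most one
  // flush runs at a time, while new puts already fill the new buffer.
  private def swapAndFlushLocked(): Future[Unit] = {
    val toFlush = active.toSeq
    active = mutable.ArrayBuffer.empty[(K, V)]
    pendingFlush = pendingFlush.flatMap(_ => flushFn(toFlush))
    pendingFlush
  }

  // Final flush: drain the remaining entries and wait for all chained flushes.
  def flushAll(): Future[Unit] = lock.synchronized(swapAndFlushLocked())
}
```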

@fm3 fm3 commented Apr 7, 2025

Thanks for your review! You’re right, this might be a way to speed this up further, but as you said, I’d rather avoid the complexity of ensuring that everything works correctly in parallel and that the final flush correctly waits for everything.

Also, since we are serving many users in parallel, maximum parallelization could actually be a downside. While it would answer individual requests more quickly, it would potentially also use up many threads, CPU time, and RPC requests, so that some (possibly smaller) requests by other users might be answered considerably later. This is also why I mostly still use serialCombined, to limit parallelization inside individual user requests. An alternative would be custom thread pools for fine-grained control, but that would add another layer of complexity. I hope this also answers #8469 (comment)
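
To illustrate the trade-off, here is a rough sketch of what serial composition buys compared to eager fan-out. This is illustrative only (the name `serialCombinedSketch` is made up) and not the actual serialCombined implementation:

```scala
import scala.concurrent.{ExecutionContext, Future}

// Illustrative sketch: items are processed strictly one after another, so a
// single large request occupies at most one worker at a time instead of
// fanning out in parallel.
def serialCombinedSketch[A, B](items: List[A])(f: A => Future[B])(
    implicit ec: ExecutionContext): Future[List[B]] =
  items
    .foldLeft(Future.successful(List.empty[B])) { (accFuture, item) =>
      for {
        acc    <- accFuture
        result <- f(item) // starts only after all previous items have finished
      } yield result :: acc
    }
    .map(_.reverse)

// By contrast, Future.traverse(items)(f) starts work on all items eagerly,
// which can consume many threads and downstream RPCs for one user request.
```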

@fm3 fm3 enabled auto-merge (squash) April 7, 2025 08:01
@fm3 fm3 merged commit 5ef0222 into master Apr 7, 2025
3 checks passed
@fm3 fm3 deleted the put-buffer branch April 7, 2025 08:14