
Conversation

@fm3 fm3 (Member) commented Sep 9, 2025

We had a number of copies of this default Blosc setting, some as compressor arguments, some as codec configuration. Now there is a single definition, and the others are constructed from it where needed.

Steps to test:

  • Export a volume annotation as zarr3; re-uploading should show the data.
  • Download on master, re-upload on this branch; the data should still show.
  • Download a proofreading annotation; it should still look the same (compare zarr.json).

Issues:

  • Fixes #8842

  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

@fm3 fm3 self-assigned this Sep 9, 2025
coderabbitai bot (Contributor) commented Sep 9, 2025

📝 Walkthrough

Introduces BloscCodec.fromConfiguration and BloscCodecConfiguration.defaultForWKZarrOutput, reduces visibility of a helper, updates Zarr3Array and tracingstore components to obtain compressors and Zarr headers from the centralized Blosc codec configuration, plus a minor syntactic cleanup in CodecTreeExplorer.findOne.
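To make the shape of the consolidation concrete, here is a minimal sketch in Scala. The names BloscCodec, BloscCodecConfiguration, fromConfiguration, and defaultForWKZarrOutput come from this PR; the configuration fields follow the zarr3 blosc codec spec, and every default value except typesize = 1 (discussed in the review below) is a placeholder assumption, not the actual webknossos default.

// Hedged sketch of the centralized Blosc configuration and factory.
// Field set per the zarr3 blosc spec; values other than typesize are assumptions.
case class BloscCodecConfiguration(cname: String,
                                   clevel: Int,
                                   shuffle: String,
                                   typesize: Option[Int],
                                   blocksize: Int)

object BloscCodecConfiguration {
  // The single source of truth for Blosc settings in WK zarr3 output.
  lazy val defaultForWKZarrOutput: BloscCodecConfiguration =
    BloscCodecConfiguration(cname = "zstd", clevel = 5, shuffle = "shuffle", typesize = Some(1), blocksize = 0)
}

class BloscCodec(val configuration: BloscCodecConfiguration) {
  // In the real code this also exposes a `compressor` built from the same fields.
}

object BloscCodec {
  // Every call site now obtains the codec (and its compressor) from one configuration.
  def fromConfiguration(configuration: BloscCodecConfiguration): BloscCodec =
    new BloscCodec(configuration)
}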

Changes

  • Blosc codec factory & config
    webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala
    Adds BloscCodec.fromConfiguration, adds BloscCodecConfiguration.defaultForWKZarrOutput, makes shuffleSettingFromInt private, and removes an extra brace in CodecTreeExplorer.findOne.
  • Zarr3 array initialization
    webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3Array.scala
    Switches codec construction to BloscCodec.fromConfiguration(...), using typed pattern matching for BloscCodecConfiguration; no public API changes.
  • Tracingstore: compressor wiring & headers (a sketch of this wiring follows the list)
    webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala, webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala
    Replaces manual BloscCompressor and per-site setting maps with BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput).compressor; updates Zarr headers to include BloscCodecConfiguration.defaultForWKZarrOutput; removes legacy BloscCompressor-related imports and vals.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • MichaelBuessemeyer

Poem

I nibble on bytes with whiskers bright,
Swapped maps for codecs under moonlight 🥕,
One factory hop, defaults in place,
Headers aligned, compressors embrace.
Thump-thump — code runs, a joyful race.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Title Check (✅ Passed): The PR title "Refactor default blosc settings, reducing code duplication" succinctly and accurately captures the main change: centralizing default Blosc settings and updating callers to derive compressors/codecs from that single definition, as seen in the added BloscCodec.fromConfiguration and BloscCodecConfiguration.defaultForWKZarrOutput and the updates across multiple files. It is concise, relevant, and understandable for reviewers scanning the history.
  • Linked Issues Check (✅ Passed): The linked issue (#8842) asked to keep compressor and compressorConfiguration synchronized, allow deriving one from the other, and use the same representation in both services; the PR adds BloscCodecConfiguration.defaultForWKZarrOutput and BloscCodec.fromConfiguration and updates Zarr3BucketStreamSink, EditableMappingIOService, Zarr3Array, and VolumeDataZipHelper to use them, which implements those coding objectives. The summaries show consistent adoption of the new configuration-driven approach across the relevant modules, satisfying the issue's requirements.
  • Out of Scope Changes Check (✅ Passed): The changes are focused on consolidating Blosc configuration and switching consumers to the new codec factory; there are only minor internal adjustments (making shuffleSettingFromInt private and a small syntactic cleanup) and no unrelated feature work or spurious refactors evident in the summaries.
  • Description Check (✅ Passed): The PR description directly describes the consolidation of default Blosc settings, lists concrete test steps, and references issue #8842, which aligns with the code changes that introduce a single canonical Blosc configuration and update services to use it; this meets the lenient requirement for relevance.
  • Docstring Coverage (✅ Passed): No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5d6af2 and a858e06.

📒 Files selected for processing (1)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests


@fm3 fm3 marked this pull request as ready for review September 9, 2025 09:51
@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (2)

255-263: Default is fine; consider dtype-aware typesize for compression efficiency.

typesize = 1 is safe but suboptimal for wider dtypes such as uint64 (where 8 would be better). Optional: add helpers or override per array when known, keeping the default unchanged.

Example helper:

 object BloscCodecConfiguration {
   …
+  def withTypesize(ts: Int): BloscCodecConfiguration =
+    defaultForWKZarrOutput.copy(typesize = Some(ts))
 }

If you trial this, benchmark on uint64 vs bool chunks to validate ratio and CPU impact.
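For illustration, the proposed helper (hypothetical, not existing API) would be used like:

// Hypothetical call; withTypesize is only the helper sketched above.
val uint64Config = BloscCodecConfiguration.withTypesize(8)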


340-346: Brace cleanup looks good; consider searching index codecs too.

No behavior change here. If findOne is expected to locate configs inside index_codecs of sharding, consider recursing there as well (non-blocking, just a note).
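A hedged sketch of that recursion; ShardingCodecConfiguration and its codecs/indexCodecs fields are assumed names for illustration, not taken from the file.

// Assumed shapes, for illustration only.
trait CodecConfiguration
case class ShardingCodecConfiguration(codecs: Seq[CodecConfiguration],
                                      indexCodecs: Seq[CodecConfiguration])
    extends CodecConfiguration

// findOne extended to also descend into a sharding codec's index_codecs.
def findOne(predicate: CodecConfiguration => Boolean)(
    codecs: Seq[CodecConfiguration]): Option[CodecConfiguration] =
  codecs.view.flatMap {
    case c if predicate(c)             => Some(c)
    case s: ShardingCodecConfiguration => findOne(predicate)(s.codecs ++ s.indexCodecs)
    case _                             => None
  }.headOption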

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (1)

115-115: Compressor derived from the same configuration — consistent.

This keeps the written chunks in sync with the header. As a future-proofing optional tweak, consider deriving the compressor from the just-built header instead of referencing the default constant directly (guards against accidental drift if defaults change).
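A sketch of that optional tweak within Zarr3BucketStreamSink, assuming the just-built header exposes its codec configurations in a codecs field (field name assumed):

// Derive the compressor from the header itself; falls back to the default
// if the header happens to carry no Blosc entry.
private lazy val compressor =
  header.codecs
    .collectFirst { case blosc: BloscCodecConfiguration => BloscCodec.fromConfiguration(blosc).compressor }
    .getOrElse(BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput).compressor)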

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00293e2 and b5d6af2.

📒 Files selected for processing (5)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3Array.scala (1 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (3 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala
🧬 Code graph analysis (5)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (5)
  • BloscCodec (103-131)
  • BloscCodec (133-140)
  • BloscCodecConfiguration (235-242)
  • BloscCodecConfiguration (244-263)
  • fromConfiguration (134-140)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3Array.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (3)
  • BloscCodec (103-131)
  • BloscCodec (133-140)
  • fromConfiguration (134-140)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (5)
  • BloscCodecConfiguration (235-242)
  • BloscCodecConfiguration (244-263)
  • BloscCodec (103-131)
  • BloscCodec (133-140)
  • fromConfiguration (134-140)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (3)
  • BloscCompressor (185-197)
  • BloscCompressor (199-291)
  • StringCompressionSetting (23-23)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala (2)
  • AxisOrder (8-23)
  • AxisOrder (25-46)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (5)
  • BloscCodecConfiguration (235-242)
  • BloscCodecConfiguration (244-263)
  • BloscCodec (103-131)
  • BloscCodec (133-140)
  • fromConfiguration (134-140)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (10)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (2)

133-140: Factory method is a solid consolidation — LGTM.

Centralizing Blosc construction via fromConfiguration removes duplication and reduces drift risk.


248-254: Throw IllegalArgumentException for invalid shuffle codes
Replace the wildcard case in shuffleSettingFromInt with:

-    case _ => ???
+    case _ => throw new IllegalArgumentException(s"Invalid blosc shuffle code: $shuffle")

No external callers remain—the method is private and only used within Codecs.scala.
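In context, the private helper would then read roughly as follows; the 0/1/2 mapping follows the usual blosc shuffle convention, and the String return type is an assumption, not quoted from Codecs.scala.

private def shuffleSettingFromInt(shuffle: Int): String = shuffle match {
  case 0 => "noshuffle"  // assumed mapping, per blosc convention
  case 1 => "shuffle"
  case 2 => "bitshuffle"
  case _ => throw new IllegalArgumentException(s"Invalid blosc shuffle code: $shuffle")
}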

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3Array.scala (1)

72-78: Switch to BloscCodec.fromConfiguration — LGTM.

Cleaner pattern match with a typed BloscCodecConfiguration and centralized construction reduces duplication.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (3)

47-50: Using centralized WK Zarr default in codecs — LGTM.

This aligns headers with the canonical config and avoids drift.


68-71: Consistent codecs for edgeIsAddition — LGTM.

Matches the edges array approach; consistent config across arrays.


112-113: Centralized compressor wiring — verify thread-safety under concurrency.

private lazy val compressor is likely shared; confirm BloscCompressor is thread-safe in your usage. If not, instantiate per chunk or guard with a pool.
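If BloscCompressor did turn out not to be thread-safe, one low-effort mitigation is a per-use instance instead of the shared lazy val (a sketch, not a verified necessity):

// def instead of lazy val: each caller gets its own compressor instance.
private def newCompressor =
  BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput).compressor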

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (3)

10-10: Import cleanup is correct.

Switching to the direct AxisOrder import matches its location and clarifies usage.


17-17: ProtoGeometryImplicits import move is fine.

No behavior change; keeps the extends clause intact.


38-41: Single source of truth for Blosc config — good change.

Using additionalCodecs = Seq(BloscCodecConfiguration.defaultForWKZarrOutput) de-duplicates settings and aligns header with the compressor.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)

8-8: Import updates look good.

Switch to codec-based API is consistent with the refactor.

Comment on lines +168 to +169
private lazy val compressor = BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput).compressor

coderabbitai bot (Contributor) commented:

💡 Verification agent

🧩 Analysis chain

Decompression should honor the header’s codec, not a hard-coded default.

Reading Zarr3 zips with non-default Blosc settings will fail or mis-decode if we always use the default compressor. Derive the codec from firstHeader (fallback to default if absent).

Apply this diff within withBucketsFromZarr3Zip and remove the class-level compressor:

@@
       firstHeader <- JsonHelper.parseAs[Zarr3ArrayHeader](firstHeaderString).toFox
       _ <- firstHeader.assertValid.toFox
+      // Prefer the codec specified in the header; fall back to WK default
+      bloscCodec =
+        firstHeader.additionalCodecs
+          .getOrElse(Seq.empty)
+          .collectFirst { case b: BloscCodecConfiguration => BloscCodec.fromConfiguration(b) }
+          .getOrElse(BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput))
       _ <- ZipIO.withUnzipedAsync(zipFile) {
         case (filename, inputStream) =>
           if (filename.endsWith(Zarr3ArrayHeader.FILENAME_ZARR_JSON)) Fox.successful(())
           else {
             parseZarrChunkPath(filename.toString, firstHeader).map { bucketPosition =>
               val dataCompressed = IOUtils.toByteArray(inputStream)
-              val data = compressor.decompress(dataCompressed)
+              val data = bloscCodec.decode(dataCompressed)
               block(bucketPosition, data)
             }.getOrElse(Fox.successful(()))
           }
       }
@@
-  private lazy val compressor = BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput).compressor
+  // Removed: decoder is derived from the header inside withBucketsFromZarr3Zip

If headers can include multiple codecs, consider applying the full codec chain (decode in reverse order). I can draft that if desired.
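A minimal sketch of such a full-chain decode, assuming each bytes-to-bytes codec exposes a decode method (trait and method names invented for illustration):

// Zarr3 encodes through the codec chain in order, so decoding walks it in reverse.
trait BytesToBytesCodec { def decode(bytes: Array[Byte]): Array[Byte] }

def decodeChain(codecs: Seq[BytesToBytesCodec], compressed: Array[Byte]): Array[Byte] =
  codecs.reverse.foldLeft(compressed)((bytes, codec) => codec.decode(bytes))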


🏁 Script executed:

#!/bin/bash
set -e
# Locate the helper file
file=$(fd --type f 'VolumeDataZipHelper.scala')
echo "File: $file"
# Show the class-level compressor definition
grep -n 'private lazy val compressor' "$file"
echo
# Find and display the withBucketsFromZarr3Zip method context
grep -n 'def withBucketsFromZarr3Zip' "$file"
line=$(grep -n 'def withBucketsFromZarr3Zip' "$file" | cut -d: -f1)
sed -n "$((line-5)),$((line+40))p" "$file"

Length of output: 2565


🏁 Script executed:

#!/bin/bash
set -e
# Locate Zarr3ArrayHeader definition
grep -R "case class Zarr3ArrayHeader" -n webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume
# Display the definition around that line
location=$(grep -R "case class Zarr3ArrayHeader" -n webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume | cut -d: -f1)
sed -n "$((location-3)),$((location+20))p" webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/$(basename $(grep -R "case class Zarr3ArrayHeader" -n webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume | cut -d: -f1))

Length of output: 0


🏁 Script executed:

#!/bin/bash
set -e
# Search for the Zarr3ArrayHeader definition
rg -n --type scala 'case class Zarr3ArrayHeader' -C3 webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume

Length of output: 0


🏁 Script executed:

#!/bin/bash
set -e
# Find occurrences of Zarr3ArrayHeader
rg -n "Zarr3ArrayHeader" -C3 .

# Locate the file that defines the case class
header_file=$(rg -l "case class Zarr3ArrayHeader" -g '*.scala')
echo "Header definition file: $header_file"

# Show the definition of Zarr3ArrayHeader
sed -n '1,200p' "$header_file"

Length of output: 61527


Honor Zarr3 header’s Blosc configuration in decompression
In withBucketsFromZarr3Zip (VolumeDataZipHelper.scala 168–169), drop the top‐level compressor, derive a local BloscCodec from firstHeader.codecs.collectFirst { case b: BloscCodecConfiguration => BloscCodec.fromConfiguration(b) } (fall back to defaultForWKZarrOutput), and invoke its decode (instead of compressor.decompress) on each chunk.

🤖 Prompt for AI Agents
In
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala
around lines 168–169, stop using the module-level lazy val compressor and
instead derive a local BloscCodec from the Zarr3 header: use
firstHeader.codecs.collectFirst { case b: BloscCodecConfiguration =>
BloscCodec.fromConfiguration(b) } with a fallback to
BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput);
then replace calls to compressor.decompress on each chunk with the
localCodec.decode (or equivalent decode method) so the decompression honors the
header’s Blosc configuration.

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

Looks good and is working. Thanks for refactoring this.

All 🟢

@fm3 fm3 enabled auto-merge (squash) September 23, 2025 07:53
@fm3 fm3 merged commit e1772b2 into master Sep 23, 2025
5 checks passed
@fm3 fm3 deleted the refactor-default-blosc branch September 23, 2025 08:01
Development

Successfully merging this pull request may close these issues:

  • Refactor compressorConfiguration vs compressor specifications (#8842)