Refactor default blosc settings, reducing code duplication #8904
Conversation
📝 Walkthrough

Introduces BloscCodec.fromConfiguration and BloscCodecConfiguration.defaultForWKZarrOutput, reduces the visibility of a helper, updates Zarr3Array and tracingstore components to obtain compressors and Zarr headers from the centralized Blosc codec configuration, plus a minor syntactic cleanup in CodecTreeExplorer.findOne.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (5 passed)
Actionable comments posted: 1
🧹 Nitpick comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (2)
255-263: Default is fine; consider dtype-aware typesize for compression efficiency.

`typesize = 1` is safe but suboptimal for e.g. uint64 (better: 8). Optional: add helpers or override per array when known, keeping the default unchanged. Example helper:

```diff
 object BloscCodecConfiguration {
   …
+  def withTypesize(ts: Int): BloscCodecConfiguration =
+    defaultForWKZarrOutput.copy(typesize = Some(ts))
 }
```

If you trial this, benchmark on uint64 vs bool chunks to validate ratio and CPU impact.
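For illustration, such a helper could be fed from a dtype lookup. A minimal sketch, assuming plain string dtype names (the names are illustrative, not taken from the codebase):

```scala
// Hypothetical dtype-to-typesize mapping; dtype strings are illustrative.
def typesizeForDtype(dtype: String): Int = dtype match {
  case "uint64" | "int64" | "float64" => 8
  case "uint32" | "int32" | "float32" => 4
  case "uint16" | "int16"             => 2
  case _                              => 1 // uint8/bool/unknown: today's default
}

// Combined with the proposed helper above:
// BloscCodecConfiguration.withTypesize(typesizeForDtype("uint64"))
```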
340-346: Brace cleanup looks good; consider searching index codecs too.

No behavior change here. If `findOne` is expected to locate configs inside `index_codecs` of sharding, consider recursing there as well (non-blocking, just a note; see the sketch below).
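A minimal sketch of such a recursion, assuming a `ShardingCodecConfiguration` subtype with an `index_codecs` field (both names are assumptions about this codebase, not verified signatures):

```scala
// Hypothetical recursive lookup in a findOne style: check the flat sequence
// first, then descend into the index codecs of any sharding configuration.
def findOne(codecs: Seq[CodecConfiguration])(p: CodecConfiguration => Boolean): Option[CodecConfiguration] =
  codecs
    .collectFirst { case c if p(c) => c }
    .orElse(
      codecs
        .collectFirst { case s: ShardingCodecConfiguration => findOne(s.index_codecs)(p) }
        .flatten)
```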
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (1)
115-115: Compressor derived from the same configuration — consistent.

This keeps the written chunks in sync with the header. As a future-proofing optional tweak, consider deriving the compressor from the just-built header instead of referencing the default constant directly (guards against accidental drift if defaults change).
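A minimal sketch of that tweak, assuming the header exposes its codec configurations via a `codecs` field (as referenced later in this review for `firstHeader.codecs`):

```scala
// Derive the compressor from the header that was just built, so the header
// and the chunk encoding cannot drift apart. Field names are assumptions.
private lazy val compressor = {
  val bloscConfig = header.codecs
    .collectFirst { case b: BloscCodecConfiguration => b }
    .getOrElse(BloscCodecConfiguration.defaultForWKZarrOutput)
  BloscCodec.fromConfiguration(bloscConfig).compressor
}
```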
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (5)

- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3Array.scala (1 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (3 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
Applied to files:
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala
🧬 Code graph analysis (5)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (5)
- BloscCodec (103-131)
- BloscCodec (133-140)
- BloscCodecConfiguration (235-242)
- BloscCodecConfiguration (244-263)
- fromConfiguration (134-140)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3Array.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (3)
- BloscCodec (103-131)
- BloscCodec (133-140)
- fromConfiguration (134-140)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (5)
- BloscCodecConfiguration (235-242)
- BloscCodecConfiguration (244-263)
- BloscCodec (103-131)
- BloscCodec (133-140)
- fromConfiguration (134-140)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/Compressor.scala (3)
- BloscCompressor (185-197)
- BloscCompressor (199-291)
- StringCompressionSetting (23-23)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala (2)
- AxisOrder (8-23)
- AxisOrder (25-46)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (5)

- BloscCodecConfiguration (235-242)
- BloscCodecConfiguration (244-263)
- BloscCodec (103-131)
- BloscCodec (133-140)
- fromConfiguration (134-140)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (10)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Codecs.scala (2)
133-140: Factory method is a solid consolidation — LGTM.

Centralizing Blosc construction via `fromConfiguration` removes duplication and reduces drift risk.
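For reference, a minimal usage sketch of the consolidated path (all names are taken from the diff under review; nothing beyond them is assumed):

```scala
// One configuration drives both the header contents and the chunk compressor.
val codec = BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput)
val compressor = codec.compressor
```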
248-254: Throw IllegalArgumentException for invalid shuffle codes.

Replace the wildcard case in shuffleSettingFromInt with:

```diff
- case _ => ???
+ case _ => throw new IllegalArgumentException(s"Invalid blosc shuffle code: $shuffle")
```

No external callers remain — the method is private and only used within Codecs.scala.
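For context, the full helper might then read as follows. A hypothetical rendering only: the 0/1/2 codes and shuffle names follow the standard zarr v3 blosc convention, but the method's actual cases and return type are not shown in this review:

```scala
// Hypothetical shape of the private helper after the change; the string
// setting names are an assumption based on the zarr v3 blosc spec.
private def shuffleSettingFromInt(shuffle: Int): String = shuffle match {
  case 0 => "noshuffle"
  case 1 => "shuffle"
  case 2 => "bitshuffle"
  case _ => throw new IllegalArgumentException(s"Invalid blosc shuffle code: $shuffle")
}
```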
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3Array.scala (1)

72-78: Switch to `BloscCodec.fromConfiguration` — LGTM.

Cleaner pattern match with a typed `BloscCodecConfiguration` and centralized construction reduces duplication.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingIOService.scala (3)
47-50: Using centralized WK Zarr default in `codecs` — LGTM.

This aligns headers with the canonical config and avoids drift.

68-71: Consistent codecs for `edgeIsAddition` — LGTM.

Matches the edges array approach; consistent config across arrays.
112-113: Centralized compressor wiring — verify thread-safety under concurrency.

`private lazy val compressor` is likely shared; confirm `BloscCompressor` is thread-safe in your usage. If not, instantiate per chunk or guard with a pool (see the sketch below).
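A minimal sketch of the per-use alternative, reusing only names that appear in this review (the `decompress` signature is taken from the diff further down):

```scala
// If BloscCompressor turns out not to be thread-safe, build one per use
// instead of sharing a lazy val. Construction mirrors the line under review.
private def newCompressor =
  BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput).compressor

def decodeChunk(compressed: Array[Byte]): Array[Byte] =
  newCompressor.decompress(compressed) // fresh instance per call, no shared state
```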
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/Zarr3BucketStreamSink.scala (3)
10-10: Import cleanup is correct.

Switching to the direct AxisOrder import matches its location and clarifies usage.

17-17: ProtoGeometryImplicits import move is fine.

No behavior change; keeps the extends clause intact.

38-41: Single source of truth for Blosc config — good change.

Using `additionalCodecs = Seq(BloscCodecConfiguration.defaultForWKZarrOutput)` de-duplicates settings and aligns the header with the compressor.
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala (1)
8-8: Import updates look good.

Switch to codec-based API is consistent with the refactor.
```scala
private lazy val compressor = BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput).compressor
```
💡 Verification agent
🧩 Analysis chain
Decompression should honor the header’s codec, not a hard-coded default.
Reading Zarr3 zips with non-default Blosc settings will fail or mis-decode if we always use the default compressor. Derive the codec from firstHeader (fallback to default if absent).
Apply this diff within withBucketsFromZarr3Zip and remove the class-level compressor (note: inside the for-comprehension, the value binding is written without `val`):

```diff
@@
       firstHeader <- JsonHelper.parseAs[Zarr3ArrayHeader](firstHeaderString).toFox
       _ <- firstHeader.assertValid.toFox
+      // Prefer the codec specified in the header; fall back to WK default
+      bloscCodec = firstHeader.additionalCodecs
+        .getOrElse(Seq.empty)
+        .collectFirst { case b: BloscCodecConfiguration => BloscCodec.fromConfiguration(b) }
+        .getOrElse(BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput))
       _ <- ZipIO.withUnzipedAsync(zipFile) {
         case (filename, inputStream) =>
           if (filename.endsWith(Zarr3ArrayHeader.FILENAME_ZARR_JSON)) Fox.successful(())
           else {
             parseZarrChunkPath(filename.toString, firstHeader).map { bucketPosition =>
               val dataCompressed = IOUtils.toByteArray(inputStream)
-              val data = compressor.decompress(dataCompressed)
+              val data = bloscCodec.decode(dataCompressed)
               block(bucketPosition, data)
             }.getOrElse(Fox.successful(()))
           }
       }
@@
-  private lazy val compressor = BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput).compressor
+  // Removed: decoder is derived from the header inside withBucketsFromZarr3Zip
```
If headers can include multiple codecs, consider applying the full codec chain (decode in reverse order). I can draft that if desired.
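A minimal sketch of that chain decoding, assuming each bytes-to-bytes codec exposes `decode(Array[Byte]): Array[Byte]` as `BloscCodec` does in the diff above:

```scala
// Codecs encode in declaration order, so decoding walks the chain in reverse.
def decodeThroughChain(decoders: Seq[Array[Byte] => Array[Byte]],
                       encoded: Array[Byte]): Array[Byte] =
  decoders.reverse.foldLeft(encoded)((bytes, decode) => decode(bytes))
```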
🏁 Script executed:

```bash
#!/bin/bash
set -e
# Locate the helper file
file=$(fd --type f 'VolumeDataZipHelper.scala')
echo "File: $file"
# Show the class-level compressor definition
grep -n 'private lazy val compressor' "$file"
echo
# Find and display the withBucketsFromZarr3Zip method context
grep -n 'def withBucketsFromZarr3Zip' "$file"
line=$(grep -n 'def withBucketsFromZarr3Zip' "$file" | cut -d: -f1)
sed -n "$((line-5)),$((line+40))p" "$file"
```

Length of output: 2565
🏁 Script executed:

```bash
#!/bin/bash
set -e
# Locate Zarr3ArrayHeader definition
grep -R "case class Zarr3ArrayHeader" -n webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume
# Display the definition around that line
location=$(grep -R "case class Zarr3ArrayHeader" -n webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume | cut -d: -f1)
sed -n "$((location-3)),$((location+20))p" webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/$(basename $(grep -R "case class Zarr3ArrayHeader" -n webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume | cut -d: -f1))
```

Length of output: 0
🏁 Script executed:

```bash
#!/bin/bash
set -e
# Search for the Zarr3ArrayHeader definition
rg -n --type scala 'case class Zarr3ArrayHeader' -C3 webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume
```

Length of output: 0
🏁 Script executed:

```bash
#!/bin/bash
set -e
# Find occurrences of Zarr3ArrayHeader
rg -n "Zarr3ArrayHeader" -C3 .
# Locate the file that defines the case class
header_file=$(rg -l "case class Zarr3ArrayHeader" -g '*.scala')
echo "Header definition file: $header_file"
# Show the definition of Zarr3ArrayHeader
sed -n '1,200p' "$header_file"
```

Length of output: 61527
Honor Zarr3 header's Blosc configuration in decompression

In withBucketsFromZarr3Zip (VolumeDataZipHelper.scala 168–169), drop the top-level `compressor`, derive a local `BloscCodec` from `firstHeader.codecs.collectFirst { case b: BloscCodecConfiguration => BloscCodec.fromConfiguration(b) }` (fall back to `defaultForWKZarrOutput`), and invoke its `decode` (instead of `compressor.decompress`) on each chunk.
🤖 Prompt for AI Agents

```
In webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeDataZipHelper.scala
around lines 168-169, stop using the module-level lazy val compressor and
instead derive a local BloscCodec from the Zarr3 header: use
firstHeader.codecs.collectFirst { case b: BloscCodecConfiguration =>
BloscCodec.fromConfiguration(b) } with a fallback to
BloscCodec.fromConfiguration(BloscCodecConfiguration.defaultForWKZarrOutput);
then replace calls to compressor.decompress on each chunk with the
localCodec.decode (or equivalent decode method) so the decompression honors the
header's Blosc configuration.
```
Looks good and is working. Thanks for refactoring this.
All 🟢
We had a number of repeats of this default blosc setting, some as compressor arguments, some as codec configuration. Now there is one, and the others are constructed from it where needed.
Steps to test:
Issues: