split up metadata

Currently, the entire metadata tree is stored in a single output object. Because of the 2MB size limit, this will quickly cause trouble for larger datasets. A natural approach would be to split the metadata object into multiple objects along the hierarchy inherent to zarr (e.g. one DAG-CBOR block per variable and per dataset, instead of *only* one object per dataset).
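A minimal sketch of what such a split could look like, not the actual implementation. It assumes the metadata tree is a nested dict whose top-level keys starting with `.` are dataset-level metadata and whose other dict-valued keys are variables. `put_block` and `split_metadata` are hypothetical names, and JSON plus SHA-256 stand in for DAG-CBOR encoding and real CIDs:

```python
import hashlib
import json


def put_block(store, obj):
    """Serialize obj, store it under its hash, and return a link to it.

    Assumption: JSON + SHA-256 are stand-ins for DAG-CBOR and CIDs.
    """
    data = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return {"/": key}


def split_metadata(store, tree):
    """Store every variable as its own block; the root only holds links."""
    root = {}
    for name, value in tree.items():
        if name.startswith(".") or not isinstance(value, dict):
            # Dataset-level metadata (e.g. .zgroup, .zattrs) stays inline.
            root[name] = value
        else:
            # One block per variable, referenced from the root by a link.
            root[name] = put_block(store, value)
    return put_block(store, root)
```

With this layout, the root block grows with the number of variables rather than with the total number of chunks, while each variable block still grows with its own chunk count.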

We might still need to resort to a [HAMT](https://ipld.io/specs/advanced-data-layouts/hamt/) if the dictionaries on a single hierarchical level become too large, but that is probably still quite far away. We will probably also want to introduce further hierarchical levels within the chunk keys of a single zarr variable, as proposed [here](https://github.com/zarr-developers/zarr-python/issues/877#issuecomment-1030012408) and demonstrated [here](https://gist.github.com/d70-t/52bc0ecfa0d8bffec3c0da620b03891f#file-test_renumbering-ipynb). This would reduce the number of items per dictionary while aligning IPLD objects with (to-be-introduced) zarr shards, which in turn may lead to better locality within block requests.
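To illustrate the idea of extra hierarchy within chunk keys, here is a rough sketch. It assumes `.`-separated integer chunk keys and a fixed number of chunks per shard along each dimension. The grouping rule and the names `nested_chunk_key`, `nest_chunks` and `shard_shape` are assumptions made for illustration and are not necessarily the scheme used in the linked proposal:

```python
def nested_chunk_key(index, shard_shape):
    """Split a chunk index into an outer (shard) key and an inner key."""
    outer = ".".join(str(i // s) for i, s in zip(index, shard_shape))
    inner = ".".join(str(i % s) for i, s in zip(index, shard_shape))
    return outer, inner


def nest_chunks(chunks, shard_shape):
    """Regroup a flat {chunk_key: link} dict into one dict per shard."""
    nested = {}
    for key, link in chunks.items():
        index = tuple(int(part) for part in key.split("."))
        outer, inner = nested_chunk_key(index, shard_shape)
        nested.setdefault(outer, {})[inner] = link
    return nested
```

For an 8 x 8 grid of chunks and `shard_shape=(4, 4)`, the flat dictionary of 64 entries becomes 4 dictionaries of 16 entries each, and each inner dictionary could then be stored as its own block.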
