explicitly control whether top-level array / group access routines can create #3364

@d-v-b

I just got bitten by the following pattern:

# create zarr data at easy_typos/barr.zarr

import zarr

# attempt to open data at the wrong path because of a typo
zg = zarr.open_group("easy_typo/barr.zarr")

# spend a while trying to figure out why this group is empty, only to discover the typo

# open the correct group
zg = zarr.open_group("easy_typos/barr.zarr")

# clean up the extra group at "easy_typo/barr.zarr" that was created when I actually wanted to read
# "easy_typos/barr.zarr"

We should have a path for when you know data already exists and you are trying to access it. In this case, if the array or group you are trying to open does not exist, then zarr should raise an error instead of silently creating it.

I see 2 concrete options that could work together:

  1. a flag for whether a function like open_group can create new arrays or groups, something like can_create: bool. This needs to be sensibly composed with the access modes.
  2. a function only for reading that will never create arrays or groups, e.g. read_group, read_array, and a polymorphic read, which can return an array or a group, depending on what's there. I had a PR that added these way back when, but those particular functions were booted from my PR. This doesn't look like the right decision in hindsight. We should bring these functions back.
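
To make the desired behavior concrete, here is a rough sketch of a guard that refuses to call an opener when nothing exists at the path. It is not part of zarr: the name open_existing and its opener parameter are hypothetical, and it assumes a store on the local filesystem, so it says nothing about remote or in-memory stores.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def open_existing(path: str, opener: Callable[[str], T]) -> T:
    """Hypothetical helper: call `opener(path)` only if `path` already exists.

    Assumes `path` refers to the local filesystem. Raises FileNotFoundError
    instead of letting the opener create anything at a mistyped location.
    """
    if not Path(path).exists():
        raise FileNotFoundError(
            f"nothing to open at {path!r}; refusing to create it"
        )
    return opener(path)


# Intended use (requires zarr, so left as a comment):
#   import zarr
#   zg = open_existing("easy_typo/barr.zarr", zarr.open_group)
#   # raises FileNotFoundError because of the typo, and creates nothing
```

A built-in read_group or can_create=False would do this check inside zarr, where it could work for any store type rather than only local paths.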
