Skip to content

Adding attrs arg to create_group and create_dataset #121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jakirkham opened this issue Feb 22, 2017 · 11 comments
Closed

Adding attrs arg to create_group and create_dataset #121

jakirkham opened this issue Feb 22, 2017 · 11 comments
Assignees
Labels
enhancement New features or improvements release notes done Automatically applied to PRs which have release notes.
Milestone

Comments

@jakirkham
Copy link
Member

As it stands now, when one writes a group or dataset, one must write the attributes as a separate operation. This results in creation of .zattrs once. Then if one update the attributes, it results in writing .zattrs again. With something like a ZipStore backed Zarr Group, one gets a warning on writing .zattrs the second time, which AFAICT is unavoidable unless someone sets the attributes correctly the first time. So if we include an argument for attrs to create_group and create_dataset, we should be able to avoid this warning by setting the attributes during creation of these Zarr objects.

@alimanfoo
Copy link
Member

alimanfoo commented Feb 22, 2017 via email

@alimanfoo alimanfoo mentioned this issue Apr 6, 2017
@jakirkham
Copy link
Member Author

Would also add open_group to this list. Though I guess this function raises a question about how to handle cases where the group already exists and what to do if these attrs match or don't.

@alimanfoo
Copy link
Member

alimanfoo commented Oct 18, 2017 via email

@jakirkham
Copy link
Member Author

If one makes any changes to the attributes, it will be added as a new entry in the Zip file, which is exactly what I would like to avoid.

@alimanfoo
Copy link
Member

alimanfoo commented Oct 19, 2017 via email

@jakirkham
Copy link
Member Author

Maybe that would be ok if we check that they don't differ first. Wouldn't want to lose information here. Alternatively we could just start adding copy_from_* style-methods, which would be generally useful and maybe less kludgy than my original suggestion.

In any event, had to switch to using the zipfile module directly to get tighter control on memory consumption and squeeze out every last bit of efficiency possible. As such this is not a pressing feature for me personally (though may be nice for others). Am happy to let it sit or close as deemed appropriate.

@alimanfoo alimanfoo added this to the v2.2 milestone Nov 19, 2017
@jakirkham
Copy link
Member Author

Honestly I'd be ok closing this one. While attrs certainly could be an optional argument to functions creating datasets and groups, it diverges from h5py a bit. I think the problem that I had can be solved better ways. Not to mention, the copy suggestion I made above ends up being the same as issue ( https://github.com/alimanfoo/zarr/issues/113 ). Having copy would be of significantly more utility than optionally adding attrs. Thoughts?

@alimanfoo alimanfoo added the enhancement New features or improvements label Nov 21, 2017
@alimanfoo
Copy link
Member

Thinking about this some more, I actually think the current behaviour is a bit pointless: when an array or group is created, an empty dict is written to the .zattrs key, but there is no need to write .zattrs at all unless the user actually wants to store some attributes. I'm tempted to change this so that .zattrs is only written when the user actually wants to store some attributes. This would obviate the need for an 'attrs' argument to array or group creation, because, e.g.:

g = zarr.group()
g.attrs.update(foo='bar', baz='qux')

...would then imply only a single write to .zattrs.

@jakirkham
Copy link
Member Author

Yeah I like that idea better as well. Does it mean a change in the spec though? Not sure if .zattrs was guaranteed to always be there or not.

@alimanfoo
Copy link
Member

alimanfoo commented Nov 24, 2017 via email

@alimanfoo alimanfoo self-assigned this Nov 24, 2017
@alimanfoo alimanfoo added the in progress Someone is currently working on this label Nov 24, 2017
@alimanfoo
Copy link
Member

I've gone ahead and implemented this in #200, everything works fine and should be fully backwards compatible with code and data, so seems like a good thing to me.

@alimanfoo alimanfoo added release notes done Automatically applied to PRs which have release notes. and removed in progress Someone is currently working on this labels Nov 24, 2017
@d-v-b d-v-b mentioned this issue Feb 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New features or improvements release notes done Automatically applied to PRs which have release notes.
Projects
None yet
Development

No branches or pull requests

2 participants