[V3] v2 -> v3 data migration #1798
Comments
Big +1 on this. I'm working on a conversion tool for large-scale genomics data (100s of TB) which is usually held in file systems (for the moment; it will probably migrate to object stores later on). A CLI tool that does an in-place migration from v2 to v3 would be a big help. I'm hoping to move to v3 early on, before too many datasets are converted into v2 format, so that most users won't ever know about v2. My assumption was that the migration is largely a case of writing a new JSON metadata file per array, and should be possible to do both cheaply and safely? |
Yes, I think this is right. Besides the metadata, which will live in a completely new JSON document ( |
Thanks yes, I've been aiming for v3 forwards compatibility by using "/" as the default dimension separator. Then, iterating over the chunks in the first dimension and renaming to have a "c" prefix should be relatively cheap (I forgot about this difference). Is there some developer documentation with recommendations for forwards/backwards compatibility? |
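The rename described above could be sketched roughly as follows for an array on a local filesystem. This is a minimal sketch, not an existing zarr-python API: the function name `move_chunks_under_c_prefix` is hypothetical, and it assumes a v2 array written with "/" as the dimension separator, so that chunks live under top-level entries named `0`, `1`, and so on. Zero-dimensional arrays and arrays using "." as the separator are not handled.

```python
from pathlib import Path


def move_chunks_under_c_prefix(array_dir: Path) -> list[str]:
    """Move top-level v2 chunk entries (0, 1, ...) under a new c/ directory.

    Hypothetical helper: assumes a local filesystem and a v2 array that
    used "/" as its dimension separator.
    """
    target = array_dir / "c"
    target.mkdir(exist_ok=True)
    moved = []
    # sorted() materializes the listing before anything is renamed.
    for entry in sorted(array_dir.iterdir()):
        # Only purely numeric names are chunk entries; metadata files
        # such as .zarray and .zattrs are left where they are.
        if entry.name.isdigit():
            entry.rename(target / entry.name)
            moved.append(entry.name)
    return moved
```

Only the entries along the first dimension are renamed, so the cost scales with the number of chunks in that dimension rather than with the total number of chunks.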
For most cases, the migration only requires adding |
This is correct. When I wrote up this issue, I forgot about the |
I updated the issue to be more accurate :) |
I agree that a CLI tool that can convert an entire hierarchy would be great! |
Today I learned that there is a v1 to v2 migrator in the zarr-python codebase: Lines 1941 to 1956 in 6105ef2
|
#2596 was mislabeled as closing this. A migration tool would still be great! |
I am interested, too. Without a way to migrate data from v2 to v3, it would be a nightmare to deprecate zarr v2 support. |
This gist is a first-pass effort at a migration script. My team has begun to use it to explore converting our existing zarr datasets, but it is currently limited in scope to zarrs stored on S3. I'd welcome any feedback or discussion about how this could be useful to other devs. |
@eschechter thanks for sharing! So it is enough to adapt the metadata without touching the data itself? |
That's correct @meteoDaniel - the zarr v3 conversion can be done with just the creation of zarr.json metadata files. No need to touch the data. |
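To make the metadata-only idea concrete, here is a rough sketch of deriving a `zarr.json` document from the contents of a `.zarray` file. It is not the gist's code and not a zarr-python API: the function name `zarray_to_zarr_json` and the dtype table are hypothetical, and the sketch only covers uncompressed, unfiltered, C-order arrays with an explicit fill value. It relies on the `v2` chunk key encoding from the Zarr v3 specification, which is what lets existing chunk files stay where they are.

```python
from typing import Optional

# Deliberately incomplete mapping from NumPy type codes to v3 data type
# names (assumption: extend this table for other dtypes).
_V3_DATA_TYPES = {
    "b1": "bool",
    "i1": "int8", "i2": "int16", "i4": "int32", "i8": "int64",
    "u1": "uint8", "u2": "uint16", "u4": "uint32", "u8": "uint64",
    "f4": "float32", "f8": "float64",
}


def zarray_to_zarr_json(zarray: dict, attrs: Optional[dict] = None) -> dict:
    """Build v3 array metadata from parsed v2 .zarray contents (sketch)."""
    if zarray.get("compressor") is not None or zarray.get("filters"):
        raise NotImplementedError("compressors and filters are not handled here")
    if zarray.get("order", "C") != "C":
        raise NotImplementedError("only C-order arrays are handled here")
    if zarray.get("fill_value") is None:
        raise NotImplementedError("v3 requires an explicit fill_value")

    byte_order, type_code = zarray["dtype"][0], zarray["dtype"][1:]
    if type_code not in _V3_DATA_TYPES:
        raise NotImplementedError(f"unsupported dtype {zarray['dtype']!r}")

    # Byte order only matters for multi-byte types; "|" means not applicable.
    bytes_codec = {"name": "bytes"}
    if byte_order in "<>":
        endian = "little" if byte_order == "<" else "big"
        bytes_codec["configuration"] = {"endian": endian}

    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": zarray["shape"],
        "data_type": _V3_DATA_TYPES[type_code],
        "chunk_grid": {
            "name": "regular",
            "configuration": {"chunk_shape": zarray["chunks"]},
        },
        # The "v2" encoding keeps the existing chunk keys valid, so no
        # chunk has to be renamed or rewritten. v2 defaults to ".".
        "chunk_key_encoding": {
            "name": "v2",
            "configuration": {"separator": zarray.get("dimension_separator", ".")},
        },
        "fill_value": zarray["fill_value"],
        "codecs": [bytes_codec],
        "attributes": attrs or {},
    }
```

The returned dictionary would then be serialized with `json.dumps` and written next to the existing `.zarray` file as `zarr.json`. A real tool would also need to translate compressors and filters into v3 codecs, which this sketch deliberately refuses to guess at.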
We should invest in tools to make the v2 -> v3 conversion simple for people who are motivated to convert their data. A few high-level ideas:
- Someone should investigate how complicated in-place conversions would be. On a local filesystem where mv is cheap, this could be attractive.
- V3 is designed to make array conversions easy, requiring only the creation of new metadata.
- zarr-python v2 and v3, and a migration guide. This should have its own page in the docs.