Format-versioned Snapshots in light of V3 additions #1973

Open
smaheshwar-pltr opened this issue May 6, 2025 · 4 comments

@smaheshwar-pltr

smaheshwar-pltr commented May 6, 2025

Feature Request / Improvement

While thinking about #1971 and #1972, I realised that V3 introduces new fields to Snapshot - one required for V3 and the other not.

As it stands, it feels inelegant to add the V3-required field as an optional field on the Snapshot class and then, for example, check during TableMetadata construction that it is present when the table is V3 (or skip that check entirely). I think it might be nicer to encode that requirement in the typing (model), similar to the TableMetadataV3 excerpt below.

row_lineage: bool = Field(alias="row-lineage", default=False)
"""Indicates that row-lineage is enabled on the table
For more information:
https://iceberg.apache.org/spec/?column-projection#row-lineage
"""
next_row_id: Optional[int] = Field(alias="next-row-id", default=None)
"""A long higher than all assigned row IDs; the next snapshot's `first-row-id`."""

I'm therefore wondering about "versioning" Snapshot similar to TableMetadata, so that V3 TableMetadata would contain a list of V3 Snapshots. Then, if V3 snapshot fields are present in V2 metadata, deserialization would throw, which I think is a nice property of PyIceberg's TableMetadata Union setup compared to other implementations.
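A minimal sketch of the versioning idea, using plain dataclasses rather than PyIceberg's actual pydantic models. `SnapshotV2`, `SnapshotV3`, and `parse_snapshot` are hypothetical names, and `first-row-id` is assumed to be the V3-required field:

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class SnapshotV2:
    """Hypothetical V1/V2 snapshot model: no row-lineage fields."""
    snapshot_id: int


@dataclass(frozen=True)
class SnapshotV3:
    """Hypothetical V3 snapshot model: first_row_id is required (assumption)."""
    snapshot_id: int
    first_row_id: int


def parse_snapshot(raw: Dict[str, Any], format_version: int) -> Union[SnapshotV2, SnapshotV3]:
    """Pick the snapshot model from the table's format version and reject mismatches."""
    if format_version >= 3:
        if "first-row-id" not in raw:
            raise ValueError("V3 snapshot is missing required field 'first-row-id'")
        return SnapshotV3(snapshot_id=raw["snapshot-id"], first_row_id=raw["first-row-id"])
    if "first-row-id" in raw:
        # The "benefit of throwing": a V3-only field in V1/V2 metadata is an error.
        raise ValueError("'first-row-id' is not allowed in V1/V2 snapshots")
    return SnapshotV2(snapshot_id=raw["snapshot-id"])
```

With this shape, the type of each snapshot carries the guarantee, so callers holding a `SnapshotV3` would not need to re-check that `first_row_id` is set.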

(I've not fleshed out the details here so not certain this is feasible but dropping an issue for now. Perhaps this has already been discussed / thought about 😄)

@Fokko

Fokko commented May 6, 2025

Hey @smaheshwar-pltr Thanks for bringing this up.

> I'm therefore wondering about "versioning" Snapshot similar to TableMetadata, so that V3 TableMetadata would contain a list of V3 Snapshots.

The problem is that when a table is upgraded from {V1,V2} to V3, its existing snapshots do not have the field, so we would still run into deserialization issues. For simplicity, I'm leaning towards not versioning, because we would still need to check whether the fields are null; they stay null after bumping the version to V3: https://iceberg.apache.org/spec/#row-lineage-for-upgraded-tables
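The non-versioned alternative can be sketched as follows, again with plain dataclasses instead of PyIceberg's pydantic models. The `Snapshot` class here and the helper `snapshots_without_row_lineage` are hypothetical, and `first-row-id` is assumed to be the field in question:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Single snapshot model shared by all format versions (illustrative only)."""
    snapshot_id: int
    # Optional even on V3 tables: snapshots written before the upgrade never get a value.
    first_row_id: Optional[int] = None


def snapshots_without_row_lineage(snapshots: List[Snapshot]) -> List[int]:
    """Return the IDs of snapshots whose first_row_id is still null."""
    return [s.snapshot_id for s in snapshots if s.first_row_id is None]


# A table upgraded to V3: snapshot 1 predates the upgrade, snapshot 2 was written after it.
upgraded = [Snapshot(snapshot_id=1), Snapshot(snapshot_id=2, first_row_id=0)]
```

Because snapshot 1 legitimately has no `first_row_id` on a V3 table, a strictly typed V3 snapshot model would reject this metadata, which is why the null check is needed either way.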

@smaheshwar-pltr

Ooh thanks a lot for pointing that out @Fokko, I think the upgrade procedure would indeed make versioning complicated. Siding with you now.

@Fokko

Fokko commented May 8, 2025

@smaheshwar-pltr Are you interested in adding those fields?

@smaheshwar-pltr

> @smaheshwar-pltr Are you interested in adding those fields?

Happy for someone else to take a stab!
