Format-versioned Snapshots in light of V3 additions #1973

Open
@smaheshwar-pltr

Feature Request / Improvement

While thinking about #1971 and #1972, I realised that V3 introduces two new fields to Snapshot: one required for V3 and one optional.

As it stands, it feels inelegant to add the V3-required field as an optional field on the Snapshot class and then, for example, check during TableMetadata construction that it's present when the table is V3 (or skip that check entirely). I think it would be nicer to encode that requirement in the typing (model), similar to the TableMetadataV3 excerpt below.

row_lineage: bool = Field(alias="row-lineage", default=False)
"""Indicates that row-lineage is enabled on the table
For more information:
https://iceberg.apache.org/spec/?column-projection#row-lineage
"""
next_row_id: Optional[int] = Field(alias="next-row-id", default=None)
"""A long higher than all assigned row IDs; the next snapshot's `first-row-id`."""

I'm therefore wondering about "versioning" Snapshot in the same way as TableMetadata, so that V3 TableMetadata would contain a list of V3 Snapshots. Then, if V3 snapshot fields are present in V2 metadata, parsing would throw, which I think is a nice property of PyIceberg's TableMetadata Union setup compared to other implementations.
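To make the idea concrete, here is a rough sketch using stdlib dataclasses instead of PyIceberg's pydantic models, so it is not how the library is written today. `SnapshotV2`, `SnapshotV3`, `parse_snapshots` and `V3_ONLY_KEYS` are hypothetical names, and only `first-row-id` is modelled; the second, non-required V3 field is left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union

# Hypothetical: keys that should only appear on V3 snapshots.
V3_ONLY_KEYS = frozenset({"first-row-id"})


@dataclass(frozen=True)
class SnapshotV2:
    """Hypothetical pre-V3 snapshot: has no row-lineage fields at all."""

    snapshot_id: int
    timestamp_ms: int

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "SnapshotV2":
        # Reject V3-only fields instead of silently ignoring them.
        unexpected = V3_ONLY_KEYS.intersection(raw)
        if unexpected:
            raise ValueError(f"V3-only snapshot fields in pre-V3 metadata: {sorted(unexpected)}")
        return cls(snapshot_id=raw["snapshot-id"], timestamp_ms=raw["timestamp-ms"])


@dataclass(frozen=True)
class SnapshotV3:
    """Hypothetical V3 snapshot: first_row_id is required, not Optional."""

    snapshot_id: int
    timestamp_ms: int
    first_row_id: int

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "SnapshotV3":
        if "first-row-id" not in raw:
            raise ValueError("V3 snapshot is missing required field 'first-row-id'")
        return cls(
            snapshot_id=raw["snapshot-id"],
            timestamp_ms=raw["timestamp-ms"],
            first_row_id=raw["first-row-id"],
        )


def parse_snapshots(
    format_version: int, raw_snapshots: List[Dict[str, Any]]
) -> List[Union[SnapshotV2, SnapshotV3]]:
    """Pick the snapshot class from the table's format version."""
    snapshot_cls = SnapshotV3 if format_version >= 3 else SnapshotV2
    return [snapshot_cls.from_dict(raw) for raw in raw_snapshots]
```

With this shape, `parse_snapshots(2, [...])` raises if a snapshot carries `first-row-id`, and `parse_snapshots(3, [...])` raises if it is missing, so the requirement lives in the model rather than in a separate check.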

(I've not fleshed out the details here so not certain this is feasible but dropping an issue for now. Perhaps this has already been discussed / thought about 😄)
