Skip to content

Support getting snapshot at or right before the given timestamp #748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jun 3, 2024

Conversation

chinmay-bhat
Copy link
Contributor

@chinmay-bhat chinmay-bhat commented May 16, 2024

Bring support to retrieve a snapshot before a particular timestamp, which is needed to perform Spark procedure like rollback_to_timestamp.

See comment in issue

@ndrluis
Copy link
Collaborator

ndrluis commented May 16, 2024

Hello @chinmay-bhat,

I noticed that you are implementing the ancestors_of method, and we have another pull request (#533) that is implementing the same behavior in another place as a function with a different output (Iterable[Snapshot] instead of a List[tuple[int, int]]) and signature (expects a Snapshot instead of a Snapshot ID).

I believe that we need to discuss and choose which one we want to have in the codebase.

cc/ @HonahX @Fokko @syun64

@sungwy
Copy link
Collaborator

sungwy commented May 16, 2024

Hi @ndrluis thank you for flagging this! That PR went under my radar, and I'm excited to see a incremental scanning feature being implemented already on PyIceberg.

As for the question on the output type, I'm +1 for using Iterable[Snapshot] because I have a preference for using a class with set attributes than using a tuple.

Im also +1 for introducing the feature in this separate PR, since it's a much simpler feature in itself we can introduce quickly. WDYT?

@chinmay-bhat
Copy link
Contributor Author

Happy to update the output type to Iterable[Snapshot]! Also I really like how concise the ancestors_of function is in the other PR.

Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chinmay-bhat Thanks for working on this! I left some comments. Love to see more snapshot utils being added.

@chinmay-bhat chinmay-bhat changed the title Support getting a snapshot right before the given timestamp Support getting snapshot at or right before the given timestamp May 31, 2024
Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating! I think this is good to go when the lint issue is fixed.

@chinmay-bhat chinmay-bhat force-pushed the latest_snapshot_before_timestamp branch from b6b24d7 to 287b386 Compare June 3, 2024 16:21
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great @chinmay-bhat and thanks for the review @HonahX 🙌

@chinmay-bhat
Copy link
Contributor Author

chinmay-bhat commented Jun 3, 2024

Thank you for the review @Fokko, @HonahX, @syun64 and @ndrluis ! 🚀

@HonahX HonahX merged commit 18448fd into apache:main Jun 3, 2024
7 checks passed
@HonahX
Copy link
Contributor

HonahX commented Jun 3, 2024

Merged! Thanks @chinmay-bhat for the great work! Thanks @Fokko @syun64 @ndrluis for the review and discussions!

@chinmay-bhat chinmay-bhat deleted the latest_snapshot_before_timestamp branch June 3, 2024 16:43
@sungwy
Copy link
Collaborator

sungwy commented Jun 9, 2024

Congrats on your first PR @chinmay-bhat !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants