Support snapshot management operations like creating tags by adding ManageSnapshots API #728
Hi @chinmay-bhat - thank you very much for working on this PR. I think this will be very useful to have in PyIceberg. I left a few suggestions about your proposed implementation in the Transaction class.
In addition, I think it would be a good idea to add an API in the Table class as well that wraps this transaction API, and add some unit tests for both APIs.
@syun64 thank you for the review! Do I still need to create unit tests for |
Yes - even if it's small, I think it would still be good to have a unit test that verifies the behavior of the proposed table and transaction API. There are some tests in https://github.com/apache/iceberg-python/blob/main/tests/table/test_init.py#L592 that should serve as good examples of API unit tests. |
I've added the tests based on your suggestions, please review again whenever possible :) |
lgtm! Thank you for adding the tests @chinmay-bhat 💯
Can anyone with write access review? |
Thanks @chinmay-bhat for taking this and @syun64 for reviewing! Linking my comment to here, I actually prefer to make this an internal method behind other APIs. However, given that
|
Hi @HonahX, I've updated the PR based on your suggestions. I've made While testing, I realised that we (PyIceberg) only support the main branch, i.e. while using Am I missing something here, or do we need to create a PR to get the snapshot_log, current_snapshot and other details for every branch/tag? That said, since this PR holds the |
Thanks for the update!
I'm assuming this is because we always display the main branch snapshot_log.
Yes, you're right. According to the spec, the current-snapshot-id must be the same as the snapshot id of the "main" branch. The snapshot-log should store changes to the current-snapshot of the table. Therefore, we do not need to handle these if we're not updating the "main" branch.
We will need some additional work to make PyIceberg fully support branches and tags, especially writing to other branches.
I've left some comments. Please let me know what you think. Thanks for the contribution!
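The rule discussed in this comment (current-snapshot-id follows the "main" branch, and the snapshot-log only records changes to the current snapshot) can be illustrated with a small sketch. This is a toy model under stated assumptions, not PyIceberg's code: `ToyMetadata` and `set_ref` are hypothetical names, and the metadata is reduced to three fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAIN_BRANCH = "main"


@dataclass
class ToyMetadata:
    """Hypothetical, heavily simplified stand-in for Iceberg table metadata."""

    current_snapshot_id: Optional[int] = None
    refs: Dict[str, int] = field(default_factory=dict)  # ref name -> snapshot id
    snapshot_log: List[int] = field(default_factory=list)


def set_ref(meta: ToyMetadata, ref_name: str, snapshot_id: int) -> None:
    """Point a branch or tag at a snapshot (hypothetical helper)."""
    meta.refs[ref_name] = snapshot_id
    # Only a change to "main" moves the current snapshot and extends the log.
    if ref_name == MAIN_BRANCH and meta.current_snapshot_id != snapshot_id:
        meta.current_snapshot_id = snapshot_id
        meta.snapshot_log.append(snapshot_id)


meta = ToyMetadata()
set_ref(meta, "main", 1)    # updates current_snapshot_id and snapshot_log
set_ref(meta, "tag-v1", 1)  # tag on the same snapshot: log untouched
set_ref(meta, "dev", 2)     # other branch: current snapshot and log untouched
```

In this model, creating a tag or a non-main branch leaves `current_snapshot_id` and `snapshot_log` unchanged, which matches the reasoning above for why those fields need no handling unless "main" is updated.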
Thank you Honah for the review! :) I've updated the tests! |
Sorry for the late review, this looks like a great start @chinmay-bhat 🙌
Hi @chinmay-bhat this looks almost ready to merge - I responded to the thread above to continue the discussion on the TableRequirements. In the meantime, could we add a new section to the mkdocs/api.md file named "Snapshot Management" and port over the examples you already have in your docstrings so we can have it on the official docsite? |
And let's update the name of this PR as well to reflect its current state |
PR title changed: set_ref_snapshot API → ManageSnapshots API
Looking good @chinmay-bhat 🙌 and I agree with @syun64 that we're super close 👍
added sql statements from provision.py rename table
602eef3 to 9f068a6
Removed AssertTableUUID from PR and rebased onto latest main! |
@chinmay-bhat Thanks for working on this and resolving all review comments! LGTM! It is a great start for snapshot management support.
Merged! Thanks @chinmay-bhat for the great work. Thanks @Fokko @syun64 for the review. |
This is awesome! @chinmay-bhat You are on a roll with these PRs right now, thank you for contributing these much-needed features to PyIceberg! Looking forward to releasing these with the upcoming 0.7.0 release. |
Creates the public-facing ManageSnapshots API, which currently includes create_tag and create_branch. More operations to implement can be found in issue #737.
Closes #573
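To illustrate the shape of an API like the one described above, here is a minimal sketch of a chainable builder that stages create_tag and create_branch operations and applies them together on commit. `ToyManageSnapshots` is a hypothetical stand-in written for this example; it is not PyIceberg's ManageSnapshots class, and its parameters are assumptions. The real signatures should be taken from the PyIceberg documentation.

```python
from typing import Dict, List, Tuple

Ref = Tuple[int, str]  # (snapshot_id, ref_type), ref_type is "tag" or "branch"


class ToyManageSnapshots:
    """Hypothetical builder; not PyIceberg's ManageSnapshots implementation."""

    def __init__(self, refs: Dict[str, Ref]) -> None:
        self._refs = refs
        self._pending: List[Tuple[str, Ref]] = []

    def create_tag(self, snapshot_id: int, tag_name: str) -> "ToyManageSnapshots":
        # Stage the tag; nothing is written until commit().
        self._pending.append((tag_name, (snapshot_id, "tag")))
        return self

    def create_branch(self, snapshot_id: int, branch_name: str) -> "ToyManageSnapshots":
        # Stage the branch; nothing is written until commit().
        self._pending.append((branch_name, (snapshot_id, "branch")))
        return self

    def commit(self) -> None:
        # Validate against a staged copy so a failure leaves refs untouched.
        staged = dict(self._refs)
        for name, ref in self._pending:
            if name in staged:
                raise ValueError(f"ref {name!r} already exists")
            staged[name] = ref
        self._refs.clear()
        self._refs.update(staged)
        self._pending.clear()


refs: Dict[str, Ref] = {"main": (1, "branch")}
ToyManageSnapshots(refs).create_tag(1, "v1").create_branch(1, "dev").commit()
```

Returning `self` from each staging method is what allows several operations to be chained and committed as one unit in this sketch.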