-
Notifications
You must be signed in to change notification settings - Fork 291
Update-schema: Add support for initial-default
#1770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update-schema: Add support for initial-default
#1770
Conversation
ed357e5
to
388580a
Compare
@Fokko the PR looks good to me: I think we may just have missed including the new properties in the |
…nitial-default-to-update-schema
…nitial-default-to-update-schema
Right now we deserialize the JSON into a dict, which is then passed into the Pydantic model. It is better to fully delegate this to pydantic because it is probably faster, and we can detect when models are created from json or from Python dicts. Required by apache#1770
# Rationale for this change Right now we deserialize the JSON into a dict, which is then passed into the Pydantic model. It is better to fully delegate this to pydantic because it is probably faster, and we can detect when models are created from json or from Python dicts. Required by #1770 This is also a recommendation by Pydantic itself: https://docs.pydantic.dev/latest/concepts/performance/#in-general-use-model_validate_json-not-model_validatejsonloads # Are these changes tested? Existing tests # Are there any user-facing changes? No <!-- In the case of user-facing changes, please add the changelog label. -->
…nitial-default-to-update-schema
…nitial-default-to-update-schema
@sungwy I've included setting the default value in this PR in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. I think we also need to think about table version evolution since default value is a V3 feature
What happens when a V2 table adds initial-default
or write-default
?
pyiceberg/conversions.py
Outdated
if len(t) != len(b): | ||
raise ValueError(f"FixedType has length {len(t)}, which is different from the value: {len(b)}") | ||
|
||
return b | ||
return b | ||
else: | ||
return val |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we also check the len of the bytes type input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good one, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fokko overall I think the implementation looks great! Just had some comments/questions on tests
results.append((None, DefaultWriter(writer=writer, value=file_field.write_default))) | ||
elif file_field.required: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests for the round trip writing/reading of default values? Or are we doing that separately, and we're just focusing on the schema update changes in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's this PR #1644 but I thought splitting out the schema-update changes in a separate PR might make it easier to review 👍
…nitial-default-to-update-schema
did you push the new commits? @Fokko |
@kevinjqliu I did commit, but didn't push them 😀 Thanks for the reminder! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @Fokko!
Rationale for this change
This allows for V3 initial defaults.
This PR took a bit longer than anticipated, mostly because the Pydantic json deserialization. There is a certain way we need to serialize python types to JSON single value encoding.
Are these changes tested?
Added new tests
Are there any user-facing changes?
After this PRs initial defaults can be set through the API. This enables users to add required fields.