
Update-schema: Add support for initial-default #1770


Merged: Fokko merged 15 commits into apache:main from the fd-add-initial-default-to-update-schema branch on Apr 24, 2025


@Fokko (Contributor) commented Mar 6, 2025

Rationale for this change

This adds support for V3 initial defaults (initial-default) to the update-schema API.

This PR took a bit longer than anticipated, mostly because of the Pydantic JSON deserialization. Python values have to be serialized in a specific way to match Iceberg's JSON single-value encoding.
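For reference, the Iceberg spec's JSON single-value serialization (the encoding used for initial-default and write-default in table metadata) maps primitive types roughly as below. This is a stdlib-only illustration, not PyIceberg's implementation: the function name to_json_single_value and the convention of passing the Iceberg type as a string are assumptions made for this sketch, and nested types (struct, list, map) are left out.

```python
import uuid
from datetime import date, datetime, time, timezone
from decimal import Decimal
from typing import Any


def to_json_single_value(iceberg_type: str, value: Any) -> Any:
    """Hypothetical helper: encode a Python value as an Iceberg JSON single value (primitives only)."""
    if iceberg_type in ("boolean", "int", "long", "float", "double", "string"):
        # These map directly onto JSON booleans, numbers and strings.
        return value
    if iceberg_type.startswith("decimal"):
        if not isinstance(value, Decimal):
            raise TypeError("decimal values must be Decimal instances")
        # Stored as the string representation, e.g. "14.20".
        return str(value)
    if iceberg_type == "date":
        if not isinstance(value, date) or isinstance(value, datetime):
            raise TypeError("date values must be datetime.date instances")
        return value.isoformat()
    if iceberg_type == "time":
        if not isinstance(value, time):
            raise TypeError("time values must be datetime.time instances")
        # Always emit microsecond precision, even when the fraction is zero.
        return value.isoformat(timespec="microseconds")
    if iceberg_type in ("timestamp", "timestamptz"):
        if not isinstance(value, datetime):
            raise TypeError(f"{iceberg_type} values must be datetime.datetime instances")
        if iceberg_type == "timestamp":
            if value.tzinfo is not None:
                raise ValueError("timestamp values must not carry a zone offset")
            return value.isoformat(timespec="microseconds")
        if value.tzinfo is None:
            raise ValueError("timestamptz values must be timezone-aware")
        # The spec requires a +00:00 offset, so normalize to UTC first.
        return value.astimezone(timezone.utc).isoformat(timespec="microseconds")
    if iceberg_type == "uuid":
        # Accepts UUID objects or strings and emits the canonical hyphenated form.
        return str(uuid.UUID(str(value)))
    if iceberg_type.startswith("fixed") or iceberg_type == "binary":
        # Fixed and binary values are stored as hexadecimal strings (bytes.hex() is lowercase).
        return bytes(value).hex()
    raise ValueError(f"Unsupported type in this sketch: {iceberg_type}")
```

The returned values are plain JSON-compatible Python objects, so json.dumps can write them into metadata directly.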

Are these changes tested?

Added new tests

Are there any user-facing changes?

After this PR, initial defaults can be set through the API. This enables users to add required fields to existing tables.
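Why an initial default makes a required column possible: data files written before the column existed contain no values for it, so a reader has to fill something in. A minimal sketch, assuming a hypothetical Field class and project function (not PyIceberg's reader), with rows keyed by field ID since Iceberg matches columns by ID:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified stand-in for a schema field."""
    field_id: int
    name: str
    required: bool
    initial_default: Optional[Any] = None


def project(row: Dict[int, Any], schema: List[Field]) -> Dict[str, Any]:
    """Project a data-file row (keyed by field ID) onto the current schema."""
    result: Dict[str, Any] = {}
    for field in schema:
        if field.field_id in row:
            result[field.name] = row[field.field_id]
        elif field.initial_default is not None:
            # The column did not exist when the file was written,
            # so every existing row takes the initial default.
            result[field.name] = field.initial_default
        elif field.required:
            raise ValueError(f"Required field {field.name!r} is missing and has no initial default")
        else:
            # Optional columns that are absent from the file read as null.
            result[field.name] = None
    return result
```

Without an initial default, a required column added later cannot be read from older files at all, which is why adding required fields depends on this feature.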

@Fokko Fokko force-pushed the fd-add-initial-default-to-update-schema branch from ed357e5 to 388580a Compare March 6, 2025 11:26
@sungwy (Collaborator) commented Mar 15, 2025

@Fokko the PR looks good to me. I think we may have just missed including the new properties in the rename_column method. I agree that we could introduce the ability to update write_default in a separate PR.
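One way this kind of omission can arise, sketched with a plain dataclass (the NestedField below is a simplified, hypothetical stand-in, not PyIceberg's Pydantic model): rebuilding a field attribute by attribute silently drops any attribute added later, while copying with an override keeps them. For Pydantic v2 models, model_copy(update=...) plays the same role as dataclasses.replace.

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class NestedField:
    """Hypothetical, simplified field; the real PyIceberg class has more attributes."""
    field_id: int
    name: str
    required: bool
    doc: Optional[str] = None
    initial_default: Optional[Any] = None
    write_default: Optional[Any] = None


def rename_field(field: NestedField, new_name: str) -> NestedField:
    # dataclasses.replace copies every attribute except the ones passed in,
    # so newly added properties such as the defaults survive the rename.
    return dataclasses.replace(field, name=new_name)


def rename_field_lossy(field: NestedField, new_name: str) -> NestedField:
    # Rebuilding the field by hand forgets the defaults, which is the bug to avoid.
    return NestedField(field_id=field.field_id, name=new_name, required=field.required, doc=field.doc)
```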

@Fokko Fokko added the changelog Indicates that the PR introduces changes that require an entry in the changelog. label Mar 17, 2025
@Fokko Fokko marked this pull request as draft March 17, 2025 13:31
Fokko added a commit to Fokko/iceberg-python that referenced this pull request Mar 25, 2025
Fokko added a commit that referenced this pull request Mar 25, 2025
# Rationale for this change

Right now we deserialize the JSON into a dict, which is then passed into
the Pydantic model. It is better to delegate this fully to Pydantic,
because it is likely faster, and it lets us detect whether models are
created from JSON or from Python dicts.

Required by #1770

This is also a recommendation by Pydantic itself:
https://docs.pydantic.dev/latest/concepts/performance/#in-general-use-model_validate_json-not-model_validatejsonloads

# Are these changes tested?

Existing tests

# Are there any user-facing changes?

No

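Why it helps to know where a model came from: the same Iceberg type arrives in different shapes depending on the source. From JSON metadata a binary default is a hex string, while a Python caller passes bytes. The stdlib-only sketch below makes the origin an explicit from_json flag; parse_default is a hypothetical helper, not PyIceberg code. In Pydantic v2, a validator can read the equivalent information from ValidationInfo.mode, which is either 'python' or 'json'.

```python
from datetime import date
from typing import Any


def parse_default(iceberg_type: str, value: Any, from_json: bool) -> Any:
    """Hypothetical helper: turn a default value into its Python representation."""
    if iceberg_type == "date":
        if from_json:
            # JSON single-value encoding stores dates as ISO-8601 strings.
            return date.fromisoformat(value)
        if not isinstance(value, date):
            raise TypeError(f"Expected a date, got {type(value).__name__}")
        return value
    if iceberg_type == "binary":
        if from_json:
            # JSON single-value encoding stores binary values as hex strings.
            return bytes.fromhex(value)
        if not isinstance(value, (bytes, bytearray)):
            # A str coming from Python is ambiguous, so this sketch refuses to guess.
            raise TypeError(f"Expected bytes, got {type(value).__name__}")
        return bytes(value)
    # Other types pass through unchanged in this sketch.
    return value
```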
@Fokko Fokko marked this pull request as ready for review March 26, 2025 14:23
@kevinjqliu kevinjqliu added the V3 label Mar 26, 2025
@Fokko Fokko requested a review from sungwy March 26, 2025 18:40
@Fokko (Contributor Author) commented Apr 17, 2025

@sungwy I've now included setting the default value in this PR, through set_default_value. PTAL :)
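The Iceberg V3 spec distinguishes the two defaults: initial-default is set when a field is added to an existing schema and stays fixed, while write-default starts out equal to it and can be changed through later schema evolution. A minimal sketch of those semantics, using hypothetical Field, add_field and set_default_value definitions that only mirror the method name mentioned above and are not PyIceberg's UpdateSchema API (rejecting a required field without a default is a simplifying assumption of this sketch):

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field carrying V3 defaults."""
    name: str
    required: bool
    initial_default: Optional[Any] = None
    write_default: Optional[Any] = None


def add_field(fields: Dict[str, Field], name: str, required: bool, default: Optional[Any] = None) -> None:
    if name in fields:
        raise ValueError(f"Field already exists: {name}")
    if required and default is None:
        raise ValueError(f"Required field {name!r} needs a default value in this sketch")
    # initial-default is fixed here; write-default starts out equal to it.
    fields[name] = Field(name, required, initial_default=default, write_default=default)


def set_default_value(fields: Dict[str, Field], name: str, default: Optional[Any]) -> None:
    # Only write-default changes; initial-default still describes existing data files.
    fields[name] = dataclasses.replace(fields[name], write_default=default)
```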

@kevinjqliu (Contributor) left a comment

Generally LGTM. I think we also need to think about table format version evolution, since default values are a V3 feature.

What happens when a V2 table adds initial-default or write-default?
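One possible answer, sketched here as an assumption rather than what this PR does: refuse non-null defaults unless the table is on format version 3 or later. check_default_supported is a hypothetical helper.

```python
from typing import Any, Optional

# Default values were introduced in format version 3 of the Iceberg spec.
MIN_FORMAT_VERSION_FOR_DEFAULTS = 3


def check_default_supported(format_version: int, initial_default: Optional[Any], write_default: Optional[Any]) -> None:
    """Hypothetical guard: reject non-null defaults on tables older than format version 3."""
    has_default = initial_default is not None or write_default is not None
    if has_default and format_version < MIN_FORMAT_VERSION_FOR_DEFAULTS:
        raise ValueError(
            f"Default values require format version {MIN_FORMAT_VERSION_FOR_DEFAULTS}, "
            f"but the table is on format version {format_version}"
        )
```

Failing loudly is the conservative choice here; upgrading the table automatically instead would be a one-way change, since an Iceberg table cannot be downgraded to an earlier format version.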

Comment on lines 568 to 573
    if len(t) != len(b):
        raise ValueError(f"FixedType has length {len(t)}, which is different from the value: {len(b)}")

    return b
else:
    return val
Contributor:

nit: should we also check the length when the input is of type bytes?

Contributor Author:

Good one, thanks!
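A sketch of the suggested check, applying the length validation to both input kinds: a hex string (as found in JSON metadata) and raw bytes. parse_fixed is a hypothetical helper modeled on the snippet above, not the merged code.

```python
from typing import Union


def parse_fixed(length: int, val: Union[str, bytes]) -> bytes:
    """Hypothetical helper: decode and validate a value for a fixed[length] field."""
    if isinstance(val, str):
        # JSON single-value encoding: a hexadecimal string.
        b = bytes.fromhex(val)
    elif isinstance(val, (bytes, bytearray)):
        b = bytes(val)
    else:
        raise TypeError(f"Unsupported value for fixed[{length}]: {type(val).__name__}")
    # Validate the length for both input kinds, not only for the hex-string path.
    if len(b) != length:
        raise ValueError(f"FixedType has length {length}, which is different from the value: {len(b)}")
    return b
```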

@amogh-jahagirdar (Contributor) left a comment

@Fokko, overall I think the implementation looks great! I just have some comments/questions on the tests.

Comment on lines +293 to 294
results.append((None, DefaultWriter(writer=writer, value=file_field.write_default)))
elif file_field.required:
Contributor:

Are there tests for round-trip writing/reading of default values? Or are we doing that separately and only focusing on the schema-update changes in this PR?

Contributor Author:

That's covered in PR #1644, but I thought splitting the schema-update changes out into a separate PR would make it easier to review 👍
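On the write side, the snippet above substitutes a field's write-default when the incoming data does not supply that field, and otherwise moves on to the elif file_field.required: branch. A minimal, hypothetical sketch of that decision per record (FileField and fill_row are illustrative names, not PyIceberg's writer classes):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FileField:
    """Hypothetical field of the file schema being written."""
    name: str
    required: bool
    write_default: Optional[Any] = None


def fill_row(record: Dict[str, Any], file_fields: List[FileField]) -> List[Any]:
    """Produce the values to write for one record, filling in write-defaults."""
    row: List[Any] = []
    for field in file_fields:
        if field.name in record:
            row.append(record[field.name])
        elif field.write_default is not None:
            # The caller did not supply the column: use the field's write-default.
            row.append(field.write_default)
        elif field.required:
            raise ValueError(f"Missing value for required field {field.name!r}")
        else:
            # Optional columns without a default are written as null.
            row.append(None)
    return row
```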

@kevinjqliu (Contributor)

Did you push the new commits, @Fokko?

@Fokko (Contributor Author) commented Apr 23, 2025

@kevinjqliu I did commit, but didn't push them 😀 Thanks for the reminder!

@amogh-jahagirdar (Contributor) left a comment

Awesome work @Fokko!

@Fokko Fokko merged commit 237333d into apache:main Apr 24, 2025
7 checks passed
@Fokko Fokko deleted the fd-add-initial-default-to-update-schema branch April 24, 2025 07:50
Labels: changelog (indicates that the PR introduces changes that require an entry in the changelog), V3

5 participants