Skip to content

Make Record purely position based #1768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 22, 2025

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Mar 5, 2025

This aligns the implementation with Java.

We had the keywords there mostly for the tests, but they should not be used, and it seems like that's already the case :'( I was undecided if the costs of this PR (all the changes), are worth it, but I see more PRs using the Record in a bad way (example #1743) that might lead to very subtle bugs where the position might sometime change based on the ordering of the dict.

Blocked by Eventual-Inc/Daft#3917

This aligns the implementation with Java.

We had the keywords there mostly for the tests, but they should
not be used, and it seems like that's already the case :'(
I was undecided if the costs of this PR (all the changes), are worth
it, but I see more PRs using the Record in a bad way (example apache#1743)
that might lead to very subtle bugs where the position might sometime
change based on the ordering of the dict.
@Fokko Fokko changed the title Make records purely position based Make Record purely position based Mar 5, 2025
kevinzwang pushed a commit to Eventual-Inc/Daft that referenced this pull request Mar 7, 2025
Don't use internal fields :)

I want to remove this one in
apache/iceberg-python#1768. It should be okay
since the `Record` also has a `__len__`.
kevinzwang pushed a commit to Eventual-Inc/Daft that referenced this pull request Mar 8, 2025
Don't use internal fields :)

I want to remove this one in
apache/iceberg-python#1768. It should be okay
since the `Record` also has a `__len__`.
@Fokko Fokko marked this pull request as ready for review March 26, 2025 19:15
Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

worth the change! better now than later :P

@@ -792,6 +792,5 @@ def test_partition_key(
snapshot.manifests(iceberg_table.io)[0].fetch_manifest_entry(iceberg_table.io)[0].data_file.file_path
)
# Special characters in partition value are sanitized when written to the data file's partition field
sanitized_record = Record(**{make_compatible_name(k): v for k, v in vars(expected_partition_record).items()})
assert spark_partition_for_justification == sanitized_record
assert spark_partition_for_justification == expected_partition_record
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love this!!

self[idx] = d
@classmethod
def _bind(cls, struct: StructType, **arguments: Any) -> Self:
return cls(*[arguments[field.name] if field.name in arguments else field.initial_default for field in struct.fields])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok initial_default is None if not set 👍

"field_readers",
"create_struct",
"struct",
"_create_with_keyword",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: looks like we got rid of _create_with_keyword

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, saves some memory 👍

@Fokko Fokko merged commit 59742e0 into apache:main Apr 22, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants