-
Notifications
You must be signed in to change notification settings - Fork 289
Make Record
purely position based
#1768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This aligns the implementation with Java. We had the keywords there mostly for the tests, but they should not be used, and it seems like that's already the case :'( I was undecided if the costs of this PR (all the changes), are worth it, but I see more PRs using the Record in a bad way (example apache#1743) that might lead to very subtle bugs where the position might sometime change based on the ordering of the dict.
Record
purely position based
Don't use internal fields :) I want to remove this one in apache/iceberg-python#1768. It should be okay since the `Record` also has a `__len__`.
Don't use internal fields :) I want to remove this one in apache/iceberg-python#1768. It should be okay since the `Record` also has a `__len__`.
…e-keywords-from-record
…e-keywords-from-record
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
worth the change! better now than later :P
@@ -792,6 +792,5 @@ def test_partition_key( | |||
snapshot.manifests(iceberg_table.io)[0].fetch_manifest_entry(iceberg_table.io)[0].data_file.file_path | |||
) | |||
# Special characters in partition value are sanitized when written to the data file's partition field | |||
sanitized_record = Record(**{make_compatible_name(k): v for k, v in vars(expected_partition_record).items()}) | |||
assert spark_partition_for_justification == sanitized_record | |||
assert spark_partition_for_justification == expected_partition_record |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love this!!
self[idx] = d | ||
@classmethod | ||
def _bind(cls, struct: StructType, **arguments: Any) -> Self: | ||
return cls(*[arguments[field.name] if field.name in arguments else field.initial_default for field in struct.fields]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok initial_default
is None
if not set 👍
pyiceberg/avro/reader.py
Outdated
"field_readers", | ||
"create_struct", | ||
"struct", | ||
"_create_with_keyword", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like we got rid of _create_with_keyword
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, saves some memory 👍
This aligns the implementation with Java.
We had the keywords there mostly for the tests, but they should not be used, and it seems like that's already the case :'( I was undecided if the costs of this PR (all the changes), are worth it, but I see more PRs using the Record in a bad way (example #1743) that might lead to very subtle bugs where the position might sometime change based on the ordering of the dict.
Blocked by Eventual-Inc/Daft#3917