Make `Record` purely position based #1768

Fokko · 2025-03-05T21:20:01Z

This aligns the implementation with Java.

We had the keyword arguments there mostly for the tests, but they should not be used anywhere else, and it seems that is already happening :'( I was undecided whether the cost of this PR (all the changes) is worth it, but I see more PRs using `Record` in a bad way (for example #1743), which might lead to very subtle bugs where a value's position sometimes changes based on the ordering of the dict.

Blocked by Eventual-Inc/Daft#3917
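To make the dict-ordering concern concrete, here is a minimal sketch. `KeywordRecord` and `PositionalRecord` are hypothetical stand-ins written for illustration only; they are not the actual `Record` class from `pyiceberg/typedef.py`, and the real implementation may differ.

```python
from typing import Any


class KeywordRecord:
    """Hypothetical stand-in: positions follow keyword insertion order."""

    def __init__(self, **fields: Any) -> None:
        # The position of each value depends on the order of the keywords.
        self._data = list(fields.values())

    def __getitem__(self, pos: int) -> Any:
        return self._data[pos]


class PositionalRecord:
    """Hypothetical stand-in: positions are fixed explicitly by the caller."""

    def __init__(self, *values: Any) -> None:
        self._data = list(values)

    def __getitem__(self, pos: int) -> Any:
        return self._data[pos]

    def __len__(self) -> int:
        return len(self._data)


# Two dicts with equal content but different insertion order.
a = {"id": 1, "name": "x"}
b = {"name": "x", "id": 1}

assert a == b                          # the dicts compare equal...
assert KeywordRecord(**a)[0] == 1      # ...but position 0 holds the id here
assert KeywordRecord(**b)[0] == "x"    # ...and the name here

# With positional construction the caller decides the layout once.
record = PositionalRecord(1, "x")
assert record[0] == 1 and len(record) == 2
```

Under these assumptions, a reader that looks up values by position would silently read the wrong column from a keyword-built record, which is the kind of subtle bug described above.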


Don't use internal fields :) I want to remove this one in apache/iceberg-python#1768. It should be okay since the `Record` also has a `__len__`.


kevinjqliu

LGTM!

worth the change! better now than later :P

tests/avro/test_file.py

kevinjqliu · 2025-04-19T18:49:52Z

tests/integration/test_partitioning_key.py

@@ -792,6 +792,5 @@ def test_partition_key(
            snapshot.manifests(iceberg_table.io)[0].fetch_manifest_entry(iceberg_table.io)[0].data_file.file_path
        )
        # Special characters in partition value are sanitized when written to the data file's partition field
-        sanitized_record = Record(**{make_compatible_name(k): v for k, v in vars(expected_partition_record).items()})
-        assert spark_partition_for_justification == sanitized_record
+        assert spark_partition_for_justification == expected_partition_record


love this!!

kevinjqliu · 2025-04-19T18:58:25Z

pyiceberg/typedef.py

-            self[idx] = d
+    @classmethod
+    def _bind(cls, struct: StructType, **arguments: Any) -> Self:
+        return cls(*[arguments[field.name] if field.name in arguments else field.initial_default for field in struct.fields])


ah ok initial_default is None if not set 👍
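As a rough illustration of the fallback in `_bind`, the sketch below mirrors the list comprehension from the diff using stand-in types. `Field`, `Struct`, and the free function `bind` are hypothetical simplifications, not the real `NestedField`, `StructType`, or `Record._bind`; the only behaviour assumed is what the diff and the comment above state, namely that a missing argument falls back to `initial_default`, which is `None` when not set.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for a schema field."""

    name: str
    initial_default: Optional[Any] = None  # None when no default is set


@dataclass
class Struct:
    """Hypothetical stand-in for a struct type: an ordered list of fields."""

    fields: List[Field]


def bind(struct: Struct, **arguments: Any) -> List[Any]:
    # Walk the fields in schema order, so the result is positional no matter
    # how the keyword arguments were ordered by the caller.
    return [
        arguments[field.name] if field.name in arguments else field.initial_default
        for field in struct.fields
    ]


struct = Struct([Field("id"), Field("name"), Field("count", initial_default=0)])

# Keywords arrive out of order; "count" is missing and uses its default.
values = bind(struct, name="x", id=1)
assert values == [1, "x", 0]

# With no arguments at all, unset defaults come back as None.
assert bind(struct) == [None, None, 0]
```

The point of the design is that the struct, not the caller's keyword order, decides each value's position.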

kevinjqliu · 2025-04-19T19:01:39Z

pyiceberg/avro/reader.py

+        "field_readers",
+        "create_struct",
+        "struct",
+        "_create_with_keyword",


nit: looks like we got rid of _create_with_keyword

Nice catch, saves some memory 👍
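For context on the memory remark, a small sketch assuming CPython, where each `__slots__` entry reserves one pointer per instance even if the attribute is never assigned. The two classes are hypothetical and only reuse the slot names from the diff; they are not the actual reader class.

```python
import sys


class WithStaleSlot:
    # Four slots, the last one no longer used by any code path.
    __slots__ = ("field_readers", "create_struct", "struct", "_create_with_keyword")


class WithoutStaleSlot:
    # Same class with the unused slot removed.
    __slots__ = ("field_readers", "create_struct", "struct")


stale_size = sys.getsizeof(WithStaleSlot())
clean_size = sys.getsizeof(WithoutStaleSlot())

# On CPython the instance with the extra slot is larger by one pointer.
print(stale_size - clean_size)
```

The saving per instance is small, so it mainly matters when many instances are created.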

Fokko mentioned this pull request Mar 5, 2025

fix: Don't use _position_to_field_name Eventual-Inc/Daft#3917

Merged

Fokko changed the title ~~Make records purely position based~~ Make Record purely position based Mar 5, 2025

kevinzwang pushed a commit to Eventual-Inc/Daft that referenced this pull request Mar 7, 2025

fix: Don't use _position_to_field_name (#3917)

a1e5ff3


kevinzwang pushed a commit to Eventual-Inc/Daft that referenced this pull request Mar 8, 2025

fix: Don't use _position_to_field_name (#3917)

ef6b981


Fokko added 4 commits March 24, 2025 21:21

Merge branch 'main' of github.com:apache/iceberg-python into fd-remov…

a4d4bf6

…e-keywords-from-record

🤔

f13bf9e

Merge branch 'main' of github.com:apache/iceberg-python into fd-remov…

0e69fed

…e-keywords-from-record

fix some tests

6cb546a

Fokko marked this pull request as ready for review March 26, 2025 19:15

Cleanup

a862c0b

kevinjqliu approved these changes Apr 19, 2025


Thanks Kevin!

bf66db5

Fokko merged commit 59742e0 into apache:main Apr 22, 2025
7 checks passed
