Description
Name mapping is used when the Parquet files in a table don't have field IDs encoded in them. For example, when files are added through `add_files` during a table migration from Hive, the Parquet files don't carry field IDs. In this case we want to make use of name mapping, as described in the spec: https://iceberg.apache.org/spec/#name-mapping-serialization
The name mapping is a JSON blob that's stored alongside the table in a table property.
This issue covers only the deserialization of the JSON blob into an in-memory structure. Tests can be found here: https://github.com/apache/iceberg-python/blob/main/tests/table/test_name_mapping.py
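For illustration, here is a minimal Python sketch of what that deserialization could look like. It assumes the JSON layout from the spec (a list of objects with an optional `field-id`, a `names` list, and an optional nested `fields` list). The names `MappedField`, `parse_mapped_field`, and `parse_name_mapping` are hypothetical and are not taken from any existing implementation.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MappedField:
    """One entry of a name mapping (hypothetical type, for illustration only)."""
    names: List[str]
    field_id: Optional[int] = None
    # Nested mappings for struct children; empty for leaf fields.
    fields: List["MappedField"] = field(default_factory=list)


def parse_mapped_field(obj: dict) -> MappedField:
    """Recursively convert one JSON object into a MappedField."""
    return MappedField(
        names=list(obj.get("names", [])),
        field_id=obj.get("field-id"),
        fields=[parse_mapped_field(child) for child in obj.get("fields", [])],
    )


def parse_name_mapping(blob: str) -> List[MappedField]:
    """Deserialize the JSON blob (a list of mapped fields) into memory."""
    return [parse_mapped_field(obj) for obj in json.loads(blob)]


# Sample blob in the shape the spec describes.
EXAMPLE = """
[
  {"field-id": 1, "names": ["id", "record_id"]},
  {"field-id": 2, "names": ["data"]},
  {"field-id": 3, "names": ["location"], "fields": [
    {"field-id": 4, "names": ["latitude", "lat"]},
    {"field-id": 5, "names": ["longitude", "long"]}
  ]}
]
"""

mapping = parse_name_mapping(EXAMPLE)
```

The nesting is kept as-is rather than flattened, so `mapping[2].fields` holds the children of `location`.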
Future tip: It is best to store this in a recursive field so it can be traversed using a `VisitorWithParent`, where both a `Schema` and a `NameMapping` can be traversed at once. This is important because we cannot flatten the name mapping: field names may contain dots, so a flattened name cannot be split reliably into fields and subfields. This is done in PyIceberg here: apache/iceberg-python#1014
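To make the problem with dots concrete, here is a small Python sketch. It is not the PyIceberg implementation; `MappedField`, `flatten`, and `find` are hypothetical names used only for this example. A top-level field literally named `a.b` collides with the child `b` of a struct `a` once the mapping is flattened into dotted keys, while a recursive lookup by path components keeps them apart.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class MappedField:
    """Hypothetical recursive name-mapping entry, for illustration only."""
    names: List[str]
    field_id: Optional[int] = None
    fields: List["MappedField"] = field(default_factory=list)


def flatten(fields: Sequence[MappedField], prefix: str = "") -> Dict[str, Optional[int]]:
    """Naive flattening into dotted names; later entries overwrite earlier ones."""
    flat: Dict[str, Optional[int]] = {}
    for f in fields:
        for name in f.names:
            key = prefix + name
            flat[key] = f.field_id
            flat.update(flatten(f.fields, prefix=key + "."))
    return flat


def find(fields: Sequence[MappedField], path: Sequence[str]) -> Optional[int]:
    """Recursive lookup by path components; dots inside a name are never split."""
    if not path:
        return None
    head, rest = path[0], path[1:]
    for f in fields:
        if head in f.names:
            return f.field_id if not rest else find(f.fields, rest)
    return None


mapping = [
    MappedField(names=["a.b"], field_id=1),  # one field whose name contains a dot
    MappedField(names=["a"], field_id=2, fields=[MappedField(names=["b"], field_id=3)]),
]

flat = flatten(mapping)           # "a.b" now refers to only one of the two fields
top_level = find(mapping, ["a.b"])
nested = find(mapping, ["a", "b"])
```

With the flattened form, the key `a.b` can only point at one field ID, so one of the two fields is lost. The recursive form resolves `["a.b"]` and `["a", "b"]` to different IDs.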