Closed
Description
Apache Iceberg version
0.6.0 (latest release)
Please describe the bug 🐞
schema_to_pyarrow converts BinaryType to the pa.large_binary() type. This makes the Arrow table schema produced by a data scan inconsistent between two cases:
- when there is no data in the table, so the schema comes from schema_to_pyarrow (pa.large_binary())
- when the table is read using the physical_schema of the file fragment (pa.binary())
Related PR: #409
As a result, a pa.Table read from the same Iceberg table may have a different schema depending on whether the table scan matches any data.
More importantly, if one file in a table scan is empty and another file in the same scan has data, the two resulting Arrow tables have inconsistent schemas, and pa.concat_tables(tables) raises an error.