Upcasting and Downcasting inconsistencies with PyArrow Schema #791
Comments
Working on this issue, I noticed that Parquet has a restriction where a single value larger than 2GB cannot be stored. At the very least, Arrow has a check that prevents this: https://github.com/apache/arrow/blob/main/cpp/src/parquet/encoding.cc#L169

If Parquet cannot hold data that is larger than 2GB, is there a benefit in supporting `large_binary`?
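For context, here is a minimal sketch (not from the thread; it assumes pyarrow is installed, and the file name and the roughly 2 GiB allocation are only for illustration) of how that check would trip:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A single value of exactly 2 GiB (this allocates ~2 GiB of memory).
big_value = b"x" * (2**31)

# The value fits in memory in a large_binary array...
table = pa.table({"blob": pa.array([big_value], type=pa.large_binary())})

# ...but the Parquet encoder is expected to reject it, since a single
# BYTE_ARRAY value may not be 2 GiB or larger (the check linked above).
pq.write_table(table, "big_blob.parquet")
```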
I'm seeing the same restriction when using Polars' `write_parquet`, so it looks like a Parquet limitation rather than an Arrow restriction.
This is interesting. Why would Polars go with `large_binary`?
For Arrow, the large types use 64-bit offsets, which is why Polars defaults to them. See: pola-rs/polars#7422

Still, I think the inconsistency is not good.
My apologies, I think I might not have done a good job of explaining the problem, @Fokko. I think the issue is with Parquet, not Arrow or Polars; I'm using those two libraries as examples to show that a record exceeding 2GB cannot be written into a Parquet file, even if it can be represented in memory as a large Arrow data type. This issue raised on Polars seems to reiterate that point as well: pola-rs/polars#10774

This is just based on my research this week, so it is definitely possible that I'm missing something here, but so far I haven't been able to write an actually large record (>2GB) into Parquet.
I agree that you cannot write a single field of 2GB+ to a Parquet file; in that case, Parquet is probably not the best way of storing such a big blob. The reason for the large types is how Arrow lays out variable-length values: a single contiguous data buffer plus an offsets buffer, for example:

```python
data = 'foobararrow'
offsets = [0, 3, 6, 11]
```

If the offsets are 32 bits, then you need to chunk the data into smaller buffers, which negatively impacts performance.
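As a concrete illustration of that layout (a small pyarrow sketch, not from the thread), the only difference between `binary` and `large_binary` is the width of the offsets:

```python
import pyarrow as pa

values = [b"foo", b"bar", b"arrow"]

# Both arrays store the values as one contiguous data buffer
# (b"foobararrow") plus an offsets buffer ([0, 3, 6, 11]).
small = pa.array(values, type=pa.binary())        # 32-bit (int32) offsets
large = pa.array(values, type=pa.large_binary())  # 64-bit (int64) offsets

print(small.type)  # binary
print(large.type)  # large_binary

# With int32 offsets a single array cannot address more than 2 GiB of data,
# so larger data has to be split into multiple chunks; int64 offsets avoid that.
```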
Gotcha, thank you for the explanation @Fokko. I didn't think of how using a `large_binary` could actually improve performance because the data is grouped into large buffers. I think I was conflating the 2GB limit of Parquet with the need for a large type. Put simply, I was asking: if we can't write data that large into Parquet, why bother using a type that is specifically designed to support larger data (which can't be written into the file anyway)? But now I see that the motivation for supporting large types is different from the motivation for writing larger data, and we might also introduce a different file format as the storage medium that could support writing larger types in the future.
Apache Iceberg version
0.6.0 (latest release)
Please describe the bug 🐞
`schema_to_pyarrow` converts `BinaryType` to the `pa.large_binary()` type. This creates an inconsistency between the schema produced by `schema_to_pyarrow` and the arrow table schema produced from the data scan. Related PR: #409
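A minimal sketch of the conversion (assuming pyiceberg 0.6.0, where `schema_to_pyarrow` lives in `pyiceberg.io.pyarrow`; the field name and id are made up):

```python
import pyarrow as pa
from pyiceberg.schema import Schema
from pyiceberg.types import BinaryType, NestedField
from pyiceberg.io.pyarrow import schema_to_pyarrow

iceberg_schema = Schema(
    NestedField(field_id=1, name="b", field_type=BinaryType(), required=False),
)

arrow_schema = schema_to_pyarrow(iceberg_schema)
print(arrow_schema.field("b").type)                  # large_binary
print(arrow_schema.field("b").type == pa.binary())   # False
```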
The implication of this bug is that a `pa.Table` read from the same Iceberg table may yield a different schema depending on whether or not there is data within the defined table scan. More importantly, it also means that if one of the files is empty and another file has data within the same table scan, the schema inconsistency between the two arrow tables will result in an error when we attempt to `pa.concat_tables(tables)`.
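A small pyarrow-only sketch (hypothetical column name) of that failure mode, where one scan task produces an empty table typed from the converted schema and another produces a table typed from the data:

```python
import pyarrow as pa

# Empty table typed via the converted schema (large_binary)...
empty = pa.table({"b": pa.array([], type=pa.large_binary())})

# ...and a non-empty table whose column came back as plain binary.
with_data = pa.table({"b": pa.array([b"\x01\x02"], type=pa.binary())})

# The schemas differ (large_binary vs binary), so this raises pyarrow.lib.ArrowInvalid.
pa.concat_tables([empty, with_data])
```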