This is related to #783.
Namely, what happens is:

- I use `pyiceberg` to create an Iceberg table from a Parquet file.
- The Parquet file has type hints, e.g. for `DataType::Int16` (`required int32 c1 (INTEGER(16,true)) = 1;`).
- Thanks to #783 (Discussion: Support conversion of Arrow `Int8` and `Int16` to `PrimitiveType::Int`), we now upcast that to the native 32-bit `Int` type and can read it.
- This is also the type returned by e.g. `TableProvider::schema`.
- However, the actual type in the Arrow record batches that are read (inferred from the Parquet hint) is `DataType::Int16`, so the reported and actual schemas disagree.
- As a result, a DataFusion query such as `SELECT c1 FROM t WHERE c1 <= 2` crashes with `Invalid comparison operation: Int16 <= Int32`.
- Ultimately, the schema mismatch tricks one of the logical optimizers into assuming that casting the right-hand side (i.e. the `2` literal) to `DataType::Int32` (taken from the reported schema) is enough to make the comparison valid.
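To make the failure mode concrete, here is a minimal, std-only Rust sketch. It deliberately does not use arrow-rs or DataFusion: `DataType`, `Column`, `lt_eq_int32_scalar` and `cast_to` are hypothetical stand-ins that model only what matters here. The reported schema says `Int32`, the batch carries `Int16`, and a strict comparison kernel rejects the mismatch. The sketch also assumes one possible remedy, casting each batch column to the reported type before it reaches the query engine; this is an illustration, not necessarily how the project resolves the issue.

```rust
// Minimal, std-only model of the reported-vs-actual schema mismatch.
// NOTE: `DataType`, `Column`, `lt_eq_int32_scalar` and `cast_to` are
// hypothetical stand-ins, not the arrow-rs / DataFusion APIs.

/// The two Arrow integer types involved in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int16,
    Int32,
}

/// A single column of a record batch, tagged with its physical type.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int16(Vec<i16>),
    Int32(Vec<i32>),
}

impl Column {
    fn data_type(&self) -> DataType {
        match self {
            Column::Int16(_) => DataType::Int16,
            Column::Int32(_) => DataType::Int32,
        }
    }
}

/// Strict `column <= literal` kernel: the literal was already cast to Int32
/// (the type from the reported schema), so the column must be Int32 too.
fn lt_eq_int32_scalar(col: &Column, lit: i32) -> Result<Vec<bool>, String> {
    match col {
        Column::Int32(values) => Ok(values.iter().map(|&v| v <= lit).collect()),
        other => Err(format!(
            "Invalid comparison operation: {:?} <= {:?}",
            other.data_type(),
            DataType::Int32
        )),
    }
}

/// Align a column with the reported type; only lossless widening is allowed.
fn cast_to(col: &Column, target: DataType) -> Result<Column, String> {
    match (col, target) {
        (Column::Int16(values), DataType::Int32) => {
            Ok(Column::Int32(values.iter().map(|&v| i32::from(v)).collect()))
        }
        (c, t) if c.data_type() == t => Ok(c.clone()),
        (c, t) => Err(format!("unsupported cast {:?} -> {:?}", c.data_type(), t)),
    }
}

fn main() {
    // Type reported by the table (e.g. via TableProvider::schema) after the #783 upcast.
    let reported = DataType::Int32;
    // Type the Parquet reader actually produces from the INTEGER(16,true) hint.
    let batch_col = Column::Int16(vec![1, 2, 3]);

    // Without alignment the kernel sees Int16 <= Int32 and fails.
    if let Err(e) = lt_eq_int32_scalar(&batch_col, 2) {
        println!("{e}");
    }

    // Casting the batch to the reported type first makes the comparison valid.
    let aligned = cast_to(&batch_col, reported).expect("Int16 -> Int32 is lossless");
    let mask = lt_eq_int32_scalar(&aligned, 2).expect("types now match");
    println!("{mask:?}");
}
```

Running `main` prints `Invalid comparison operation: Int16 <= Int32` followed by `[true, true, false]`. The sketch aligns the batches rather than the reported schema because Iceberg has no 16-bit integer type, so the table schema cannot report `Int16` in the first place.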