Description
Feature Request / Improvement
Let's investigate the level of abstraction on the write path.
Currently, we do schema compatibility checks, schema coercion, bin-packing, transformation, etc. at different levels of the stack. It would be good to optimize this and see which of these functions can be pushed up the stack.
For example, here's what the overwrite path looks like:
overwrite -> _dataframe_to_data_files -> write_file -> write_parquet
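To make the layering idea concrete, here is a minimal sketch of a write path where every check and decision happens once at the entry point and the lowest layer only serializes. It assumes a toy model (a schema is a dict of column name -> type name, a table is a list of row dicts). check_compatible, split_into_files and serialize are hypothetical stand-ins, not pyiceberg APIs; overwrite is only named after the real entry point.

```python
from typing import Dict, Iterator, List

# Hypothetical, simplified model of the write path. A "schema" maps column
# name -> type name and a "table" is a list of row dicts. None of these
# functions are pyiceberg APIs; they only illustrate where each concern lives.
Schema = Dict[str, str]
Rows = List[dict]


def check_compatible(table_schema: Schema, df_schema: Schema) -> None:
    # Concern 1: schema compatibility (done in overwrite today).
    if df_schema != table_schema:
        raise ValueError(f"incompatible schemas: {df_schema} vs {table_schema}")


def split_into_files(rows: Rows, rows_per_file: int) -> Iterator[Rows]:
    # Concern 2: decide which rows go into each data file
    # (bin-packing in _dataframe_to_data_files today).
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    for start in range(0, len(rows), rows_per_file):
        yield rows[start:start + rows_per_file]


def serialize(chunk: Rows, schema: Schema) -> str:
    # Lowest layer (stand-in for write_parquet): pure serialization,
    # columns written in schema order, no validation or reshaping.
    columns = list(schema)
    return "\n".join(",".join(str(row[c]) for c in columns) for row in chunk)


def overwrite(table_schema: Schema, df_schema: Schema, rows: Rows,
              rows_per_file: int = 2) -> List[str]:
    # Entry point: validation and splitting happen once, up front,
    # and the layer below only receives validated, pre-split data.
    check_compatible(table_schema, df_schema)
    return [serialize(chunk, table_schema)
            for chunk in split_into_files(rows, rows_per_file)]


schema = {"id": "long", "name": "string"}
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
print(overwrite(schema, schema, rows))  # -> ['1,a\n2,b', '3,c']
```

The design choice being illustrated is that serialize never needs to know about compatibility rules or file sizing, so those concerns can be tested and changed in one place.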
(copied over from #910 (review))
Another example: #786 (comment)
More info
overwrite checks schema compatibility: iceberg-python/pyiceberg/table/__init__.py, lines 541 to 550 in 3f44dfe
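A sketch of what a single, top-level compatibility check could look like, assuming schemas are flattened to name -> Iceberg primitive type dicts (nested types and nullability are ignored). The promotion set reflects the Iceberg spec's allowed primitive widenings int -> long and float -> double; schema_incompatibilities is a hypothetical name and does not reproduce the pyiceberg code at the lines above. Returning every problem, rather than failing on the first, lets one check at the entry point report everything at once.

```python
from typing import Dict, List

# Hypothetical simplified schemas: field name -> Iceberg primitive type name.
Schema = Dict[str, str]

# Widening promotions allowed by the Iceberg spec for primitive types
# (decimal precision widening omitted for brevity).
ALLOWED_PROMOTIONS = {("int", "long"), ("float", "double")}


def schema_incompatibilities(table_schema: Schema, df_schema: Schema) -> List[str]:
    """Return human-readable problems; an empty list means compatible."""
    problems = []
    for name, df_type in df_schema.items():
        if name not in table_schema:
            problems.append(f"extra field in dataframe: {name}")
            continue
        table_type = table_schema[name]
        # Same type, or a safe widening from the dataframe type to the table type.
        if df_type != table_type and (df_type, table_type) not in ALLOWED_PROMOTIONS:
            problems.append(f"{name}: cannot write {df_type} into {table_type}")
    for name in table_schema:
        if name not in df_schema:
            problems.append(f"missing field in dataframe: {name}")
    return problems
```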
_dataframe_to_data_files bin-packs the pyarrow Table: iceberg-python/pyiceberg/io/pyarrow.py, lines 2222 to 2225 in 3f44dfe
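A sketch of the bin-packing step as a plain next-fit heuristic over weighted items. In a real write path the items would presumably be pyarrow RecordBatches weighted by RecordBatch.nbytes, with target playing the role of the write.target-file-size-bytes table property; bin_pack is a hypothetical helper and not necessarily the algorithm pyiceberg uses. Since packing only needs sizes, it is one of the steps that could be decided near the top of the stack.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bin_pack(items: Sequence[T], weights: Sequence[int], target: int) -> List[List[T]]:
    """Greedy next-fit packing: consecutive items are grouped until adding the
    next one would exceed `target`. An item heavier than `target` gets its own bin."""
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    bins: List[List[T]] = []
    current: List[T] = []
    current_weight = 0
    for item, weight in zip(items, weights):
        # Close the current bin if this item would push it over the target.
        if current and current_weight + weight > target:
            bins.append(current)
            current, current_weight = [], 0
        current.append(item)
        current_weight += weight
    if current:
        bins.append(current)
    return bins
```

For example, bin_pack(["a", "b", "c", "d"], [40, 40, 40, 100], 100) groups "a" and "b" together and puts "c" and "d" in bins of their own.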
write_parquet transforms the table schema: iceberg-python/pyiceberg/io/pyarrow.py, lines 2001 to 2008 and lines 2011 to 2021 in 3f44dfe
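A sketch of a schema transformation step, assuming rows are dicts and the target file schema is an ordered mapping of column name -> (type name, required flag). to_file_schema is hypothetical; it does not reproduce what write_parquet does at the lines above, only the kind of reshaping involved (column order, filling absent optional columns). Because it operates on the whole table, a step like this could run once before bin-packing instead of once per written file.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified schema: column name -> (type name, required flag).
# Column order in the dict is the order columns are written to the file.
Schema = Dict[str, Tuple[str, bool]]


def to_file_schema(rows: List[dict], schema: Schema) -> List[dict]:
    """Project every row onto `schema`: reorder columns to match the schema,
    fill absent optional columns with None, and reject anything else.
    Types are not checked here; that is assumed to be the compatibility
    check's job."""
    projected_rows = []
    for row in rows:
        unknown = set(row) - set(schema)
        if unknown:
            raise ValueError(f"columns not in schema: {sorted(unknown)}")
        projected = {}
        for name, (_type_name, required) in schema.items():
            if name in row:
                projected[name] = row[name]
            elif required:
                raise ValueError(f"missing required column: {name}")
            else:
                projected[name] = None
        projected_rows.append(projected)
    return projected_rows
```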