...out of curiosity, I took a closer look at the pyiceberg impl and how `Table.append()` works.
Now I would like to pick your brain to understand and track the next steps we have to take to support `append` as well (since we should be getting close to having write support). The goal here is to extract and create actionable issues.
Here is what I understand from the Python impl so far (high-level):
1. We call `append()` on the `Table` class with our DataFrame (a `pa.Table`) and the `snapshot_properties: Dict[str, str]`.
2. We create a `Transaction` that basically does two things:
   - 2.1. It creates a `_MergingSnapshotProducer`, which is (at a high level) responsible for writing a new `ManifestList` and creating a new `Snapshot` (returned as `AddSnapshotUpdate`).
   - 2.2. It calls `update_table` on the respective `Catalog`, which creates a new `metadata.json` and returns the new metadata as well as the new `metadata_location`.
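To make step 2.2 a bit more concrete, here is a minimal Rust sketch of what a catalog commit does with an `AddSnapshot` update. Everything in it is an assumption for illustration: `InMemoryCatalog`, the `commit_uuid` parameter and the trimmed-down `Snapshot`/`TableMetadata` types are hypothetical (not the iceberg-rust API), it only builds the new location instead of writing a file, and it skips the requirement checks a real catalog performs. The `{version:05}-{uuid}.metadata.json` file name follows the naming pyiceberg appears to use.

```rust
// Hypothetical, simplified stand-ins for the real Iceberg metadata types.
#[derive(Debug, Clone)]
struct Snapshot {
    snapshot_id: i64,
    sequence_number: i64,
}

#[derive(Debug, Clone)]
enum TableUpdate {
    AddSnapshot { snapshot: Snapshot },
}

#[derive(Debug, Clone)]
struct TableMetadata {
    location: String,
    last_sequence_number: i64,
    snapshots: Vec<Snapshot>,
}

/// Hypothetical in-memory catalog that tracks a metadata version counter
/// and the location of the current metadata.json.
struct InMemoryCatalog {
    version: u32,
    metadata: TableMetadata,
    metadata_location: String,
}

impl InMemoryCatalog {
    /// Applies the updates to a copy of the current metadata, computes a new
    /// metadata.json location and returns the new metadata and that location.
    fn update_table(
        &mut self,
        updates: Vec<TableUpdate>,
        commit_uuid: &str,
    ) -> (TableMetadata, String) {
        let mut new_metadata = self.metadata.clone();
        for update in updates {
            match update {
                TableUpdate::AddSnapshot { snapshot } => {
                    new_metadata.last_sequence_number = snapshot.sequence_number;
                    new_metadata.snapshots.push(snapshot);
                }
            }
        }
        self.version += 1;
        let new_location = format!(
            "{}/metadata/{:05}-{}.metadata.json",
            new_metadata.location, self.version, commit_uuid
        );
        // A real catalog would serialize `new_metadata` to `new_location` here and
        // atomically swap its pointer, failing if the table changed concurrently.
        self.metadata = new_metadata.clone();
        self.metadata_location = new_location.clone();
        (new_metadata, new_location)
    }
}

fn main() {
    let mut catalog = InMemoryCatalog {
        version: 0,
        metadata: TableMetadata {
            location: "s3://bucket/db/tbl".to_string(),
            last_sequence_number: 0,
            snapshots: Vec::new(),
        },
        metadata_location: "s3://bucket/db/tbl/metadata/00000-initial.metadata.json".to_string(),
    };
    let snapshot = Snapshot { snapshot_id: 1, sequence_number: 1 };
    let (metadata, _location) =
        catalog.update_table(vec![TableUpdate::AddSnapshot { snapshot }], "abc");
    println!(
        "{} snapshot(s), new metadata at {}",
        metadata.snapshots.len(),
        catalog.metadata_location
    );
}
```

Keeping the commit inside the catalog (rather than the transaction) is presumably what lets each concrete `Catalog` decide how to swap the metadata pointer atomically.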
Here is what I think we need to implement (rough sketch):
- impl `fn append(...)` on `struct Table`:
  This should probably accept a `RecordBatch` as a param, create a new `Transaction`, and delegate further action to the transaction.
- impl `fn append(...)` on `struct Transaction`:
  Receives the `RecordBatch` and `snapshot_properties`, performs validation checks, converts the `RecordBatch` into a collection of `DataFile`s, and creates a `_MergingSnapshotProducer` with that collection.
- impl `_MergingSnapshotProducer`:
  - write manifests (added, deleted, existing)
  - get `next_sequence_number` from `TableMetadata`
  - update snapshot summaries
  - generate `manifest_list_path`
  - write the manifest list
  - create a new `Snapshot`
  - return `TableUpdate::AddSnapshot`
- impl `update_table` on the concrete `Catalog` implementations
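The `_MergingSnapshotProducer` steps above could look roughly like this in Rust. This is a minimal sketch under several assumptions: the types are hypothetical stand-ins rather than iceberg-rust's, the producer only handles appended files (no deleted or existing manifests), writing the Avro manifests and manifest list is left out, sequence numbers follow v2 semantics, and the `snap-<snapshot_id>-<attempt>-<uuid>.avro` path is modeled on pyiceberg's naming.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; the real iceberg-rust types look different.
#[derive(Debug, Clone)]
struct DataFile {
    file_path: String,
    record_count: u64,
}

#[derive(Debug)]
struct Snapshot {
    snapshot_id: i64,
    parent_snapshot_id: Option<i64>,
    sequence_number: i64,
    manifest_list: String,
    summary: HashMap<String, String>,
}

#[derive(Debug)]
enum TableUpdate {
    AddSnapshot { snapshot: Snapshot },
}

struct TableMetadata {
    location: String,
    last_sequence_number: i64,
    current_snapshot_id: Option<i64>,
}

/// Hypothetical append-only producer: it only knows about added files.
struct MergingSnapshotProducer {
    snapshot_id: i64,
    commit_uuid: String,
    added_files: Vec<DataFile>,
    snapshot_properties: HashMap<String, String>,
}

impl MergingSnapshotProducer {
    fn commit(self, metadata: &TableMetadata) -> TableUpdate {
        // 1. Write manifests: elided (a real implementation writes Avro files).
        // 2. Next sequence number (v2 semantics assumed; v1 tables use 0).
        let sequence_number = metadata.last_sequence_number + 1;
        // 3. Snapshot summary: user properties plus a few standard counters.
        let mut summary = self.snapshot_properties;
        summary.insert("operation".to_string(), "append".to_string());
        summary.insert("added-data-files".to_string(), self.added_files.len().to_string());
        let added_records: u64 = self.added_files.iter().map(|f| f.record_count).sum();
        summary.insert("added-records".to_string(), added_records.to_string());
        // 4. Manifest list path (attempt number fixed to 0 in this sketch).
        let manifest_list = format!(
            "{}/metadata/snap-{}-0-{}.avro",
            metadata.location, self.snapshot_id, self.commit_uuid
        );
        // 5. Write the manifest list: elided.
        // 6. Create the snapshot and return it as a table update.
        let snapshot = Snapshot {
            snapshot_id: self.snapshot_id,
            parent_snapshot_id: metadata.current_snapshot_id,
            sequence_number,
            manifest_list,
            summary,
        };
        TableUpdate::AddSnapshot { snapshot }
    }
}

fn main() {
    let metadata = TableMetadata {
        location: "s3://bucket/db/tbl".to_string(),
        last_sequence_number: 3,
        current_snapshot_id: Some(10),
    };
    let producer = MergingSnapshotProducer {
        snapshot_id: 11,
        commit_uuid: "abc".to_string(),
        added_files: vec![DataFile {
            file_path: "s3://bucket/db/tbl/data/a.parquet".to_string(),
            record_count: 5,
        }],
        snapshot_properties: HashMap::new(),
    };
    println!("{:?}", producer.commit(&metadata));
}
```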
What could the possible issues be here?
I think we need to start with the `_MergingSnapshotProducer` (possibly split into multiple parts) and work our way up the list?
Once we have the `_MergingSnapshotProducer`, we can implement the `append` function on `Transaction`, which basically just orchestrates the other parts?
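A rough sketch of that orchestration in Rust, assuming hypothetical stand-in types throughout: a `RecordBatch` whose "schema" is just a list of field names, a `produce_snapshot` function in place of the producer, and a `commit` that returns the collected updates instead of calling `Catalog::update_table`. Validation is reduced to a field-name comparison and no Parquet is written; none of these names are taken from iceberg-rust.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins (no arrow/iceberg crates): a schema is a list of field names.
struct RecordBatch {
    field_names: Vec<String>,
    num_rows: usize,
}

struct DataFile {
    file_path: String,
    record_count: usize,
}

#[derive(Debug, PartialEq)]
enum TableUpdate {
    AddSnapshot {
        sequence_number: i64,
        record_count: usize,
        summary: HashMap<String, String>,
    },
}

struct Table {
    field_names: Vec<String>,
    location: String,
    last_sequence_number: i64,
}

struct Transaction<'a> {
    table: &'a Table,
    updates: Vec<TableUpdate>,
}

impl Table {
    /// Creates a new transaction and delegates the actual work to it.
    fn append(
        &self,
        batch: &RecordBatch,
        snapshot_properties: &HashMap<String, String>,
    ) -> Result<Vec<TableUpdate>, String> {
        let mut tx = Transaction { table: self, updates: Vec::new() };
        tx.append(batch, snapshot_properties)?;
        Ok(tx.commit())
    }
}

impl<'a> Transaction<'a> {
    /// Validates the batch, turns it into data files and runs the snapshot producer.
    fn append(
        &mut self,
        batch: &RecordBatch,
        snapshot_properties: &HashMap<String, String>,
    ) -> Result<(), String> {
        // Validation: here only an exact field-name match against the table schema.
        if batch.field_names != self.table.field_names {
            return Err(format!(
                "schema mismatch: table has {:?}, batch has {:?}",
                self.table.field_names, batch.field_names
            ));
        }
        // A real implementation would write Parquet files here; we only describe one.
        let data_files = vec![DataFile {
            file_path: format!("{}/data/00000.parquet", self.table.location),
            record_count: batch.num_rows,
        }];
        self.updates
            .push(produce_snapshot(self.table, &data_files, snapshot_properties));
        Ok(())
    }

    /// Stand-in for handing the collected updates to `Catalog::update_table`.
    fn commit(self) -> Vec<TableUpdate> {
        self.updates
    }
}

/// Stand-in for the `_MergingSnapshotProducer` step.
fn produce_snapshot(
    table: &Table,
    files: &[DataFile],
    snapshot_properties: &HashMap<String, String>,
) -> TableUpdate {
    TableUpdate::AddSnapshot {
        sequence_number: table.last_sequence_number + 1,
        record_count: files.iter().map(|f| f.record_count).sum(),
        summary: snapshot_properties.clone(),
    }
}

fn main() {
    let table = Table {
        field_names: vec!["id".to_string()],
        location: "s3://bucket/db/tbl".to_string(),
        last_sequence_number: 1,
    };
    let batch = RecordBatch { field_names: vec!["id".to_string()], num_rows: 3 };
    let updates = table.append(&batch, &HashMap::new()).expect("append should succeed");
    println!("{updates:?}");
}
```

Routing `Table::append` through a `Transaction` presumably also leaves room for committing several updates (an append plus, say, a property change) in a single catalog call later on.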