Writer Design #34

Closed · ZENOTME opened this issue Aug 18, 2023 · 19 comments

Comments

ZENOTME (Contributor) commented Aug 18, 2023

This issue proposes a writer design to solve:

arrow: Writing unpartitioned data into iceberg from arrow record batches
arrow: Writing partitioned data into iceberg from arrow record batches

The design is based on what we do in icelake and is inspired by Java Iceberg; feel free to make any suggestions:

Class Design

SpecificFormatWriter

At the bottom level, we have several specific format writers, each responsible for writing record batches into a file of a specific format, such as:

struct ParquetWriter {
    ...
}

struct AvroWriter {
    ...
}

struct OrcWriter {
    ...
}

/// Implemented by each of the format writers above
trait SpecificWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<()>;
}

1. Discussion: which formats do we plan to support in v0.2? I guess only parquet?

DataFileWriter

One level higher is the data file writer. It uses a SpecificWriter and splits the record batches into multiple files according to config such as file_size_limit. It looks like:

struct DataFileWriter {
    current_specific_writer: SpecificWriter
}
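For illustration, the rolling logic could look roughly like this; close_current_file and new_specific_writer are hypothetical helpers, and Arrow's get_array_memory_size is used as a rough stand-in for the real on-disk size:

struct DataFileWriter {
    current_specific_writer: SpecificWriter, // enum or boxed trait object, per discussion 2 below
    written_size: usize,
    file_size_limit: usize,
    completed_files: Vec<DataFile>,
}

impl DataFileWriter {
    async fn write(&mut self, batch: &RecordBatch) -> Result<()> {
        self.current_specific_writer.write(batch)?;
        // In-memory size as a rough proxy for bytes written to the file.
        self.written_size += batch.get_array_memory_size();
        if self.written_size >= self.file_size_limit {
            // Seal the current file and start a new one (hypothetical helpers).
            let data_file = self.close_current_file().await?;
            self.completed_files.push(data_file);
            self.current_specific_writer = self.new_specific_writer().await?;
            self.written_size = 0;
        }
        Ok(())
    }
}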

2. Discussion: how do we treat the type SpecificWriter: dispatch through an enum, or use a generic parameter?

PartitionWriter and UnpartitionWriter

The top level is PartitionWriter and UnpartitionWriter. The UnpartitionWriter is just similar to the DataFileWriter. The PartitionWriter needs to split the record batch into different groups according to the partition, and each group of record batches is written by the DataFileWriter responsible for that partition. It looks like:

struct PartitionWriter {
    writers: HashMap<Partition, DataFileWriter>,
}
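For illustration, the dispatch could look roughly like this, where split_by_partition and new_data_file_writer are hypothetical helpers (evaluating the partition spec per row and opening a new DataFileWriter, respectively):

use std::collections::HashMap;

struct PartitionWriter {
    writers: HashMap<Partition, DataFileWriter>,
}

impl PartitionWriter {
    async fn write(&mut self, batch: &RecordBatch) -> Result<()> {
        // `split_by_partition` is a hypothetical helper that evaluates the
        // partition spec against the batch and returns one sub-batch per
        // partition value.
        for (partition, sub_batch) in split_by_partition(batch)? {
            let writer = self
                .writers
                .entry(partition)
                .or_insert_with(new_data_file_writer); // hypothetical constructor
            writer.write(&sub_batch).await?;
        }
        Ok(())
    }
}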
liurenjie1024 (Contributor) commented:

Thanks @ZENOTME for raising this discussion.

For SpecificWriter, it's similar to FileAppender in Java. Since we don't have many formats, I would suggest using an enum rather than a trait.

Others LGTM.

JanKaul (Collaborator) commented Aug 18, 2023

I think making the writing process async would be of great value because uploading the data to an object store is not CPU bound.

For parquet there is actually a great AsyncArrowWriter.

ZENOTME (Author) commented Aug 18, 2023

For parquet there is actually a great AsyncArrowWriter.

Good suggestion. I investigated the metadata returned by the parquet writer and found that it can fulfill most of the DataFile needs, except for the following fields:

  1. file_size_in_bytes
    The metadata only records each row group's size, but the file size also includes the metadata size, so we can't compute it from that alone.
    The solution we use in icelake is to wrap the IO writer with a tracker (see the sketch below).
  2. nan_value_counts
    If I understand correctly, NaN values only occur in types like float and double, so we may need to track this from the record batch manually.
  3. split_offsets

As described in the spec: "Split offsets for the data file. For example, all row group offsets in a Parquet file. Must be sorted ascending."

The metadata only records the offsets of the column chunks. I'm not sure whether we can use the first column chunk's offset as the row group offset.
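For reference, the tracking-wrapper idea looks roughly like the following; this is only a sketch (shown for a blocking std::io::Write for brevity; the async variant wraps an AsyncWrite the same way), not the actual icelake code:

use std::io::{Result, Write};

/// Counts every byte that passes through the inner writer, so the final
/// file size (row groups + footer metadata) can be reported as
/// `file_size_in_bytes`.
struct TrackWriter<W: Write> {
    inner: W,
    bytes_written: u64,
}

impl<W: Write> Write for TrackWriter<W> {
    fn write(&mut self, buf: &[u8]) -> Result<usize> {
        let n = self.inner.write(buf)?;
        self.bytes_written += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> Result<()> {
        self.inner.flush()
    }
}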

JanKaul (Collaborator) commented Aug 18, 2023

I'm not an expert in parquet but maybe we can calculate the file_size_in_bytes from each rowgroup size.

And the split_offsets should also be related to the rowgroup size.

ZENOTME (Author) commented Aug 18, 2023

I'm not an expert in parquet but maybe we can calculate the file_size_in_bytes from each rowgroup size.

According to the parquet format, a parquet file consists of:

4-byte magic number "PAR1" +
multiple row groups +
file metadata +
4-byte length in bytes of the file metadata +
4-byte magic number "PAR1"

So I think accumulating the row group sizes won't work. 🤔

JanKaul (Collaborator) commented Aug 18, 2023

So the missing piece is the size of the metadata, right?

It's a bit of a weird approach, but we could write the FileMetadata to a buffer and check its length. Like here

ZENOTME (Author) commented Aug 18, 2023

It's a bit of a weird approach, but we could write the FileMetadata to a buffer and check its length. Like here

It's a bit weird, but more efficient than the tracking wrapper, I think. 🤣

ZENOTME (Author) commented Aug 19, 2023

We can expose a trait similar to FileIO and keep opendal under the hood as an implementation.

I think the next thing is to decide the FileIo interface, and then we can work on our SpecificFormatWriter (or call it FileWriter / FileAppender).

Here is some info that may be useful for the FileIo interface design:

  1. For the parquet writer, what we need to pass in is an AsyncWriter.

  2. The avro writer only supports a sync writer.
    In icelake, to support async writes, we write to a Vec<u8> first and then write this Vec<u8> using another async write method in opendal::Operator.

As a suggestion, the interface may need functions that look like:

trait FileIo {
    fn writer() -> impl FileWriter;
}

trait FileWriter {
    // This interface can be used by the parquet writer.
    fn async_writer() -> impl AsyncWriter;

    // These interfaces can be used by the avro writer.
    async fn write();
    async fn close();
}

cc @Xuanwo

Xuanwo (Member) commented Aug 19, 2023

Most of the writing tasks are handled by @liurenjie1024 and @ZENOTME, so I may not have sufficient experience on this particular topic. However, I will do my best to provide input from the OpenDAL perspective to assist with the design.


When we talk about writers, we are talking about two different things: the IO writer and the value writer.

  • The IO Writer handles the underlying IO operations, i.e. write(&[u8]) into storage.
  • The Value Writer handles the upper-level value operations, i.e. write(record_batches).

For IO Writer, there are three kinds of APIs we can use:

  • PutObject: write all content at once, i.e. op.write(path, bs) or op.writer_with(path).content_length(size)
  • MultipartUpload: allows writing content in multiple parts; we need to buffer some content in memory and then write it to storage, i.e. op.writer(path).
  • AppendObject: allows appending to an existing object (few storage services support this), i.e. op.writer(path).append(true).

PutObject

  • Good: fast and cheap, only one API call
  • Bad:
    • needs all content in memory, not suitable for large content.
    • needs to know the actual size before uploading.

MultipartUpload

  • Good: requires at most 8MiB in memory.
  • Bad: needs (size/8MiB) + 2 API calls; slower and more expensive.

AppendObject

  • Good: requires at most 8MiB in memory, no extra API call.
  • Bad: only a few services support this, and AWS S3 doesn't.

I suggest that:

  1. We can support two kinds of IO Writer first: PutObject and MultipartUpload. Both of them can be used as an AsyncWriter via opendal::Writer (see the sketch below). This IO writer should be injected into the SpecificWriter so that it handles that logic internally.
  2. We should not handle them differently based on format. Maybe we can add an internal buffer inside the avro writer so that it can accept the same writer as the parquet writer does?
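For illustration, the two call shapes look roughly like this with opendal (using the in-memory service as a stand-in; exact method signatures may differ slightly between opendal versions):

use opendal::{services, Operator};

async fn demo() -> opendal::Result<()> {
    let op = Operator::new(services::Memory::default())?.finish();

    // PutObject-style: the whole content is in memory and written in one call.
    op.write("data/file-1.parquet", vec![0u8; 1024]).await?;

    // MultipartUpload-style: a streaming writer that accepts chunks.
    let mut w = op.writer("data/file-2.parquet").await?;
    w.write(vec![0u8; 1024]).await?;
    w.write(vec![0u8; 1024]).await?;
    w.close().await?;

    Ok(())
}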

ZENOTME (Author) commented Aug 19, 2023

Thanks for your suggestion!

We should not handle them differently based on format. Maybe we can add a internal buffer inside avro writer so that it can accept the same writer as parquet writer does?

Agreed. We can use an internal buffer that the avro writer writes into first, and then write it out through the AsyncWriter via https://docs.rs/tokio/1.29.1/tokio/io/trait.AsyncWriteExt.html.
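A rough sketch of that buffering approach, where encode_batch_to_avro is a hypothetical stand-in for the sync avro encoding step; the buffer is flushed to the injected async writer with tokio's AsyncWriteExt:

use tokio::io::{AsyncWrite, AsyncWriteExt};

struct AvroFileWriter<W: AsyncWrite + Unpin> {
    buffer: Vec<u8>, // the sync avro writer writes into this Vec<u8>
    output: W,       // the same kind of async writer the parquet writer gets
}

impl<W: AsyncWrite + Unpin> AvroFileWriter<W> {
    async fn write(&mut self, batch: &RecordBatch) -> Result<()> {
        // `encode_batch_to_avro` is a hypothetical helper wrapping the sync
        // avro writer; it appends the encoded bytes to the in-memory buffer.
        encode_batch_to_avro(batch, &mut self.buffer)?;
        Ok(())
    }

    async fn close(mut self) -> Result<()> {
        // Flush the whole buffer through the async writer.
        self.output.write_all(&self.buffer).await?;
        self.output.shutdown().await?;
        Ok(())
    }
}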

ZENOTME (Author) commented Aug 19, 2023

More detailed SpecificFormatWriter interface design

I think there is enough info now to propose a SpecificFormatWriter interface design (maybe we can call it FileWriter). If the following interface looks good, I will open a PR to make it work once we have DataFile.

enum WriterConfig {
    Parquet(ParquetWriterConfig),
    Avro(AvroWriterConfig),
}

enum FileWriter {
    Parquet(ParquetFileWriter),
    Avro(AvroWriter),
}

impl FileWriter {
    /// Create the writer according to the config
    fn try_new(writer: impl AsyncWriter, schema: ArrowSchemaRef, config: WriterConfig) -> Result<Self>;

    async fn write(&mut self, record_batch: &RecordBatch) -> Result<()>;

    async fn close(self) -> Result<DataFile>;
}
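Usage would then look roughly like this (all names are still the hypothetical ones from the sketch above):

// Hypothetical usage of the interface above.
let mut writer = FileWriter::try_new(
    async_writer,                          // e.g. an opendal-backed async writer
    arrow_schema.clone(),
    WriterConfig::Parquet(parquet_config),
)?;
writer.write(&record_batch).await?;
let data_file = writer.close().await?;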

JanKaul (Collaborator) commented Aug 20, 2023

Thanks @Xuanwo for the write-up. For me it would be okay to just start with MultipartUpload, as it is the most general option and works with all memory requirements and file sizes. We could then later add the optimization of using PutObject for small files.

And I strongly agree with your point about using the same interface for the parquet and the avro writer, even if it means we need to implement an async writer with an internal buffer for avro. Looking at the parquet AsyncArrowWriter implementation, it doesn't look too bad.

@ZENOTME the interface looks great to me. Thanks for the effort.

liurenjie1024 (Contributor) commented Aug 21, 2023

Thanks everyone for this nice discussion.

For the parquet file metadata, I've submitted a PR to fix it, so we will have file_size_in_bytes and split_offsets in the parquet file metadata.

About metrics such as nan_values, I think we can follow java's approach: https://github.com/apache/iceberg/blob/aa1c1ef49eb0cd969c18676495983f5b6a231a5c/parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java#L140

In iceberg-java's design, there are several components:

  1. FileIO

It provides several methods for manipulating files in the underlying storage, such as creating a new file for writing or deleting a file. I think we can provide a similar data structure, which can be implemented as a wrapper over an underlying library (opendal, etc.).

impl FileIO {
  fn new_file_writer(&self, path: &str) -> impl Writer;
  fn new_async_file_writer(&self, path: &str) -> impl AsyncWriter;
}
  2. FileAppender

A file appender focuses on one file and the format of that file. We can have different file appenders for different formats, such as parquet, orc, and avro.

  3. FileWriter

A FileWriter focuses on the content of a file; for example, we can have a data file writer, a position deletion file writer, and an equality deletion file writer. Usually a file writer cares about one partition's data.

  4. TaskWriter

A task writer is used by a task in a distributed computing framework, such as Spark or Flink. A task writer takes care of assigning input data to different partitions and calls a FileWriter to append data. For deletions it should call a deletion file writer.

I think the above design is quite elegant, and I would suggest similar components.

ZENOTME (Author) commented Aug 21, 2023

I found that the parquet writer will cast types automatically, which means that the following code can work:

// A writer with schema {a: timestamp with time zone}
let schema = Schema::new(Fields::from(vec![Field::new("a", DataType::Timestamp(arrow::datatypes::TimeUnit::Microsecond,Some("+08:00".into())), false)])).into();
let w = op.writer("test").await?;
let mut pw = ParquetWriterBuilder::new(w, schema).build()?;

// We can insert it using i64 array.
let col = Arc::new(Int64Array::from_iter_values(vec![1])) as ArrayRef;
let to_write = RecordBatch::try_from_iter([("a", col)]).unwrap();
pw.write(&to_write).await?;

// We can insert it using f32 array.
let col = Arc::new(Float32Array::from_iter_values(vec![1.0])) as ArrayRef;
let to_write = RecordBatch::try_from_iter([("a", col)]).unwrap();
pw.write(&to_write).await?;

The schema of the writer is timestamp with time zone, but we can insert into it using an int64 or float array. The cast logic casts the physical representation directly rather than doing a logical cast. In the above example, the physical representation of timestamp is i64, so it just casts the i64 / f32 values into i64.

I'm not sure whether this behaviour will cause potential bugs in the future.

So I want to discuss:

  1. Should we support a schema safety check? (Personally I think we should; see the sketch below.)
  2. If we want to support it, how strict should it be? Should we do the auto cast sometimes? BTW, if we want to do this, we shouldn't let the parquet writer do it; we should do it with a manual logical cast in our writer.
    e.g. the schema of the table is the following, and the input record is a timestamp without time zone:
table {
  t: timestamp with time zone
}
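A minimal sketch of option 1, assuming the writer keeps the ArrowSchemaRef it was created with and has a hypothetical inner_write; the error constructor is a placeholder:

impl FileWriter {
    async fn write(&mut self, batch: &RecordBatch) -> Result<()> {
        // Reject batches whose Arrow schema differs from the schema the writer
        // was created with, instead of letting parquet silently reinterpret
        // the physical representation.
        if batch.schema() != self.schema {
            return Err(schema_mismatch_error(batch.schema(), &self.schema)); // hypothetical error constructor
        }
        self.inner_write(batch).await // hypothetical inner write without checks
    }
}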

liurenjie1024 (Contributor) commented Aug 22, 2023

Thanks @ZENOTME for this interesting finding; I didn't even notice this behavior before. The schema of the parquet writer should be determined by the table schema, i.e. the iceberg schema, and the input record batch's schema should match the table's schema. As for whether we should do a runtime check during writing: I think it should be configurable, since it has a huge performance impact on the writer. It would be useful during the debugging and development phases, but should be turned off in production.

lukekim commented Feb 13, 2024

@ZENOTME are you intending on implementing writer support until completion yourself?

ZENOTME (Author) commented Feb 13, 2024

@ZENOTME are you intending on implementing writer support until completion yourself?

I have implemented writer support in icelake, so I'm happy to migrate it into iceberg-rust. I'm willing to take on most of the work, but it doesn't have to be completed by me alone; any good suggestions and contributions are welcome.
For now, I have drafted an interface design in #135, and I hope it can be a starting point for discussing the interface. Once the writer interface design is settled, I will migrate the writers from icelake to iceberg-rust, but it's also fine to open these as issues for other contributors.

lukekim commented Feb 23, 2024

@ZENOTME got it, thanks. We may be able to help here if you create some discrete sub-issues for the conversion.

ZENOTME (Author) commented Apr 23, 2024

We have finished the initial writer framework! We can close this issue now; the next step is to implement more writers. I will create issues and track them separately.

ZENOTME closed this as completed Apr 23, 2024