Skip to content

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Mar 5, 2025

Which issue does this PR close?

This PR is part 1 to close #342.

What changes are included in this PR?

The partition writer support will be separate into three PR:

  1. add partition splitter which used to split the record batch based on partition value
  2. add fanout partition writer which will compute the partition value of input and split them
  3. add precompute partition writer which will use the partition value provided by user

This PR is the first part.

Are these changes tested?

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Mar 5, 2025

This PR may conflict with #1014. But I'm not sure whether #1014 can be reviewed and merge recently. I'm ok to merge any one first and I will fix the conflict later. cc @liurenjie1024 @jonathanc-n @Fokko @Xuanwo @sdd

@ZENOTME ZENOTME force-pushed the pw_1_partition_spliter branch from 4ac8823 to 7bc64bb Compare March 5, 2025 13:30
@jonathanc-n
Copy link
Contributor

jonathanc-n commented Mar 17, 2025

@ZENOTME I should be able to take a look at this tomorrow

Copy link
Contributor

@jonathanc-n jonathanc-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nits, I don't know if worth to even change if this is just temporary code.

// # TODO
// Remove this after partition writer supported.
#[allow(dead_code)]
pub struct RecordBatchPartitionsplitter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub struct RecordBatchPartitionsplitter {
pub struct RecordBatchPartitionSplitter {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and rest of code

partition_batches.push((
row,
filter_record_batch(batch, &filter_array)
.expect("We should guarantee the filter array is valid"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to propogate error instead of expect

@ZENOTME ZENOTME force-pushed the pw_1_partition_spliter branch from 7bc64bb to f60e82d Compare April 30, 2025 01:03
@ZENOTME ZENOTME mentioned this pull request Apr 30, 2025
2 tasks
@ZENOTME
Copy link
Contributor Author

ZENOTME commented May 1, 2025

This PR is ready to review. cc @liurenjie1024 @Fokko @Xuanwo @sdd

@ZENOTME ZENOTME requested a review from jonathanc-n May 1, 2025 12:43
@ranjanankur314
Copy link

Hi @jonathanc-n & @ZENOTME
Could we please enable partition support in Iceberg Rust? Can we please expedite this if possible?

@jonathanc-n
Copy link
Contributor

@ranjanankur314 I will try my best to give it another review today. At the end of the day we will need a committer's review

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Aug 25, 2025

Hi, I think this PR is ready to review. cc @liurenjie1024 @Xuanwo @Fokko @kevinjqliu @sdd

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ZENOTME for this pr!

}

/// Split the record batch into multiple record batches according to provided partition columns.
pub fn split_by_partition(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this have to be pub?

&self,
batch: &RecordBatch,
partition_columns: &[ArrayRef],
) -> Result<Vec<(OwnedRow, RecordBatch)>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use Struct to replace OwnedRow? I assume the partitioned writer needs Struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I noticed that out Struct support Hash, so we don't need OwnedRow. I have remove them and use Struct directlty.

}

/// Convert row back to iceberg value.
pub fn convert_row(&self, rows: Vec<OwnedRow>) -> Result<Vec<Struct>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this function? Why not just convert partition value columnar batch into Vec<Struct>

@ZENOTME ZENOTME force-pushed the pw_1_partition_spliter branch from a482ff3 to c0d5a12 Compare August 28, 2025 17:18
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ZENOTME for this pr! Generally LGTM, just one nit.

Comment on lines 95 to 96
let partition_type = self.partition_spec.partition_type(&self.schema)?;
let partition_arrow_type = type_to_arrow_type(&Type::Struct(partition_type.clone()))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do this conversion for each batch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

@ZENOTME ZENOTME force-pushed the pw_1_partition_spliter branch from eabc8d0 to 8a62fe2 Compare September 2, 2025 15:17
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ZENOTME for this pr!

@liurenjie1024 liurenjie1024 merged commit bd228fe into apache:main Sep 3, 2025
17 checks passed
@ZENOTME ZENOTME deleted the pw_1_partition_spliter branch September 3, 2025 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement partition writer
4 participants