
Conversation

@dyno commented Oct 8, 2025

Iceberg 1.8+ introduces support in `rewrite_data_files` for writing to an output spec that differs from the table’s current partition spec.

In our use case, the current table is partitioned by event/date/hour/batchId, where `batchId` serves as a work unit identifier. To address the small files problem, we want to roll up mature data to a coarser partition layout (event/date). This feature enables an efficient in-table rollup.
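For context, a rollup like this presumes the coarser event/date layout is registered as the table's current spec. A minimal sketch using Iceberg's spec evolution API, assuming the partition fields are literally named `hour` and `batchId`:

```java
// Sketch (not from the PR): evolve the current spec from
// event/date/hour/batchId down to event/date, given a loaded Table `table`.
// Files already written keep their original, finer spec id, which is what
// the rollup rewrite described below has to handle.
table.updateSpec()
    .removeField("hour")
    .removeField("batchId")
    .commit();
```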

We observed two issues:

  1. Files are grouped by the current table’s partition spec, which places each small file into its own group — effectively preventing any bin-packing.
  2. When running `rewrite_data_files` again, because the files no longer conform to the table’s current output partition spec, Spark places all files into a single group, causing a large and unnecessary shuffle.

This patch resolves the issue by grouping files according to the *output* partition spec instead. As an additional benefit, when the output spec is only partially compatible with the current one, the grouping logic will still align on shared partitions — for example, when rolling up from event/date/hour to event/server/date, it will group data primarily by event/date.
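A sketch of triggering the rollup from Spark, assuming the `output-spec-id` option introduced for `rewrite_data_files` in 1.8; the spec id and surrounding variables are illustrative:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.actions.SparkActions;

// Given a SparkSession `spark` and a loaded Table `table`:
int rollupSpecId = 1;  // illustrative: id of the coarser event/date spec in table.specs()

SparkActions.get(spark)
    .rewriteDataFiles(table)  // binpack is the default strategy
    .option("output-spec-id", String.valueOf(rollupSpecId))  // write with the coarser spec
    .execute();
```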

@dyno marked this pull request as draft October 8, 2025 22:29
@dyno marked this pull request as ready for review October 9, 2025 01:43
```java
});
}

private static Object convertPartitionValue(
```
@dyno (Author):

It's not exhaustive; it only covers the common year/month/day/hour cases.
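For illustration only (helper names hypothetical): Iceberg's time transforms store epoch ordinals — years/months/days/hours since 1970 — so these common conversions reduce to date arithmetic like this:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// hour ordinals are hours since epoch; day ordinals are days since epoch
static int hourToDay(int hoursSinceEpoch) {
  return Math.floorDiv(hoursSinceEpoch, 24);
}

// month ordinals are months since 1970-01
static int dayToMonth(int daysSinceEpoch) {
  LocalDate date = LocalDate.ofEpochDay(daysSinceEpoch);
  return (int) ChronoUnit.MONTHS.between(LocalDate.ofEpochDay(0), date.withDayOfMonth(1));
}
```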

Contributor:

Can we have an exhaustive implementation?
Adding something half-finished seems problematic to me.

```java
groups,
group ->
enoughInputFiles(group)
(group.size() >= 1
```
@dyno (Author):

Mostly for the single-file case.

Contributor:

Are we rewriting a single file just so we can change the specId?

@dyno commented Oct 9, 2025

@pvary @Parth-Brahmbhatt could you help review this? Thanks.

@pvary commented Oct 10, 2025

> This patch resolves the issue by grouping files according to the *output* partition spec instead.

What happens if the file contains data for multiple new partitions? Say the new output spec is more detailed than the original one, or maybe even contains entirely different partitions?

I didn't have time to go through your code, but there are existing tools to get the partition value for a row with a given spec. You might want to use them.

Also please fix the test issues.

@dyno commented Oct 10, 2025

> > This patch resolves the issue by grouping files according to the *output* partition spec instead.
>
> What happens if the file contains data for multiple new partitions? Say the new output spec is more detailed than the original one, or maybe even contains entirely different partitions?

Let's say the file's existing partition is event/date. Then (sketched just below):

- If the output spec is more detailed, e.g. event/date/hour, the function will group by event/date/null. That is still more performant than grouping everything together and then repartitioning. An output spec of event/team/date/hour yields event/null/date/null, which is still better.
- If the output spec is totally different, e.g. team, the extracted partition key will be team=null, effectively one emptyPartition file group, which is the current behavior.
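A minimal sketch of that key derivation (hypothetical helper; the PR's actual code differs): one slot per output spec field, taken from the file's partition tuple when the current spec has a field over the same source column, and null otherwise:

```java
import java.util.List;
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;

static Object[] groupingKey(
    PartitionSpec outputSpec, PartitionSpec currentSpec, StructLike filePartition) {
  List<PartitionField> outFields = outputSpec.fields();
  List<PartitionField> curFields = currentSpec.fields();
  Object[] key = new Object[outFields.size()];
  for (int i = 0; i < outFields.size(); i++) {
    for (int j = 0; j < curFields.size(); j++) {
      // a real implementation must also check transform compatibility and
      // convert ordinals (e.g. hour -> day), as discussed further down
      if (outFields.get(i).sourceId() == curFields.get(j).sourceId()) {
        key[i] = filePartition.get(j, Object.class);
        break;
      }
    }
    // unmatched slots stay null, which is what produces event/date/null etc.
  }
  return key;
}
```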

> I didn't have time to go through your code, but there are existing tools to get the partition value for a row with a given spec. You might want to use them.

I think you mean coercePartition(). It does not handle hidden partitioning correctly, e.g. extracting date from hour, and it needs to recompute the extraction from the spec every time. Let me see if I can just use coercePartition().

> Also please fix the test issues.

Will do.

```java
tasks,
task ->
outsideDesiredFileSizeRange(task) || tooManyDeletes(task) || tooHighDeleteRatio(task));
(task.file() != null && task.file().specId() != outputSpecId())
```
Contributor:

Are we rewriting a single file just so we can change the specId?

@dyno (Author):

It's more about making the storage layout match the spec. E.g. in our case we want to rewrite the file from
`s3://bucket/datalake/event=xxx/team=xxx/batchid=xxx/date=xxx/hour=xxx/xxx.parquet` (current spec)
to
`s3://bucket/datalake/event=xxx/team=xxx/date=xxx/xxx.parquet` (rollup spec).

Given that a lot of policy is based on the S3 prefix, it's better to rewrite even if there is only one file.

Contributor:

Could we move this to a different PR?
I think this one is a bit questionable, and other community members might have different opinions about it.

@pvary commented Oct 13, 2025

> > I didn't have time to go through your code, but there are existing tools to get the partition value for a row with a given spec. You might want to use them.
>
> I think you mean coercePartition(). It does not handle hidden partitioning correctly, e.g. extracting date from hour, and it needs to recompute the extraction from the spec every time. Let me see if I can just use coercePartition().

What does "does not handle hidden partitioning correctly" mean?

@dyno commented Oct 13, 2025

> > > I didn't have time to go through your code, but there are existing tools to get the partition value for a row with a given spec. You might want to use them.
> >
> > I think you mean coercePartition(). It does not handle hidden partitioning correctly, e.g. extracting date from hour, and it needs to recompute the extraction from the spec every time. Let me see if I can just use coercePartition().
>
> What does "does not handle hidden partitioning correctly" mean?

```java
private StructProjection(StructType structType, StructType projection, boolean allowMissing) {
...
if (projectedField.fieldId() == dataField.fieldId()) {
```

In the hidden partitioning case, date(ts) and hour(ts) are two different fields with different fieldIds, so the projection comes back null. It's better to compare the sourceId() and check whether the transforms are "compatible" (satisfiesOrderOf?), but without the spec, that information is lost.
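A sketch of the sourceId-based check (the helper is hypothetical; `Transform#satisfiesOrderOf` is existing Iceberg API):

```java
import org.apache.iceberg.PartitionField;

// True when the output field can be derived from the current field:
// same source column, and the current transform orders at least as
// finely, e.g. hour(ts) satisfies the order of day(ts).
static boolean canDeriveFrom(PartitionField outField, PartitionField curField) {
  return outField.sourceId() == curField.sourceId()
      && curField.transform().satisfiesOrderOf(outField.transform());
}
```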

@pvary commented Oct 14, 2025

> In the hidden partitioning case, date(ts) and hour(ts) are two different fields with different fieldIds, so the projection comes back null. It's better to compare the sourceId() and check whether the transforms are "compatible" (satisfiesOrderOf?), but without the spec, that information is lost.

Thanks for the explanation. This makes sense.

@dyno changed the title from "Core: Group binpack fileGroup by output partitonSpec" to "Core: Group binpack fileGroup by output partitionSpec" on Oct 15, 2025