Open
Labels
bug: Something isn't working
Description
Describe the bug
As discussed on Discord, here's another external sort use case that's failing.
Repro:
https://github.com/ivankelly/df-repro
To run:
$ bash setup.sh # download the source data
$ RUST_LOG=trace cargo run
...
Error: Resources exhausted: Failed to allocate additional 1450451 bytes for ParquetSink(ArrowColumnWriter) with 62770337 bytes already allocated for this reservation - 1107184 bytes remain available for the total pool
The code reads a set of parquet files (889MB in total), sorts them, and tries to write the result to a single parquet file.
Memory is limited to 100MB.
Changing the batch size or the number of target partitions doesn't help.
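To make the numbers in the error message above concrete, here is a minimal sketch of the arithmetic. `ToyPool` and `check_reported_allocation` are hypothetical names used only for illustration, not DataFusion's memory pool implementation, and it assumes "100MB" means 100 MiB. The request of 1450451 bytes exceeds the 1107184 bytes remaining by 343267 bytes, so the allocation is rejected even though the ParquetSink reservation itself only holds about 60% of the pool.

```rust
// Toy model of a fixed-size memory pool. This is NOT DataFusion's
// implementation; it only replays the arithmetic from the error message.
#[derive(Debug)]
struct ToyPool {
    limit: u64, // total pool size in bytes
    used: u64,  // bytes currently reserved by all consumers
}

impl ToyPool {
    /// Try to reserve `bytes` more; fail if that would exceed the limit.
    fn try_grow(&mut self, bytes: u64) -> Result<(), String> {
        let remaining = self.limit - self.used;
        if bytes > remaining {
            return Err(format!(
                "Failed to allocate additional {bytes} bytes - {remaining} bytes remain available"
            ));
        }
        self.used += bytes;
        Ok(())
    }
}

/// Replays the failing allocation using the numbers from the report.
fn check_reported_allocation() -> Result<(), String> {
    let requested: u64 = 1_450_451; // from the error message
    let remaining: u64 = 1_107_184; // from the error message
    // Assumption: the 100MB limit is 100 MiB; the report does not say which.
    let limit: u64 = 100 * 1024 * 1024;
    let mut pool = ToyPool { limit, used: limit - remaining };
    pool.try_grow(requested)
}

fn main() {
    match check_reported_allocation() {
        Ok(()) => println!("allocation succeeded"),
        Err(e) => println!("Error: {e}"),
    }
}
```

The point of the sketch is that the failure is triggered by the last small request, not by one oversized allocation: the pool was already almost full when the writer asked for more.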
To Reproduce
No response
Expected behavior
No response
Additional context
No response