Writing parquet fails modest memory limit #15028

@ivankelly

Describe the bug

As discussed on Discord, here's another external sort use case that's failing.

Repro:
https://github.com/ivankelly/df-repro

To run:

$ bash setup.sh # download the source data
$ RUST_LOG=trace cargo run
...
Error: Resources exhausted: Failed to allocate additional 1450451 bytes for ParquetSink(ArrowColumnWriter) with 62770337 bytes already allocated for this reservation - 1107184 bytes remain available for the total pool

The code reads a set of parquet files (889MB in total), sorts them, and tries to write the result to a single parquet file.
Memory is limited to 100MB.
Changing the batch size or the number of target partitions doesn't help.
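
For readers trying to interpret the numbers in the error above, here is a rough sketch of the accounting it implies. It is a toy model of a shared byte budget, not DataFusion's actual memory pool: `ToyPool`, `try_grow`, and `held_by_others` are hypothetical names, and it assumes "100MB" means 100 MiB (the repro may use a different constant). Under that assumption, the parquet sink holds roughly 60% of the pool, other consumers hold about 41 MB, and the failing request is larger than what is left.

```rust
// Toy model of a memory pool: one fixed byte budget shared by all consumers.
// A request fails as soon as it exceeds what is left in the pool.
// This is NOT DataFusion's implementation; all names here are hypothetical.

#[derive(Debug)]
struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        ToyPool { limit, used: 0 }
    }

    /// Bytes still unreserved in the pool.
    fn available(&self) -> usize {
        self.limit - self.used
    }

    /// Try to reserve `bytes`; on failure, return how many bytes remain.
    fn try_grow(&mut self, bytes: usize) -> Result<(), usize> {
        if bytes > self.available() {
            return Err(self.available());
        }
        self.used += bytes;
        Ok(())
    }
}

// Figures copied from the error message in this issue.
const REQUESTED: usize = 1_450_451;
const SINK_RESERVED: usize = 62_770_337;
const REMAINING: usize = 1_107_184;
// Assumption: "100MB" means 100 MiB.
const LIMIT: usize = 100 * 1024 * 1024;

/// Bytes held by every consumer other than the parquet sink,
/// derived from the figures above.
fn held_by_others() -> usize {
    LIMIT - SINK_RESERVED - REMAINING
}

fn main() {
    let mut pool = ToyPool::new(LIMIT);
    pool.try_grow(held_by_others()).unwrap();
    pool.try_grow(SINK_RESERVED).unwrap();
    match pool.try_grow(REQUESTED) {
        Ok(()) => println!("allocation succeeded"),
        Err(left) => println!(
            "failed to allocate {} bytes, {} bytes remain",
            REQUESTED, left
        ),
    }
}
```

The sketch only restates the arithmetic in the error message; it says nothing about which operator holds the remaining bytes or whether the sink's reservation could be reduced.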

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Labels: bug (Something isn't working)
