External aggregation reserves more memory than actual usage #13089

Closed
@2010YOUY01

Description

Describe the bug

The query below requires about 65M of memory to run. If the memory limit is set to 50M, it fails.
To reproduce, run in datafusion-cli:

cargo run -- --mem-pool-type fair -m 50M -c "
select t1.v1,  sum(t2.v1)
from
unnest(generate_series(1,1000)) as t1(v1)
, unnest(generate_series(1,1000)) as t2(v1)
group by t1.v1, t2.v1"

Error: External error: Resources exhausted: Failed to allocate additional 47616 bytes for GroupedHashAggregateStream[0] with 3995896 bytes already allocated for this reservation - 4031073 bytes remain available for the total pool

The issue is that memory usage is over-estimated during the sort-merge step:

self.reservation.try_grow(batch.get_array_memory_size())?;

For example, if a RecordBatch has 3 arrays that all share the same buffers, record_batch.get_array_memory_size() counts those buffers once per array and estimates 3X the actual memory consumption.
(The original RecordBatches passing through DataFusion operators don't share a Buffer between columns. In spilling queries, however, RecordBatches are first written to disk and then read back, and the batches read back reuse Buffers across column arrays.)
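
The over-counting described above can be illustrated with a small standalone model. This is only a sketch using the standard library: `SharedBuffer`, `ColumnView`, and the three functions below are hypothetical stand-ins invented for illustration, not arrow-rs or DataFusion types, and the 3000-byte buffer size is arbitrary. It assumes the naive estimate charges each column for the whole buffer it points into, which is the behavior the issue describes.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted byte buffer
/// (NOT the arrow-rs `Buffer` type).
type SharedBuffer = Arc<Vec<u8>>;

/// Hypothetical stand-in for a column array that views a slice of a buffer.
struct ColumnView {
    buffer: SharedBuffer,
    offset: usize,
    len: usize,
}

impl ColumnView {
    /// The bytes this column actually uses.
    fn bytes(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}

/// Naive estimate: every column is charged for the full buffer it points
/// into, so a buffer shared by N columns is counted N times.
fn naive_memory_size(columns: &[ColumnView]) -> usize {
    columns.iter().map(|c| c.buffer.len()).sum()
}

/// Deduplicated estimate: each distinct allocation is counted once,
/// identified by the address of the allocation behind the `Arc`.
fn deduplicated_memory_size(columns: &[ColumnView]) -> usize {
    let mut seen = HashSet::new();
    columns
        .iter()
        .filter(|c| seen.insert(Arc::as_ptr(&c.buffer)))
        .map(|c| c.buffer.len())
        .sum()
}

/// Builds three columns that view different 1000-byte regions of one
/// 3000-byte buffer, mimicking a batch read back from a spill file.
fn spilled_batch_columns() -> Vec<ColumnView> {
    let shared: SharedBuffer = Arc::new(vec![0u8; 3000]);
    (0..3)
        .map(|i| ColumnView {
            buffer: Arc::clone(&shared),
            offset: i * 1000,
            len: 1000,
        })
        .collect()
}
```

For the three columns built by `spilled_batch_columns()`, the naive estimate is 9000 bytes while the deduplicated one is 3000 bytes, matching the 3X ratio above. A reservation grown by the naive figure would ask the memory pool for three times what the batch really holds.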

The root cause has already been reported in arrow-rs: apache/arrow-rs#6363
Once it's fixed in arrow-rs, we should check whether this aggregation query runs successfully, and also add tests.

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)
