Skip to content

Support spilling in TopK queries #15538

@2010YOUY01

Description

@2010YOUY01

Is your feature request related to a problem or challenge?

When running sort queries with large LIMIT and small memory limit, it can go out of memory

select *
from tbl
order by c1
limit 1000000000

It's possible to enable spilling capability inside TopK data structure to let such queries complete.

Describe the solution you'd like

Approach 1

Add back fetch field back to ExternalSorter to support external TopK queries, which is removed in #15525 (because it's unused now)
This approach can be slightly faster, but requires to add a configuration option to switch to ExternalSorter path for large LIMIT, instead of the default TopK path. (or let optimizer figure out when to switch automatically, though it's also tricky)

Approach 2

Add spilling capability inside TopK executor. When the memory limit is reached, it can fallback to out-of-core execution without introducing a new configuration.

Describe alternatives you've considered

No response

Additional context

The sort + limit query is usually run with a small LIMIT count, so it's mostly memory-efficient.
@alamb is referring to this issue as a sort of exploratory idea, so perhaps someone with real usage knows better how to get it implemented 🤔

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions