Description
Is your feature request related to a problem or challenge?
When running sort queries with a large `LIMIT` and a small memory limit, the query can run out of memory, for example:
select *
from tbl
order by c1
limit 1000000000
It's possible to add spilling capability to the `TopK` data structure so that such queries can complete.
Describe the solution you'd like
Approach 1
Add the `fetch` field back to `ExternalSorter` to support external TopK queries. It was removed in #15525 because it was unused.
This approach may be slightly faster, but it requires adding a configuration option to take the `ExternalSorter` path instead of the default `TopK` path for large `LIMIT`s (alternatively, the optimizer could decide when to switch automatically, though that is also tricky).
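A minimal sketch of what Approach 1 could look like, assuming a single `i64` sort key in ascending order, a memory budget counted in rows, and in-memory vectors standing in for spill files. `ExternalSorterSketch`, `memory_limit_rows`, and `insert_batch` are hypothetical names for illustration, not DataFusion's `ExternalSorter` API. With `fetch` set, each sorted run can be truncated to `fetch` rows before it is spilled, and the final k-way merge stops once `fetch` rows have been emitted.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical external sorter with an optional `fetch` (LIMIT).
/// `spilled_runs` is an in-memory stand-in for spill files on disk.
pub struct ExternalSorterSketch {
    /// Rows buffered in memory before a sorted run is spilled (assumed budget).
    memory_limit_rows: usize,
    /// If set, only the first `fetch` rows of the sorted output are needed.
    fetch: Option<usize>,
    in_mem: Vec<i64>,
    spilled_runs: Vec<Vec<i64>>,
}

impl ExternalSorterSketch {
    pub fn new(memory_limit_rows: usize, fetch: Option<usize>) -> Self {
        assert!(memory_limit_rows > 0, "memory budget must allow at least one row");
        Self { memory_limit_rows, fetch, in_mem: Vec::new(), spilled_runs: Vec::new() }
    }

    pub fn insert_batch(&mut self, batch: &[i64]) {
        for &v in batch {
            self.in_mem.push(v);
            if self.in_mem.len() >= self.memory_limit_rows {
                self.spill();
            }
        }
    }

    /// Sort the in-memory buffer and "write" it out as one sorted run.
    fn spill(&mut self) {
        let mut run = std::mem::take(&mut self.in_mem);
        run.sort_unstable();
        if let Some(k) = self.fetch {
            // Rows past position `fetch` within a run can never reach the output.
            run.truncate(k);
        }
        self.spilled_runs.push(run);
    }

    /// k-way merge of all sorted runs, stopping after `fetch` rows.
    pub fn finish(mut self) -> Vec<i64> {
        if !self.in_mem.is_empty() {
            self.spill();
        }
        let limit = self.fetch.unwrap_or(usize::MAX);
        // Min-heap of (value, run index, position within run).
        let mut heap = BinaryHeap::new();
        for (run_idx, run) in self.spilled_runs.iter().enumerate() {
            if let Some(&v) = run.first() {
                heap.push(Reverse((v, run_idx, 0usize)));
            }
        }
        let mut out = Vec::new();
        while out.len() < limit {
            let Some(Reverse((v, run_idx, pos))) = heap.pop() else { break };
            out.push(v);
            if let Some(&next) = self.spilled_runs[run_idx].get(pos + 1) {
                heap.push(Reverse((next, run_idx, pos + 1)));
            }
        }
        out
    }
}
```

The per-run truncation is safe because any row in the global first `fetch` rows must also be among the first `fetch` rows of whichever run contains it.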
Approach 2
Add spilling capability to the `TopK` executor. When the memory limit is reached, it can fall back to out-of-core execution without introducing a new configuration option.
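A rough sketch of Approach 2 under similar assumptions (a single `i64` key, ascending order, a row-count memory budget, and vectors standing in for spill files). `SpillingTopK` and its methods are hypothetical, not an existing DataFusion type. While `k` fits in the budget it behaves like a plain bounded-heap TopK; if the heap reaches the budget before it holds `k` rows, it spills the heap as a sorted run and finishes with a k-way merge, so no new configuration option is involved.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical TopK for `ORDER BY key ASC LIMIT k` that falls back to
/// spilling when `k` rows do not fit in its (row-count) memory budget.
/// `spilled_runs` is an in-memory stand-in for spill files on disk.
pub struct SpillingTopK {
    k: usize,
    memory_limit_rows: usize,
    /// Max-heap of buffered candidates; its top is the worst row kept.
    heap: BinaryHeap<i64>,
    spilled_runs: Vec<Vec<i64>>,
}

impl SpillingTopK {
    pub fn new(k: usize, memory_limit_rows: usize) -> Self {
        assert!(memory_limit_rows > 0, "memory budget must allow at least one row");
        Self { k, memory_limit_rows, heap: BinaryHeap::new(), spilled_runs: Vec::new() }
    }

    pub fn has_spilled(&self) -> bool {
        !self.spilled_runs.is_empty()
    }

    pub fn insert(&mut self, v: i64) {
        if self.heap.len() < self.k {
            if self.heap.len() == self.memory_limit_rows {
                // Budget exhausted before k rows are held: spill instead of failing.
                self.spill();
            }
            self.heap.push(v);
        } else if let Some(mut worst) = self.heap.peek_mut() {
            // Normal in-memory TopK: replace the worst kept row if `v` sorts before it.
            if v < *worst {
                *worst = v; // PeekMut restores heap order when dropped.
            }
        }
    }

    /// Write the buffered rows out as one ascending sorted run.
    fn spill(&mut self) {
        let run = std::mem::take(&mut self.heap).into_sorted_vec();
        self.spilled_runs.push(run);
    }

    /// Return the k smallest rows in ascending order.
    pub fn finish(mut self) -> Vec<i64> {
        if !self.has_spilled() {
            // Pure in-memory path: the heap already holds the answer.
            return self.heap.into_sorted_vec();
        }
        if !self.heap.is_empty() {
            self.spill();
        }
        // k-way merge over the sorted runs, stopping after k rows.
        let mut merge = BinaryHeap::new();
        for (i, run) in self.spilled_runs.iter().enumerate() {
            if let Some(&v) = run.first() {
                merge.push(Reverse((v, i, 0usize)));
            }
        }
        let mut out = Vec::new();
        while out.len() < self.k {
            let Some(Reverse((v, i, pos))) = merge.pop() else { break };
            out.push(v);
            if let Some(&next) = self.spilled_runs[i].get(pos + 1) {
                merge.push(Reverse((next, i, pos + 1)));
            }
        }
        out
    }
}
```

One design caveat: once spilling starts, this sketch no longer prunes incoming rows against a cutoff, so it degrades to an external sort with a limit. A real implementation could track the best-known cutoff from already spilled rows to discard rows earlier.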
Describe alternatives you've considered
No response
Additional context
Sort + limit queries are usually run with a small `LIMIT` count, so `TopK` is memory-efficient in most cases.
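To illustrate why a small `LIMIT` keeps TopK cheap, here is a minimal sketch (hypothetical `top_k_with_peak`, single `i64` key, ascending order) of a bounded max-heap TopK that also reports how many rows it buffered at peak. The peak is bounded by `k` rather than by the input size, which is exactly the property that stops helping once `k` is as large as 1000000000.

```rust
use std::collections::BinaryHeap;

/// Hypothetical in-memory TopK for `ORDER BY key ASC LIMIT k`.
/// Returns the k smallest values (ascending) and the peak number of buffered rows.
pub fn top_k_with_peak(input: impl IntoIterator<Item = i64>, k: usize) -> (Vec<i64>, usize) {
    // Max-heap: its top is the worst (largest) value currently kept.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut peak: usize = 0;
    for v in input {
        if heap.len() < k {
            heap.push(v);
        } else if let Some(mut worst) = heap.peek_mut() {
            // Heap is full: keep `v` only if it sorts before the current worst row.
            if v < *worst {
                *worst = v;
            }
        }
        peak = peak.max(heap.len());
    }
    (heap.into_sorted_vec(), peak)
}
```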
@alamb describes this issue as an exploratory idea, so someone with a real use case may know better how it should be implemented 🤔