DOC-734 | AQL optimization: COLLECT ... AGGREGATE can utilize persistent index #732

Open
wants to merge 1 commit into
base: main
from

Conversation

Simran-B
Contributor

@Simran-B Simran-B commented Jul 1, 2025

Description

TODO:

  • Is the performance benefit higher the fewer distinct values there are?
  • Is the optimization skipped if there are too many different values (low selectivity)?
  • Other limitations, like sparse indexes or pre/post sort?

Upstream PRs

  • 3.10:
  • 3.11:
  • 3.12:
  • 3.13:

@Simran-B Simran-B self-assigned this Jul 1, 2025
Contributor

Deploy Preview Available Via
https://deploy-preview-732--docs-hugo.netlify.app

@Simran-B Simran-B added this to the 3.12.5 milestone Jul 1, 2025
@cla-bot cla-bot bot added the cla-signed label Jul 1, 2025
@Simran-B Simran-B changed the title AQL optimization: COLLECT ... AGGREGATE can utilize persistent index DOC-734 | AQL optimization: COLLECT ... AGGREGATE can utilize persistent index Jul 1, 2025
@Simran-B Simran-B requested a review from jvolmer July 1, 2025 16:10
@jvolmer

jvolmer commented Jul 9, 2025

To your questions:

Is the performance benefit higher the fewer distinct values there are?

We saw performance gains of around a factor of two across different total numbers of values (n) when the number of distinct values (k) is low. The gain stays constant over a wide range of k, decreases as k approaches n, and is zero when k = n. In a cluster, the gain starts to decrease at lower k than on a single server. (This behaviour can be seen in the diagrams in arangodb/arangodb#21617.)

Is the optimization skipped if there are too many different values (low selectivity)?

No, this optimization is not skipped based on the selectivity value, as opposed to the use of use-index-for-collect for a COLLECT without an aggregation.

Other limitations, like sparse indexes or pre/post sort?

Yes, the optimization does not support sparse indexes, aggregation expressions with variables other than the document variable, and aggregation expressions with no in-variable. I'm not sure what you mean by pre/post sort.
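
For illustration only (not taken from the PR): a minimal AQL sketch of the kind of query these answers describe, assuming a hypothetical collection `coll` with a non-sparse persistent index on `["a", "b"]`. The collection, attribute, and variable names are made up. Whether the rule actually fires should be checked in the query's explain output, where `use-index-for-collect` would be expected among the applied optimizer rules.

```aql
// Query 1 (run separately): both the grouping attribute and the aggregated
// attribute are covered by the assumed persistent index on ["a", "b"],
// and the aggregation only refers to the document variable `doc`.
FOR doc IN coll
  COLLECT a = doc.a AGGREGATE maxB = MAX(doc.b)
  RETURN { a, maxB }

// Query 2 (run separately): the aggregation expression also refers to
// `factor`, a variable other than the document variable, so based on the
// limitations listed above it would not be expected to qualify.
FOR factor IN 1..3
  FOR doc IN coll
    COLLECT a = doc.a AGGREGATE total = SUM(doc.b * factor)
    RETURN { a, total }
```

If the index in this sketch were created as sparse, the first query would likewise not be expected to use it for the aggregation.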

Comment on lines +1269 to +1270
Reading the data from the index instead of the stored documents for aggregations
can significantly increase the perform if the there are few different values.
Suggested change
- Reading the data from the index instead of the stored documents for aggregations
- can significantly increase the perform if the there are few different values.
+ Reading the data from the index instead of the stored documents for aggregations
+ can increase the performance by a factor of two.

2 participants