Skip to content

[Bug] Aggregation Request Generates Incorrect Command Arguments for LIMIT  #2383

@lgnashold

Description

@lgnashold

Version: redis-py 4.3.4, Redis 7.0

Platform: Python 3.8 Ubuntu

Description:
On the current master branch, it says in comments that if limit is applied before group by, it will apply the limit to the initial query. If it is applied after group by, it will be applied to the result of the aggregation. Based on the example, you can also specify a limit at multiple stages in the pipeline. However, in practice, the limit is only set on the last stage of the aggregation.

For example, given the following code on a very large database

AggregateRequest("*").limit(0, 100).group_by('@my_field', reducers.count()).limit(0, 1000)

You would expect at most 100 results to be returned, since the first limit should limit the amount of rows the aggregation request queries initially. However, 1000 results are returned, and based on runtime, the original query isn't limited at all.

Why it happens:
This happens because AggregateRequest generates an incorrect list of arguments for LIMIT. Given the above query, you would expect the following arguments to be generated:

['LIMIT', '0', '100', 'GROUPBY', '1', '@future_type', 'REDUCE', 'COUNT', '0', 'LIMIT', '0, '1000']

I have checked that running FT Aggregate with the above command produces the expected result. However, the arguments that are actually generated are

['*', 'GROUPBY', '1', '@future_type', 'REDUCE', 'COUNT', '0', 'LIMIT', '0', '1000']

Note that there is only one limit in the arguments. The reason incorrect arguments are generated is because when we call .limit() on an AggregationRequest, it overwrites any limit previously set (at this location). Furthermore, when we generate the arguments, we always append the LIMIT argument to the end of the query (line 326). The expected behavior would be to insert the command arguments for LIMIT in corresponding location in the aggregation request.

Fix Suggestion:
I'm no expert in redis-py, but after looking at the code, I believe we would want to replace line 214 in aggregations.py:

        self._limit = Limit(offset, num)

with the following:

_limit = Limit(offset, num)
self._aggregateplan.extend(_limit.build_args())

And remove lines 107 and 326.

I believe at minimum, the documentation should be updated so that it reflects the current incorrect behavior

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions