Skip to content

listagg support? #169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
maxcountryman opened this issue May 26, 2020 · 10 comments · Fixed by #174
Closed

listagg support? #169

maxcountryman opened this issue May 26, 2020 · 10 comments · Fixed by #174
Labels
good first issue Good for newcomers

Comments

@maxcountryman
Copy link
Contributor

It seems the parser doesn't currently support listagg. For instance, the following Redshift example fails to parse:

select listagg(sellerid) 
within group (order by dateid) as sellers,
listagg(dateid) as dates
from winsales;
@nickolay nickolay added the good first issue Good for newcomers label May 26, 2020
@nickolay
Copy link
Contributor

You're right, and PRs are welcome!

According to the spec, the syntax seems to be:

LISTAGG( [  { DISTINCT | ALL } ]
            <expression>, 'separator'
            [ ON OVERFLOW
                { ERROR
                | TRUNCATE [ 'filler' ] WITH[OUT] COUNT } ]
        ) WITHIN GROUP (ORDER BY <OrderByExpr> [, ...])

@maxcountryman
Copy link
Contributor Author

You're right, and PRs are welcome!

According to the spec, the syntax seems to be:

LISTAGG( [  { DISTINCT | ALL } ]
            <expression>, 'separator'
            [ ON OVERFLOW
                { ERROR
                | TRUNCATE [ 'filler' ] WITH[OUT] COUNT } ]
        ) WITHIN GROUP (ORDER BY <OrderByExpr> [, ...])

Should this be an extension to the Expr enum or its own node type altogether?

@nickolay
Copy link
Contributor

Well, #40 made the point that the more complex enum variants are more convenient to work with when they wrap a separate type, like Expr::Function does. In simpler cases like UnaryOp { op: UnaryOperator, expr: Box<Expr> } we use some inlining.. So it mainly depends on the approach you want to take here.

The most straightforward approach would be to add an Expr::ListAgg, treating it as a unique case, like we do for EXTRACT(), but with a corresponding type since it would have a number of members unlike EXTRACT. That would have some overlap with other aggregate functions, but coming up with a more generic solution would require more research.

@maxcountryman
Copy link
Contributor Author

Well, #40 made the point that the more complex enum variants are more convenient to work with when they wrap a separate type, like Expr::Function does. In simpler cases like UnaryOp { op: UnaryOperator, expr: Box<Expr> } we use some inlining.. So it mainly depends on the approach you want to take here.

The most straightforward approach would be to add an Expr::ListAgg, treating it as a unique case, like we do for EXTRACT(), but with a corresponding type since it would have a number of members unlike EXTRACT. That would have some overlap with other aggregate functions, but coming up with a more generic solution would require more research.

Makes sense. I started implementing this as an entirely new type a la TOP but let me know if you think that doesn’t sound like the right approach.

@nickolay
Copy link
Contributor

nickolay commented May 26, 2020

A new type is fine, the question is how it integrates with the existing AST types.

TOP is not a good analogy in this regard, as it's referred to via a top: Option<Top> member on Select, meaning there can be zero or one TOP statement in a SELECT.

listagg, I believe, can be a part of a string expression, and so must be referred to from Expr somehow. (Your own example has 2 instances of listagg in a single SELECT.) In my previous comment I took this as a given and was talking about the various ways this can be achieved.

@nickolay
Copy link
Contributor

By the way, I just noticed your example had this: listagg(dateid) as dates. The spec says WITHIN GROUP (ORDER BY ...) is mandatory, which makes sense. Is this a typo? If not, please make sure to mention which dialect supports omitting it and including a test for it.

@maxcountryman
Copy link
Contributor Author

My plan was to tie this new type into Expr (TOP doesn’t make much sense as an example, aside from the fact it’s its own type, oops); much like the function implementation you pointed out.

That is not a typo, that’s valid Redshift (and I would assume Postgres) SQL.

@nickolay
Copy link
Contributor

nickolay commented May 26, 2020

Sure, if you just meant Top to be an example of a separate type, that's cool.

Thanks for confirming that Redshift decided to make WITHIN GROUP optional, and the delimiter too, it appears:

LISTAGG( [DISTINCT] aggregate_expression [, 'delimiter' ] ) 
[ WITHIN GROUP (ORDER BY order_list) ]   

( https://modern-sql.com/feature/listagg doesn't list Postgres a supporting LISTAGG at all yet.)

@maxcountryman
Copy link
Contributor Author

Interesting—I didn’t realize Postgres doesn’t support it.

It seems like there’s not yet a Redshift dialect in the codebase. I’d be interested in this since that’s our primary use case but happy to address that separately if that makes sense to you.

@nickolay
Copy link
Contributor

Yes, if you're willing to discuss a Redshift dialect, let's do it in a separate thread. At this time adding a dialect is not a prerequisite for handling its quirks, as the active dialect currently can affect the tokenizer only, not the parser. Just make a note about the quirks you choose to implement (example).

maxcountryman added a commit to maxcountryman/sqlparser-rs that referenced this issue May 29, 2020
This patch provides an initial implemenation of LISTAGG[1]. Notably this
implemenation deviates from ANSI SQL by allowing both WITHIN GROUP and
the delimiter to be optional. We do so because Redshift SQL works this
way and this approach is ultimately more flexible.

Fixes apache#169.

[1] https://modern-sql.com/feature/listagg
maxcountryman added a commit to maxcountryman/sqlparser-rs that referenced this issue May 29, 2020
This patch provides an initial implemenation of LISTAGG[1]. Notably this
implemenation deviates from ANSI SQL by allowing both WITHIN GROUP and
the delimiter to be optional. We do so because Redshift SQL works this
way and this approach is ultimately more flexible.

Fixes apache#169.

[1] https://modern-sql.com/feature/listagg
maxcountryman added a commit to maxcountryman/sqlparser-rs that referenced this issue May 29, 2020
This patch provides an initial implemenation of LISTAGG[1]. Notably this
implemenation deviates from ANSI SQL by allowing both WITHIN GROUP and
the delimiter to be optional. We do so because Redshift SQL works this
way and this approach is ultimately more flexible.

Fixes apache#169.

[1] https://modern-sql.com/feature/listagg
maxcountryman added a commit to maxcountryman/sqlparser-rs that referenced this issue May 29, 2020
This patch provides an initial implemenation of LISTAGG[1]. Notably this
implemenation deviates from ANSI SQL by allowing both WITHIN GROUP and
the delimiter to be optional. We do so because Redshift SQL works this
way and this approach is ultimately more flexible.

Fixes apache#169.

[1] https://modern-sql.com/feature/listagg
maxcountryman added a commit to maxcountryman/sqlparser-rs that referenced this issue May 29, 2020
This patch provides an initial implemenation of LISTAGG[1]. Notably this
implemenation deviates from ANSI SQL by allowing both WITHIN GROUP and
the delimiter to be optional. We do so because Redshift SQL works this
way and this approach is ultimately more flexible.

Fixes apache#169.

[1] https://modern-sql.com/feature/listagg
maxcountryman added a commit to maxcountryman/sqlparser-rs that referenced this issue May 30, 2020
This patch provides an initial implemenation of LISTAGG[1]. Notably this
implemenation deviates from ANSI SQL by allowing both WITHIN GROUP and
the delimiter to be optional. We do so because Redshift SQL works this
way and this approach is ultimately more flexible.

Fixes apache#169.

[1] https://modern-sql.com/feature/listagg
nickolay pushed a commit that referenced this issue May 30, 2020
This patch provides an initial implemenation of LISTAGG[1]. Notably this
implemenation deviates from ANSI SQL by allowing both WITHIN GROUP and
the delimiter to be optional. We do so because Redshift SQL works this
way and this approach is ultimately more flexible.

Fixes #169.

[1] https://modern-sql.com/feature/listagg
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants