Skip to content

feat: Supporting SAMPLE parsing #1566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

seve-martinez
Copy link
Contributor

Snowflake, Databricks, Postgres, and others all support some from of table sampling.

This commonly takes on the form

SELECT ...
FROM ... 
{ SAMPLE | TABLESAMPLE } [ samplingMethod ]

where

samplingMethod ::= { { BERNOULLI | ROW } ( { <probability> | <num> ROWS } ) |
                     { SYSTEM | BLOCK } ( <probability> ) [ { REPEATABLE | SEED } ( <seed> ) ] }

Different dialects support one, the other, or both sample keywords and only some of the methods. The current PR follows the snowflake paradigm.

Outstanding questions

  • This method is not currently called anywhere. Review of the Parser code would make me think it should be part of the parse_table_factor() logic since it always trails a relation. However this would then require each variant of TableFactor that supports SAMPLE to have a placeholder for it. Is that the right approach?

@iffyio
Copy link
Contributor

iffyio commented Dec 6, 2024

Hi @seve-martinez! It sounds like the parsing can be done while parsing the select body instead? Probably here before looking for a WHERE clause?

@yoavcloud
Copy link
Contributor

I did not notice this PR when I started working on parsing the SAMPLE option, but since then it was merged: #1580

I can share that we used to parse the SAMPLE option as part of the TableFactor but it wasn't ideal, and moving it to the select made more sense.

@seve-martinez
Copy link
Contributor Author

I did not notice this PR when I started working on parsing the SAMPLE option, but since then it was merged: #1580

I can share that we used to parse the SAMPLE option as part of the TableFactor but it wasn't ideal, and moving it to the select made more sense.

Oh awesome! Thanks for taking care of it, I wasn't in love with my solution and what you have there looks very sound. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants