Description
Is your feature request related to a problem or challenge?
There are some features, such as table sampling, that are challenging to add to DataFusion.
The challenge is that I think the use case and semantics vary widely across systems, so I worry that anything we build into DataFusion would be fairly complicated and still not be what other systems want.
I think it is actually possible to implement table sampling with the existing APIs through a combination of:
- SQL planner extensions: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/sql_dialect.rs
- User-defined extension nodes (aka custom logical plan nodes)
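To make the "semantics vary widely" point concrete, here is a minimal sketch of the row-selection logic that a custom sampling node might run at execution time. It deliberately does not use any DataFusion APIs: `SampleMethod`, `SampleSpec`, `SplitMix64`, and `sample_batches` are hypothetical names invented for illustration, plain `Vec<T>` batches stand in for record batches, and the two methods shown (row-level and batch-level) are only assumptions about what a system might want.

```rust
/// Hypothetical sampling strategies; not DataFusion types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SampleMethod {
    /// Keep each row independently with probability `fraction`.
    Bernoulli,
    /// Keep or drop each whole batch with probability `fraction`.
    System,
}

/// Hypothetical stand-in for what a parsed sampling clause could carry.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SampleSpec {
    method: SampleMethod,
    /// Expected fraction of rows (or batches) to keep, in [0, 1].
    fraction: f64,
    /// Seed so that repeated runs return the same sample.
    seed: u64,
}

/// Small deterministic PRNG (SplitMix64) so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Apply the sampling spec to a sequence of batches.
fn sample_batches<T: Clone>(batches: &[Vec<T>], spec: SampleSpec) -> Vec<Vec<T>> {
    let mut rng = SplitMix64(spec.seed);
    let mut out = Vec::new();
    for batch in batches {
        match spec.method {
            SampleMethod::Bernoulli => {
                // Row-level decision: every input batch yields an output
                // batch, possibly an empty one.
                let mut kept = Vec::new();
                for row in batch {
                    if rng.next_f64() < spec.fraction {
                        kept.push(row.clone());
                    }
                }
                out.push(kept);
            }
            SampleMethod::System => {
                // Batch-level decision: cheaper, but coarser.
                if rng.next_f64() < spec.fraction {
                    out.push(batch.clone());
                }
            }
        }
    }
    out
}
```

Calling `sample_batches(&batches, spec)` twice with the same `spec` returns the same sample, because the seed fully determines the random stream. In a real extension this logic would sit behind a custom logical node and its physical operator, and the choice of method, seed handling, and fraction semantics is exactly the part that each system would likely want to define for itself.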
I would be willing to help make an example for this use case, to show it is possible. I think it would be a nice showcase for how to extend systems using DataFusion without having to change the code.
Describe the solution you'd like
A clear and well-documented example of extending the SQL supported by DataFusion.
Describe alternatives you've considered
Note that @theirix already has a great start here.
I would like to help complete this project.
Additional context
Related ticket
- Support data source sampling with TABLESAMPLE #13563 from @theirix
- The related PR from @theirix: feat: Add example for implementing SAMPLE using extension APIs #17633
- A blog post on the subject: Blog on Extending SQL to create own SQL Dialects datafusion-site#97 from @Adez017