Description
Is your feature request related to a problem or challenge?
@shehabgamin added the datafusion-spark crate in #15168
The idea is that, by using the functions in this crate, you can get a SessionContext that executes SQL using Spark semantics. However, there is no user-facing documentation that shows how to do this.
Describe the solution you'd like
Add an example somewhere showing how to configure and use the Spark functions in a SessionContext. I can help with this.
Describe alternatives you've considered
I personally suggest adding a new page to the website: https://datafusion.apache.org/
Specifically, I suggest
- Add a new page in the "Library User Guide" called "Spark Compatible Functions"
- Add a preamble explaining what the datafusion-spark crate is (it contains a list of Spark-compatible functions)
- Add examples
For example, we should show how to run SQL using a "Spark compatible" SessionContext:
let mut ctx = SessionContext::new();
// register_all needs a mutable reference to the function registry
datafusion_spark::register_all(&mut ctx)?;
// TODO run an example SQL query here that uses a function from
// the datafusion spark crate
let df = ctx.sql("select ... ").await?;
// also add an example for DataFrame API
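As a starting point for the TODO above, here is a minimal sketch of what a complete example might look like. It assumes that `datafusion`, `datafusion-spark`, and `tokio` are available as dependencies, that `register_all` accepts a mutable function registry, and that `sha2` is among the functions the crate registers. The query is only illustrative and is not necessarily the one the documentation should use.

```rust
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> Result<()> {
    // Assumption: register_all takes `&mut dyn FunctionRegistry`,
    // which SessionContext implements.
    let mut ctx = SessionContext::new();
    datafusion_spark::register_all(&mut ctx)?;

    // Assumption: `sha2` is one of the Spark-compatible functions in the
    // crate; any other registered function would be called the same way.
    let df = ctx.sql("SELECT sha2('The input String', 256)").await?;

    // Print the query result to stdout.
    df.show().await?;
    Ok(())
}
```

A DataFrame API variant would follow the same registration step and differ only in how the expression is built.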
In order to run the example code as part of CI, you will have to add an entry such as this:
datafusion/datafusion/core/src/lib.rs
Lines 928 to 932 in 81b4c07
#[cfg(doctest)]
doc_comment::doctest!(
    "../../../docs/source/user-guide/introduction.md",
    user_guide_introduction
);
to the datafusion-spark lib.rs file (it can't go in the datafusion/core/lib.rs because the core crate doesn't bring in datafusion-spark)
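For illustration, the corresponding entry in the datafusion-spark lib.rs might look like the sketch below. The markdown path and the module identifier are hypothetical, since the new page does not exist yet. The sketch also assumes that the crate lives at datafusion/spark and lists doc-comment as a dev-dependency.

```rust
// Hypothetical entry for datafusion/spark/src/lib.rs.
// The path and identifier are placeholders for the proposed docs page.
#[cfg(doctest)]
doc_comment::doctest!(
    "../../../docs/source/library-user-guide/spark-compatible-functions.md",
    library_user_guide_spark_compatible_functions
);
```

With an entry like this, `cargo test --doc` on the datafusion-spark crate would compile and run the Rust code blocks in that markdown page.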
Additional context
No response