[datafusion-spark] Example of using Spark compatible function library #15915

@alamb

Is your feature request related to a problem or challenge?

@shehabgamin added the datafusion-spark crate in #15168

The idea is that by registering the functions in this crate you get a SessionContext that executes SQL using Spark semantics. However, there is no user-facing documentation that shows how to do this.

Describe the solution you'd like

Add an example somewhere showing how to configure and use the Spark functions in a SessionContext. I can help with this.

Describe alternatives you've considered

I personally suggest adding a new page to the website: https://datafusion.apache.org/

Specifically, I suggest

  1. Add a new page in the "Library User Guide" called "Spark Compatible Functions"
  2. Add a preamble explaining what the datafusion-spark crate is (it contains a list of Spark-compatible functions)
  3. Add examples

For example, we should show how to run SQL using a "Spark compatible" SessionContext:

let mut ctx = SessionContext::new();
// register_all needs mutable access to the context's function registry
datafusion_spark::register_all(&mut ctx)?;

// TODO run an example SQL query here that uses a function from
// the datafusion spark crate
ctx.sql("select ... ").await?;

// also add an example for DataFrame API

In order to run the example code as part of CI, you will have to add an entry such as this:

#[cfg(doctest)]
doc_comment::doctest!(
"../../../docs/source/user-guide/introduction.md",
user_guide_introduction
);

to the datafusion-spark lib.rs file. It can't go in datafusion/core/lib.rs because the core crate doesn't depend on datafusion-spark.

Additional context

No response

Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request), good first issue (Good for newcomers)
