Open
Labels
EPIC (A larger project, actively underway, with sub tasks), enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
Many DataFusion users are using DataFusion to execute workloads originally developed for Apache Spark. Examples include:
- DataFusion Comet (@andygrove, @comphead, etc.)
- LakeHQ / Sail (@shehabgamin)
- Various internal pipelines / engines (e.g. those that @Omega359 and, I think, @Blizzara use)
They often do this for superior performance.
- Part of running Spark workloads is emulating Spark semantics
- Emulating Spark semantics requires (among other things) functions compatible with Spark (which differ in semantics from the functions included in DataFusion)
Several projects are in the process of implementing Spark-compatible function libraries using DataFusion's extension APIs. However, we concluded in #5600 that we could join forces and maintain a Spark-compatible function library in the core DataFusion repo. @shehabgamin has implemented the first step in #15168 🙏
Describe the solution you'd like
This ticket tracks "completing" the Spark function library started in #15168.
Describe alternatives you've considered
datetime functions:
- [datafusion-spark] Implement Spark datetime function last_day #16774
- [datafusion-spark] Implement Spark date function next_day #16775
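To give a feel for what one of the tracked functions computes, here is a minimal, standard-library-only sketch of the behaviour Spark documents for last_day: given a date, return the last day of the month it falls in. The names `is_leap_year` and `last_day_of_month` are hypothetical helpers made up for this illustration. They are not the datafusion-spark implementation or its API, which would operate on Arrow date arrays rather than plain integers.

```rust
/// Returns true if `year` is a leap year in the proleptic Gregorian calendar.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Hypothetical helper (not part of datafusion-spark): returns the
/// (year, month, day) of the last day of the given month, or `None`
/// if `month` is outside 1..=12.
fn last_day_of_month(year: i32, month: u32) -> Option<(i32, u32, u32)> {
    let day = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => return None, // invalid month
    };
    Some((year, month, day))
}

fn main() {
    // A leap-year February and a 30-day month.
    println!("{:?}", last_day_of_month(2024, 2));
    println!("{:?}", last_day_of_month(2023, 4));
}
```

A real implementation would also need to match Spark on input types, null handling, and invalid inputs, which is exactly the kind of detail the sub-issues above track.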
string functions:
math functions:
Infrastructure and Testing:
- [DISCUSSION] Add separate crate to cover spark builtin functions #5600
- feat: Add datafusion-spark crate #15168
- [datafusion-spark] Example of using Spark compatible function library #15915
- feat: Support test spark runner in datafusion-spark for slt tests #17045
Related issues
- [datafusion-spark] Test integrating datafusion-spark code into comet datafusion-comet#1704
- [EPIC] Implement expressions as ScalarUDFImpl datafusion-comet#1819
- Spark-compatible CAST operation #11201
- SparkSha2 is not compliant with Spark and does not support Int32 type #16336
- Add xxhash algorithms in SQL and expression api #14367
Additional context
No response