Introduction
This ticket is a weekly summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please feel free to leave comments on this ticket about things that I may have missed or that you think should get wider attention from the community.
Loosely inspired by https://this-week-in-rust.org/
Highlights from last week(s):
(I am sorry if I missed you -- please add a note to this ticket with anything you would like to add)
- @goldmedal became a committer 🎉
- @peasee, @sgrebnov, @goldmedal, and @phillipleblanc continue to make plan --> SQL text first class: fix: Dialect requires derived table alias #12994, feat(substrait): add wildcard handling to producer #12987, refactor(substrait): refactor ReadRel consumer #12983, and more
- @duongcongtoai cleaned up `unnest` even more: Improve recursive `unnest` options API #12836
- @akurmustafa made several PRs to keep the code clean, like [MINOR]: Use arrow take_arrays, remove datafusion take_arrays #13013
- @eejbyfeldt pushed ahead on getting all of TPCDS working: feat: Decorrelate more predicate subqueries #12945, and fixed bugs: Fix 2 bugs related to push down partition filters #12902
- @tokoko, @Blizzara, @vbarua, and @westonpace keep on improving substrait: feat(substrait): add set operations to consumer, update substrait to `0.45.0` #12863
- @Omega359 added new regexp functions: feat: Add regexp_count function #12970
- @askalt: Improve performance for physical plan creation with many columns #12950
- @jcsherin kept the momentum up on converting window functions: Convert `BuiltInWindowFunction::{Lead, Lag}` to a user defined window function #12857
- @peter-toth began implementing CSE for physical plans: Implement physical optimizer rule for common subexpression elimination #12599
- @comphead is making Sort-Merge-Join ready for real: Move SMJ join filtered part out of join_output stage. LeftOuter, LeftSemi #12764
- @Dandandan is hacking on simplifying joins: Remove logical cross join in planning #12985
Looking to get more involved? Try code review!
DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.
We have docs about reviews. TLDR: check for test coverage, whether the change is understandable and well documented, and whether the code can be improved. When you think the PR looks good to merge, try @mentioning one of the committers.
Help wanted
Please feel free to leave your own comments on the ticket if you are looking for help
- Reviews (especially for committers) to help @notfillipo and @findepi on logical types
- Reviews of the new FFI interface: FFI initial implementation #12920 from @timsaucer
- Committer review / approval of Migrate documentation for `regr*` aggregate functions to code #12871
Andrew's Focus Areas:
We are preparing for the 43.0.0 release, and I am personally pretty excited about (and thus actively helping with / putting at the top of my review list):
- [DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench #12821 (thanks to the epic work of @Rachelint, @goldmedal, @jayzhan211, @Dandandan @XiangpengHao and others, we are quite close)
- [Epic] Unify `WindowFunction` Interface (remove built in list of `BuiltInWindowFunction`s) #8709 (very close to finishing thanks @jcsherin @jatin510)
- [EPIC] Automatically generate all function documentation from code #12740 (Thanks to @jonathanc-n)
- Aggregation fuzz testing #12114 (thanks @Rachelint for all your help so far)
- [DISCUSSION]: move sqlparser to Apache (DataFusion) governance datafusion-sqlparser-rs#1294
Recent and Upcoming Releases
- Release DataFusion 42.1.0 #12813 (thanks @Xuanwo and @matthewmturner)
- Release sqlparser-rs version `0.52.0` datafusion-sqlparser-rs#1423 (huge kudos to @iffyio for all the reviews)
- Release arrow-rs / parquet minor version `53.2.0` (~November~ October 2024) arrow-rs#6341 (to support turning on string view by default)
- Release DataFusion 43.0.0 #12470 (thanks @andygrove)
Interesting discussions underway:
- 2024 Q3-Q4 Roadmap? #11442
- [DISCUSS] Document criteria for adding new features / what belongs in core DataFusion (e.g. sql syntax, functions, etc) #12357
Community
- Weekly Call
- Slack/Discord: info links
Upcoming meetups:
- Oct 14 Seattle: https://lu.ma/tnwl866b @phillipleblanc @likekim
- Dec 18 Chicago: https://lu.ma/eq5myc5i @adriangb @timsaucer
- TBD: DISCUSSION: January 2025 DataFusion Meetup in Amsterdam / CIDR 2025 #12988
Background:
Previous update: #12973