forked from apache/datafusion
-
Notifications
You must be signed in to change notification settings - Fork 0
Avo/substrait map literals #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Blizzara
wants to merge
23
commits into
avo/substrait-type-extensions
from
avo/substrait-map-literals
Closed
Avo/substrait map literals #3
Blizzara
wants to merge
23
commits into
avo/substrait-type-extensions
from
avo/substrait-map-literals
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* get expr planners when creating new planner Signed-off-by: jayzhan211 <[email protected]> * get expr planner when creating planner Signed-off-by: jayzhan211 <[email protected]> * no planners in sqltorel Signed-off-by: jayzhan211 <[email protected]> * Add docs about SessionContextProvider * Use Slice rather than Vec to access expr planners * add test Signed-off-by: jayzhan211 <[email protected]> * clippy Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
* Add dialect param to use CHAR instead of TEXT for Utf8 unparsing for MySQL (apache#12) * Configurable data type instead of flag for Utf8 unparsing * Fix type in comment
* implement retract_batch for xor accumulator * add comment
…arquetWriterOptions` (apache#11444) * refactor: make more explicit the relationship btwn TableParquetOptions vs ParquetOptions vs WriterProperties * test: demonstrate the relationship btwn session configs and writer props * refactor: move parquet-format specific functionality to the parquet submodule, leaving only the config options in the config module. * test: update test fixtures to use the ParquetOptions::default * test: update test helper session_config_from_writer_props, to not add column configuration when none exists * test(11367): write test to demonstrate issue 11367 * fix: existing sqllogictests require specific ParquetOptions settings to be left as None * test(11367): demonstrate how the require bloom filter defaults, (required to avoid test regression), result in different default behavior than parquet crate * chore: make more reviewable, by pulling tests for issue 11367 into followup PR * refactor: move all parquet-associated features into parquet-writer mod * chore: better function naming convention
* feat: add count empty rewrite * feat: make count support zero args * docs: add apache license * tests: make count() valid * tests: more tests * refactor: sketch `AggregateFunctionPlanner` * refactor: cleanup `AggregateFunctionPlanner` * feat: add back rule * Revert "feat: add back rule" This reverts commit 2c4fc0a. * Revert "refactor: cleanup `AggregateFunctionPlanner`" This reverts commit 4550dbd. * Revert "refactor: sketch `AggregateFunctionPlanner`" This reverts commit 658671e. * Apply suggestions from code review Co-authored-by: Andrew Lamb <[email protected]> * refactor: PR feedback * style: fix indent --------- Co-authored-by: Andrew Lamb <[email protected]>
…leParquetOptions` (apache#11524) * test(11367): define current behavior of parquet writer configuration defaults * chore(11367): update code comments to make it more explicit on the mismatches
* Create datafusion-physical-optimizer crate * fmt .md * Update Cargo.lock in datafusion-cli * fmt toml and fix doc
…pache#11491) * Revert "Support `NULL` literals in where clause (apache#11266)" This reverts commit fa01917. * Followup Support NULL literals in where clause * misc err change * adopt comparison_coercion * Fix comments * Fix comments
…ache#11483) * Update the parquet code prune_pages_in_one_row_group to use the `StatisticsExtractor` * fix doc * Increase evaluation error counter if error determining data page row counts * Optimize `single_column`
* Enable LeftAnti filtered fuzz tests * Enable LeftAnti filtered fuzz tests. Add git reference
… physical-expr, and physical-plan packages (apache#11538)
Signed-off-by: jayzhan211 <[email protected]>
* Minor: avoid a clone in type coercion * Fix test
* Add input_nullable to UDAF args StateField/AccumulatorArgs This follows how it done for input_type and only provide a single value. But might need to be changed into a Vec in the future. This is need when we are moving `arrag_agg` to udaf where one of the states nullability will depend on the nullability of the input. * Make ArragAgg (not ordered or distinct) into a UDAF * Add roundtrip_expr_api test case * Address PR comments * Propegate input nullability for aggregates * Remove from accumulator args * first draft Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> * fix test Signed-off-by: jayzhan211 <[email protected]> * distinct Signed-off-by: jayzhan211 <[email protected]> * fix Signed-off-by: jayzhan211 <[email protected]> * address comment Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]> Co-authored-by: Emil Ejbyfeldt <[email protected]>
* move make_map to ExprPlanner * add benchmark for make_map * remove todo comment * update lock * refactor plan_make_map * implement make_array_strict for type checking strictly * fix planner provider * roll back to `make_array` * update lock
0d3ca4b
to
31d914d
Compare
31d914d
to
b820936
Compare
Blizzara
pushed a commit
that referenced
this pull request
Mar 6, 2025
…4544) * add mut annotation * fix rust examples * fix rust examples * update * fix first doctest * fix first doctest * fix more doctest * fix more doctest * fix more doctest * adopt rustdoc syntax * adopt rustdoc syntax * adopt rustdoc syntax * fix more doctest * add missing imports * final udtf * reenable * remove dep * run prettier * api-health * update doc * update doc * temp fix * fix doc * fix async schema provider * fix async schema provider * fix doc * fix doc * reorder * refactor * s * finish * minor update * add missing docs * add deps (#3) * fix doctest * update doc * fix doctest * fix doctest * tweak showkeys * fix doctest * fix doctest * fix doctest * fix doctest * update to use user_doc * add rustdoc preprocessing * fix dir * revert to original doc * add allocator * mark type * update * fix doctest * add doctest * add doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * prettier format * revert change to datafusion-testing * add apache header * install cmake in setup-builder for ci workflow dependency * taplo + fix snmalloc * Update function docs * preprocess user-guide * Render examples as sql * fix intro * fix docs via script --------- Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
core
documentation
Improvements or additions to documentation
logical-expr
optimizer
physical-expr
sql
sqllogictest
substrait
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Related to Map epic apache#11434
Rationale for this change
Substrait didn't support Map literals, since we previously didn't have Map ScalarValues. apache#11224 implemented ScalarValues (thanks!), so now we can add them into Substrait as well.
There were also couple gaps left by the implementation that I encountered while testing this, so I fixed those as well.
Also, there was a bug in the from_substrait_literal for lists containing structs, which I realized while implementing the map support.
What changes are included in this PR?
ScalarValue::iter_to_array
andcreate_hashes
Are these changes tested?
Tested with new roundtrip tests for the Map literals, and unit tests for hashing.
Also an existing UT for list literals is extended to cover the multiple structs values case.
I'd have added a sql roundtrip test for Map as well, but it seems that the
MAP
command turns into a ScalarFunction rather than ScalarValue. Maybe we'd need to run some round of optimizer to fold it, but I wonder why that is different from theSTRUCT
command which does turn into a struct literal?I.e. doing
roundtrip("VALUES (MAP(['k1', 'k2'], [true, CAST(NULL AS BOOLEAN)]))").await?;
results inError: Substrait("Only literal types can be aliased in Virtual Tables, got: ScalarFunction")
, while similar code for STRUCT works fine.Are there any user-facing changes?