Avo/substrait map literals #3

Blizzara · 2024-07-18T13:57:09Z

Which issue does this PR close?

Related to Map epic apache#11434

Rationale for this change

Substrait didn't support Map literals, since we previously didn't have Map ScalarValues. apache#11224 implemented ScalarValues (thanks!), so now we can add them into Substrait as well.

There were also couple gaps left by the implementation that I encountered while testing this, so I fixed those as well.

Also, there was a bug in the from_substrait_literal for lists containing structs, which I realized while implementing the map support.

What changes are included in this PR?

Implement Substrait roundtrip support for Map literals, incl. null and empty maps.
Fix field name handling for Substrait lists containing structs
Add Map support to ScalarValue::iter_to_array and create_hashes

Are these changes tested?

Tested with new roundtrip tests for the Map literals, and unit tests for hashing.
Also an existing UT for list literals is extended to cover the multiple structs values case.

I'd have added a sql roundtrip test for Map as well, but it seems that the MAP command turns into a ScalarFunction rather than ScalarValue. Maybe we'd need to run some round of optimizer to fold it, but I wonder why that is different from the STRUCT command which does turn into a struct literal?

I.e. doing roundtrip("VALUES (MAP(['k1', 'k2'], [true, CAST(NULL AS BOOLEAN)]))").await?; results in Error: Substrait("Only literal types can be aliased in Virtual Tables, got: ScalarFunction"), while similar code for STRUCT works fine.

Are there any user-facing changes?

* get expr planners when creating new planner Signed-off-by: jayzhan211 <[email protected]> * get expr planner when creating planner Signed-off-by: jayzhan211 <[email protected]> * no planners in sqltorel Signed-off-by: jayzhan211 <[email protected]> * Add docs about SessionContextProvider * Use Slice rather than Vec to access expr planners * add test Signed-off-by: jayzhan211 <[email protected]> * clippy Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>

* Add dialect param to use CHAR instead of TEXT for Utf8 unparsing for MySQL (apache#12) * Configurable data type instead of flag for Utf8 unparsing * Fix type in comment

* implement retract_batch for xor accumulator * add comment

…arquetWriterOptions` (apache#11444) * refactor: make more explicit the relationship btwn TableParquetOptions vs ParquetOptions vs WriterProperties * test: demonstrate the relationship btwn session configs and writer props * refactor: move parquet-format specific functionality to the parquet submodule, leaving only the config options in the config module. * test: update test fixtures to use the ParquetOptions::default * test: update test helper session_config_from_writer_props, to not add column configuration when none exists * test(11367): write test to demonstrate issue 11367 * fix: existing sqllogictests require specific ParquetOptions settings to be left as None * test(11367): demonstrate how the require bloom filter defaults, (required to avoid test regression), result in different default behavior than parquet crate * chore: make more reviewable, by pulling tests for issue 11367 into followup PR * refactor: move all parquet-associated features into parquet-writer mod * chore: better function naming convention

* feat: add count empty rewrite * feat: make count support zero args * docs: add apache license * tests: make count() valid * tests: more tests * refactor: sketch `AggregateFunctionPlanner` * refactor: cleanup `AggregateFunctionPlanner` * feat: add back rule * Revert "feat: add back rule" This reverts commit 2c4fc0a. * Revert "refactor: cleanup `AggregateFunctionPlanner`" This reverts commit 4550dbd. * Revert "refactor: sketch `AggregateFunctionPlanner`" This reverts commit 658671e. * Apply suggestions from code review Co-authored-by: Andrew Lamb <[email protected]> * refactor: PR feedback * style: fix indent --------- Co-authored-by: Andrew Lamb <[email protected]>

…leParquetOptions` (apache#11524) * test(11367): define current behavior of parquet writer configuration defaults * chore(11367): update code comments to make it more explicit on the mismatches

* Create datafusion-physical-optimizer crate * fmt .md * Update Cargo.lock in datafusion-cli * fmt toml and fix doc

)

…pache#11491) * Revert "Support `NULL` literals in where clause (apache#11266)" This reverts commit fa01917. * Followup Support NULL literals in where clause * misc err change * adopt comparison_coercion * Fix comments * Fix comments

…ache#11511)

…ache#11483) * Update the parquet code prune_pages_in_one_row_group to use the `StatisticsExtractor` * fix doc * Increase evaluation error counter if error determining data page row counts * Optimize `single_column`

* Enable LeftAnti filtered fuzz tests * Enable LeftAnti filtered fuzz tests. Add git reference

… physical-expr, and physical-plan packages (apache#11538)

Signed-off-by: jayzhan211 <[email protected]>

* Minor: avoid a clone in type coercion * Fix test

* Add input_nullable to UDAF args StateField/AccumulatorArgs This follows how it done for input_type and only provide a single value. But might need to be changed into a Vec in the future. This is need when we are moving `arrag_agg` to udaf where one of the states nullability will depend on the nullability of the input. * Make ArragAgg (not ordered or distinct) into a UDAF * Add roundtrip_expr_api test case * Address PR comments * Propegate input nullability for aggregates * Remove from accumulator args * first draft Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> * fix test Signed-off-by: jayzhan211 <[email protected]> * distinct Signed-off-by: jayzhan211 <[email protected]> * fix Signed-off-by: jayzhan211 <[email protected]> * address comment Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]> Co-authored-by: Emil Ejbyfeldt <[email protected]>

* move make_map to ExprPlanner * add benchmark for make_map * remove todo comment * update lock * refactor plan_make_map * implement make_array_strict for type checking strictly * fix planner provider * roll back to `make_array` * update lock

…4544) * add mut annotation * fix rust examples * fix rust examples * update * fix first doctest * fix first doctest * fix more doctest * fix more doctest * fix more doctest * adopt rustdoc syntax * adopt rustdoc syntax * adopt rustdoc syntax * fix more doctest * add missing imports * final udtf * reenable * remove dep * run prettier * api-health * update doc * update doc * temp fix * fix doc * fix async schema provider * fix async schema provider * fix doc * fix doc * reorder * refactor * s * finish * minor update * add missing docs * add deps (#3) * fix doctest * update doc * fix doctest * fix doctest * tweak showkeys * fix doctest * fix doctest * fix doctest * fix doctest * update to use user_doc * add rustdoc preprocessing * fix dir * revert to original doc * add allocator * mark type * update * fix doctest * add doctest * add doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * fix doctest * prettier format * revert change to datafusion-testing * add apache header * install cmake in setup-builder for ci workflow dependency * taplo + fix snmalloc * Update function docs * preprocess user-guide * Render examples as sql * fix intro * fix docs via script --------- Co-authored-by: Andrew Lamb <[email protected]>

jayzhan211 and others added 13 commits July 17, 2024 06:29

Support alternate format for Utf8 unparsing (CHAR) (apache#11494)

b0925c8

* Add dialect param to use CHAR instead of TEXT for Utf8 unparsing for MySQL (apache#12) * Configurable data type instead of flag for Utf8 unparsing * Fix type in comment

implement retract_batch for xor accumulator (apache#11500)

fb34ef2

* implement retract_batch for xor accumulator * add comment

chore: fix typos of common and core packages (apache#11520)

81a6094

Move spill related functions to spill.rs (apache#11509)

adcfd85

Add tests that show the different defaults for ArrowWriter and `Tab…

4b840c0

…leParquetOptions` (apache#11524) * test(11367): define current behavior of parquet writer configuration defaults * chore(11367): update code comments to make it more explicit on the mismatches

Create datafusion-physical-optimizer crate (apache#11507)

58da969

* Create datafusion-physical-optimizer crate * fmt .md * Update Cargo.lock in datafusion-cli * fmt toml and fix doc

Minor: Assert test_enabled_backtrace requirements to run (apache#11525

1b9b35c

)

Minor: Clarify which parquet options are used for reading/writing (ap…

dff2f3c

…ache#11511)

github-actions bot added the substrait label Jul 18, 2024

comphead and others added 7 commits July 18, 2024 09:53

Enable SortMergeJoin LeftAnti filtered fuzz tests (apache#11535)

be130b4

* Enable LeftAnti filtered fuzz tests * Enable LeftAnti filtered fuzz tests. Add git reference

chore: fix typos of expr, functions, optimizer, physical-expr-common,…

b685e2d

… physical-expr, and physical-plan packages (apache#11538)

rm clone (apache#11532)

4dd8532

Signed-off-by: jayzhan211 <[email protected]>

Minor: avoid a clone in type coercion (apache#11530)

723a595

* Minor: avoid a clone in type coercion * Fix test

Move MAKE_MAP to ExprPlanner (apache#11452)

5f0dfbb

* move make_map to ExprPlanner * add benchmark for make_map * remove todo comment * update lock * refactor plan_make_map * implement make_array_strict for type checking strictly * fix planner provider * roll back to `make_array` * update lock

implement Map literals/nulls conversions in Substrait

5695b0e

Blizzara force-pushed the avo/substrait-map-literals branch from 0d3ca4b to 31d914d Compare July 19, 2024 12:09

github-actions bot added documentation Improvements or additions to documentation sql logical-expr physical-expr optimizer core sqllogictest labels Jul 19, 2024

fix name handling for lists/maps containing structs

688024e

Blizzara added 2 commits July 19, 2024 14:12

add hashing for map scalars

db0ed2c

add a test for creating a map in VALUES

b820936

Blizzara force-pushed the avo/substrait-map-literals branch from 31d914d to b820936 Compare July 19, 2024 12:12

Blizzara closed this Jul 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Avo/substrait map literals #3

Avo/substrait map literals #3

Uh oh!

Blizzara commented Jul 18, 2024

Uh oh!

Uh oh!

Avo/substrait map literals #3

Avo/substrait map literals #3

Uh oh!

Conversation

Blizzara commented Jul 18, 2024

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

Uh oh!