Skip to content

Avo/substrait map literals #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

Blizzara
Copy link
Owner

Which issue does this PR close?

Related to Map epic apache#11434

Rationale for this change

Substrait didn't support Map literals, since we previously didn't have Map ScalarValues. apache#11224 implemented ScalarValues (thanks!), so now we can add them into Substrait as well.

There were also couple gaps left by the implementation that I encountered while testing this, so I fixed those as well.

Also, there was a bug in the from_substrait_literal for lists containing structs, which I realized while implementing the map support.

What changes are included in this PR?

  • Implement Substrait roundtrip support for Map literals, incl. null and empty maps.
  • Fix field name handling for Substrait lists containing structs
  • Add Map support to ScalarValue::iter_to_array and create_hashes

Are these changes tested?

Tested with new roundtrip tests for the Map literals, and unit tests for hashing.
Also an existing UT for list literals is extended to cover the multiple structs values case.

I'd have added a sql roundtrip test for Map as well, but it seems that the MAP command turns into a ScalarFunction rather than ScalarValue. Maybe we'd need to run some round of optimizer to fold it, but I wonder why that is different from the STRUCT command which does turn into a struct literal?

I.e. doing roundtrip("VALUES (MAP(['k1', 'k2'], [true, CAST(NULL AS BOOLEAN)]))").await?; results in Error: Substrait("Only literal types can be aliased in Virtual Tables, got: ScalarFunction"), while similar code for STRUCT works fine.

Are there any user-facing changes?

jayzhan211 and others added 13 commits July 17, 2024 06:29
* get expr planners when creating new planner

Signed-off-by: jayzhan211 <[email protected]>

* get expr planner when creating planner

Signed-off-by: jayzhan211 <[email protected]>

* no planners in sqltorel

Signed-off-by: jayzhan211 <[email protected]>

* Add docs about SessionContextProvider

* Use Slice rather than Vec to access expr planners

* add test

Signed-off-by: jayzhan211 <[email protected]>

* clippy

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
* Add dialect param to use CHAR instead of TEXT for Utf8 unparsing for MySQL (apache#12)

* Configurable data type instead of flag for Utf8 unparsing

* Fix type in comment
* implement retract_batch for xor accumulator

* add comment
…arquetWriterOptions` (apache#11444)

* refactor: make more explicit the relationship btwn TableParquetOptions vs ParquetOptions vs WriterProperties

* test: demonstrate the relationship btwn session configs and writer props

* refactor: move parquet-format specific functionality to the parquet submodule, leaving only the config options in the config module.

* test: update test fixtures to use the ParquetOptions::default

* test: update test helper session_config_from_writer_props, to not add column configuration when none exists

* test(11367): write test to demonstrate issue 11367

* fix: existing sqllogictests require specific ParquetOptions settings to be left as None

* test(11367): demonstrate how the require bloom filter defaults, (required to avoid test regression), result in different default behavior than parquet crate

* chore: make more reviewable, by pulling tests for issue 11367 into followup PR

* refactor: move all parquet-associated features into parquet-writer mod

* chore: better function naming convention
* feat: add count empty rewrite

* feat: make count support zero args

* docs: add apache license

* tests: make count() valid

* tests: more tests

* refactor: sketch `AggregateFunctionPlanner`

* refactor: cleanup `AggregateFunctionPlanner`

* feat: add back rule

* Revert "feat: add back rule"

This reverts commit 2c4fc0a.

* Revert "refactor: cleanup `AggregateFunctionPlanner`"

This reverts commit 4550dbd.

* Revert "refactor: sketch `AggregateFunctionPlanner`"

This reverts commit 658671e.

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <[email protected]>

* refactor: PR feedback

* style: fix indent

---------

Co-authored-by: Andrew Lamb <[email protected]>
…leParquetOptions` (apache#11524)

* test(11367): define current behavior of parquet writer configuration defaults

* chore(11367): update code comments to make it more explicit on the mismatches
* Create datafusion-physical-optimizer crate

* fmt .md

* Update Cargo.lock in datafusion-cli

* fmt toml and fix doc
…pache#11491)

* Revert "Support `NULL` literals in where clause  (apache#11266)"

This reverts commit fa01917.

* Followup Support NULL literals in where clause

* misc err change

* adopt comparison_coercion

* Fix comments

* Fix comments
…ache#11483)

* Update the parquet code prune_pages_in_one_row_group to use the `StatisticsExtractor`

* fix doc

* Increase evaluation error counter if error determining data page row counts

* Optimize `single_column`
comphead and others added 7 commits July 18, 2024 09:53
* Enable LeftAnti filtered fuzz tests

* Enable LeftAnti filtered fuzz tests. Add git reference
Signed-off-by: jayzhan211 <[email protected]>
* Minor: avoid a clone in type coercion

* Fix test
* Add input_nullable to UDAF args StateField/AccumulatorArgs

This follows how it done for input_type and only provide a single value.
But might need to be changed into a Vec in the future.

This is need when we are moving `arrag_agg` to udaf where one of the
states nullability will depend on the nullability of the input.

* Make ArragAgg (not ordered or distinct) into a UDAF

* Add roundtrip_expr_api test case

* Address PR comments

* Propegate input nullability for aggregates

* Remove from accumulator args

* first draft

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* distinct

Signed-off-by: jayzhan211 <[email protected]>

* fix

Signed-off-by: jayzhan211 <[email protected]>

* address comment

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
Co-authored-by: Emil Ejbyfeldt <[email protected]>
* move make_map to ExprPlanner

* add benchmark for make_map

* remove todo comment

* update lock

* refactor plan_make_map

* implement make_array_strict for type checking strictly

* fix planner provider

* roll back to `make_array`

* update lock
@Blizzara Blizzara force-pushed the avo/substrait-map-literals branch from 31d914d to b820936 Compare July 19, 2024 12:12
@Blizzara Blizzara closed this Jul 19, 2024
Blizzara pushed a commit that referenced this pull request Mar 6, 2025
…4544)

* add mut annotation

* fix rust examples

* fix rust examples

* update

* fix first doctest

* fix first doctest

* fix more doctest

* fix more doctest

* fix more doctest

* adopt rustdoc syntax

* adopt rustdoc syntax

* adopt rustdoc syntax

* fix more doctest

* add missing imports

* final udtf

* reenable

* remove dep

* run prettier

* api-health

* update doc

* update doc

* temp fix

* fix doc

* fix async schema provider

* fix async schema provider

* fix doc

* fix doc

* reorder

* refactor

* s

* finish

* minor update

* add missing docs

* add deps (#3)

* fix doctest

* update doc

* fix doctest

* fix doctest

* tweak showkeys

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* update to use user_doc

* add rustdoc preprocessing

* fix dir

* revert to original doc

* add allocator

* mark type

* update

* fix doctest

* add doctest

* add doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* prettier format

* revert change to datafusion-testing

* add apache header

* install cmake in setup-builder for ci workflow dependency

* taplo + fix snmalloc

* Update function docs

* preprocess user-guide

* Render examples as sql

* fix intro

* fix docs via script

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.